Problem statement
The problem at hand is the need for a system that can accurately estimate human
poses and recognize gestures. Traditional approaches, such as marker-based motion
capture or hand-crafted feature pipelines, often fail to capture the intricacies of
human movement across varied contexts. The challenge is to develop an advanced
system that uses computer vision and deep learning techniques to analyze human
poses and gestures precisely, enabling applications in fields such as
human-computer interaction, virtual reality, and healthcare.
Abstract
This project focuses on implementing a sophisticated system for Human Pose
Estimation and Gesture Recognition. Leveraging computer vision and deep learning,
the system aims to capture and interpret human body poses and gestures accurately
in real time. By combining advanced algorithms and neural networks, it provides a
versatile solution applicable across diverse domains. The primary goal is to
enhance human-computer interaction by enabling machines to understand and
respond to human movements effectively.
Outcome
The final outcome is a smart system that can “see” and understand how humans move
and gesture. It uses deep learning to infer body poses and gestures accurately in
real time. The system can serve a range of applications, from making virtual
reality more immersive to improving how we interact with computers. Essentially,
it is a technology that helps computers understand human body language better.
Reference
Human Pose Estimation is a challenging yet broadly researched area. Pose estimation is required in applications including human activity detection, fall detection, and motion capture in AR/VR. Notably, each of these applications needs only the images and videos captured by a standard RGB camera, without any external devices. This paper presents a real-time approach to sign language detection and recognition in videos using the Holistic pose estimation method of MediaPipe. The Holistic framework tracks multiple modalities (facial expression, hand gesture, and body pose), which makes it well suited to sign language recognition. The experiment involved five different signers, each signing ten distinct words against a natural background. Two signs, “blank” and “sad,” were recognized best by the model.
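To make the referenced pipeline concrete, the sketch below shows how MediaPipe's Holistic solution can be driven from a standard RGB webcam to extract body-pose and hand landmarks frame by frame. This is a minimal sketch under stated assumptions, not the paper's implementation: the webcam index (0) and the 0.5 confidence thresholds are illustrative choices, and the actual sign language classifier would be a separate model trained downstream on the extracted landmarks.

```python
# Minimal sketch: per-frame multimodal landmark extraction with MediaPipe
# Holistic from an ordinary RGB webcam (no external devices). Illustrative
# assumptions: camera index 0, detection/tracking thresholds of 0.5.
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # standard RGB camera is all that is required

with mp_holistic.Holistic(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break

        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = holistic.process(rgb)

        # results also exposes results.face_landmarks for the facial
        # modality; draw_landmarks safely skips modalities not detected
        # in this frame.
        mp_drawing.draw_landmarks(
            frame, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
        mp_drawing.draw_landmarks(
            frame, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(
            frame, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)

        cv2.imshow("Holistic landmarks", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```

For recognition, one common design is to flatten the per-frame landmark coordinates (pose, face, and both hands) into feature vectors and feed sequences of them to a classifier trained on the signers' recordings; the paper itself does not prescribe the classifier shown here, so that stage is left as an assumption.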