Problem statement
The problem at hand is the need for a system that can accurately estimate human
poses and recognize gestures. Traditional approaches, such as marker-based motion
capture or hand-crafted feature pipelines, often fail to capture the intricacies of
human movement across varied contexts. The challenge is to develop an advanced
system that uses computer vision and deep learning techniques to analyze human
poses and gestures precisely, enabling applications in fields such as
human-computer interaction, virtual reality, and healthcare.
Abstract
This project focuses on implementing a sophisticated system for Human Pose
Estimation and Gesture Recognition. Leveraging computer vision and deep learning,
the system aims to capture and interpret human body poses and gestures accurately
in real time. By combining advanced algorithms and neural networks, it provides a
versatile solution applicable across diverse domains. The primary goal is to
enhance human-computer interaction by enabling machines to understand and
respond to human movements effectively.
Outcome
The final outcome is a smart system that can “see” and understand how humans move
and gesture. It uses deep learning to infer body poses and gestures accurately in
real time. The system can serve a range of applications, from making virtual
reality more immersive to improving how we interact with computers. Essentially,
it is a technology that helps computers understand human body language better.
Reference
Human Pose Estimation is a challenging yet broadly researched area. Pose estimation is required in applications including human activity detection, fall detection, and motion capture in AR/VR. Notably, each of these applications needs only the images and videos captured by a standard RGB camera, without any external devices. This paper presents a real-time approach to sign language detection and recognition in videos using the Holistic pose estimation method of MediaPipe. The Holistic framework tracks multiple modalities (facial expression, hand gesture, and body pose), which makes it well suited to sign language recognition. The experiment involved five different signers, each signing ten distinct words against a natural background. Two signs, “blank” and “sad,” were recognized best by the model.
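To make the referenced pipeline concrete, the sketch below shows how MediaPipe's Holistic solution can be driven from a standard RGB webcam to extract body-pose and hand landmarks frame by frame. This is a minimal sketch under stated assumptions, not the paper's implementation: the webcam index (0) and the 0.5 confidence thresholds are illustrative choices, and the actual sign language classifier would be a separate model trained downstream on the extracted landmarks.

```python
# Minimal sketch: per-frame multimodal landmark extraction with MediaPipe
# Holistic from an ordinary RGB webcam (no external devices). Illustrative
# assumptions: camera index 0, detection/tracking thresholds of 0.5.
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # standard RGB camera is all that is required

with mp_holistic.Holistic(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break

        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = holistic.process(rgb)

        # results also exposes results.face_landmarks for the facial
        # modality; draw_landmarks safely skips modalities not detected
        # in this frame.
        mp_drawing.draw_landmarks(
            frame, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
        mp_drawing.draw_landmarks(
            frame, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(
            frame, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)

        cv2.imshow("Holistic landmarks", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```

For recognition, one common design is to flatten the per-frame landmark coordinates (pose, face, and both hands) into feature vectors and feed sequences of them to a classifier trained on the signers' recordings; the paper itself does not prescribe the classifier shown here, so that stage is left as an assumption.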