CN-122024315-A - Sign language identification method based on MEDIAPIPE and XGBoost algorithms
Abstract
The invention provides a sign language recognition method based on the MEDIAPIPE and XGBoost algorithms, relating to the technical field of computer vision and pattern recognition. The method adopts the MEDIAPIPE framework to achieve accurate collaborative extraction of key points from multiple body parts (hands, arms, and face), replacing outdated traditional image-processing approaches; constructs a multi-modal feature fusion system that expands the range of recognizable sign language expressions; and combines the feature screening, dynamic weight optimization, and efficient classification capabilities of the XGBoost algorithm to solve the core pain points of the prior art, thereby achieving accurate and stable recognition of complex sign language actions.
Inventors
- YAN QIUFENG
Assignees
- Nantong University (南通大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-26
Claims (10)
- 1. A sign language recognition method based on the MEDIAPIPE and XGBoost algorithms, characterized by comprising the following steps: S1, multi-part data acquisition and preprocessing; S2, constructing multi-part collaborative feature engineering; S3, dynamically optimizing the XGBoost model; S4, hierarchical multi-modal cross-checking; and S5, constructing a sign language recognition system.
- 2. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 1, wherein step S1 specifically includes: S11, extracting hand key points, arm core key points, and facial core expression key points; S12, calibrating spatio-temporal synchronization; S13, removing noise and abnormal points through confidence screening, interpolation completion, and Gaussian filtering; and S14, outputting the denoised multi-part key point data set.
- 3. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 2, wherein step S2 specifically includes: S21, extracting hand, arm, and facial features; S22, splicing and fusing the multi-part features; S23, performing XGBoost feature-importance screening and feature dimensionality reduction, which reduces computational complexity while retaining key information and forms an 80-dimensional fused feature vector; and S24, outputting the 80-dimensional fused feature vector.
- 4. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 3, wherein step S3 specifically includes: S31, constructing an XGBoost classifier; S32, introducing a multi-part feature dynamic weight allocation mechanism; S33, adopting a hyperparameter optimization strategy combining grid search and cross-validation to avoid model overfitting; S34, designing an online incremental learning algorithm that updates the decision-tree node weights of the XGBoost model through locally weighted regression; and S35, outputting an optimized XGBoost model adapted to the user's habits.
- 5. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 4, wherein step S4 includes: S41, constructing 3 single-part auxiliary classifiers; S42, independently classifying the single-part features; S43, performing hierarchical weighted fusion; S44, performing scene-adaptive weight adjustment; and S45, outputting a stable and reliable sign language recognition result.
- 6. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 5, wherein step S5 specifically comprises building a human-machine interaction system for real-time sign language recognition and deploying it on a mobile phone.
- 7. The sign language recognition method according to claim 6, wherein in step S11 the extracted key points include 21 hand key points; 3 core arm joint points, namely the shoulder joint, the elbow joint, and the wrist joint; and 68 facial core expression key points.
- 8. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 7, wherein the geometric features of the hand features in step S21 include the degree of finger bending and the angle between two fingers.
- 9. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 8, wherein in step S21 the geometric features of the arm features include the angle between the arm and the vertical direction, the angle between the right shoulder joint and the wrist joint, and the angle between the upper arm and the forearm.
- 10. The sign language recognition method based on the MEDIAPIPE and XGBoost algorithms according to claim 9, wherein the geometric features of the facial features in step S21 include the aspect ratio of the lips, the vertical distance between the upper and lower eye key points, the degree of eye opening and closing, and the height of and distance between the eyebrow key points.
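The geometric hand features named in claims 8 through 10 (degree of finger bending, angle between two fingers) can be illustrated with a minimal NumPy sketch. The helper names are hypothetical and the joint indices assume MEDIAPIPE's 21-point hand layout (0 = wrist; 5, 6, 8 = index-finger MCP, PIP, and tip); this is only one way such angles might be computed, not the patent's actual implementation.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def finger_bend(hand, mcp, pip, tip):
    """Degree of finger bending: 180 deg means fully straight, smaller means more bent."""
    return joint_angle(hand[mcp], hand[pip], hand[tip])

def inter_finger_angle(hand, wrist, tip_a, tip_b):
    """Angle between two fingers, measured between the wrist->tip vectors."""
    return joint_angle(hand[tip_a], hand[wrist], hand[tip_b])
```

For a straight index finger (MCP, PIP, and tip collinear) `finger_bend` returns 180 degrees, and the value drops as the fingertip curls toward the palm.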
Description
Sign language recognition method based on MEDIAPIPE and XGBoost algorithms
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, and in particular to a sign language recognition method based on the MEDIAPIPE and XGBoost algorithms.
Background
Sign language is a core communication tool for people with hearing impairment, and recognition accuracy depends on the complete capture of limb actions and expression information. The prior art mainly relies on traditional image-processing methods combined with a classification algorithm to classify different sign language gestures. Such methods suffer from strong sensitivity to lighting changes, background interference, and limb occlusion, as well as low key-point positioning precision, slow extraction speed, and high noise sensitivity, all of which restrict improvements in recognition accuracy. To address this problem, wearable devices and human-machine interface systems have emerged, but such equipment is a burden for users and affects wearing comfort. The MEDIAPIPE algorithm is a real-time pose estimation framework based on deep learning; it is an open-source framework developed by Google that aims to help developers build high-performance real-time multimedia processing pipelines. It is particularly suitable for processing video and images, and is used in scenarios such as facial expression recognition and gesture estimation. Existing gesture recognition methods that adopt the MEDIAPIPE algorithm focus only on feature extraction from the 21 hand key points; however, sign language, as a body language, involves not only conventional gestures but also limb actions and, in most sign language gestures, facial expressions as well.
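The multi-part key-point layout described above (21 hand points, 3 arm joint points, 68 facial points) can be sketched as a single fused array. This is a minimal illustration that assumes the landmark coordinates have already been obtained from MEDIAPIPE's hand, pose, and face solutions and converted to NumPy arrays; the function name, the confidence-masking convention, and the 0.5 threshold are illustrative assumptions, not part of the patent.

```python
import numpy as np

N_HAND, N_ARM, N_FACE = 21, 3, 68  # key-point counts per part, as in the patent

def fuse_keypoints(hand_xyz, arm_xyz, face_xyz, conf, min_conf=0.5):
    """Stack hand/arm/face key points into one (92, 3) array.

    Points whose detection confidence falls below min_conf are set to NaN,
    so that a later interpolation-completion step can fill them in.
    """
    pts = np.vstack([hand_xyz, arm_xyz, face_xyz]).astype(float)
    assert pts.shape == (N_HAND + N_ARM + N_FACE, 3)
    pts[np.asarray(conf) < min_conf] = np.nan  # confidence screening
    return pts
```

Keeping all parts in one fixed-layout array makes the later steps (spatio-temporal calibration, Gaussian smoothing, feature stitching) straightforward, since every frame exposes the same 92 rows in the same order.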
Therefore, when the traditional MEDIAPIPE-based gesture recognition method is applied to sign language recognition, it fails to extract features from facial and limb key points, resulting in low recognition accuracy.
Disclosure of Invention
In view of these problems, the invention provides a sign language recognition method based on the MEDIAPIPE and XGBoost algorithms. The method adopts the MEDIAPIPE framework to achieve accurate collaborative extraction of key points from multiple parts (hands, arms, and face), replacing outdated traditional image-processing approaches; constructs a multi-modal feature fusion system to expand the range of recognizable sign language expressions; and combines the feature screening, dynamic weight optimization, and efficient classification capabilities of the XGBoost algorithm to solve the core pain points of the prior art, thereby achieving accurate and stable recognition of complex sign language actions. The method comprises the following steps: Step 1, multi-part data acquisition and preprocessing; Step 2, constructing multi-part collaborative feature engineering; Step 3, dynamically optimizing the XGBoost model; Step 4, hierarchical multi-modal cross-checking; and Step 5, constructing a sign language recognition system.
Further, Step 1 includes adopting MEDIAPIPE multi-frame collaborative extraction to synchronously obtain the 21 hand key points, the 3 core arm joint points (shoulder, elbow, and wrist joints), and the 68 facial core expression key points, ensuring the spatio-temporal synchronization of the multi-part data. Noise and abnormal points are removed through confidence screening, interpolation completion, and Gaussian filtering, ensuring the reliability of the key-point data; compared with traditional image-processing methods, the MEDIAPIPE algorithm significantly reduces key-point positioning error and improves extraction speed. Further, Step 2 includes: retaining the original distance-matrix and angle-matrix characteristics of the hand features to ensure fine motion-capture capability; introducing joint included angles, movement speed, and gesture trend features for the arm to reflect key information about large-scale limb movement; screening the core eye, eyebrow, and mouth regions for the facial features and extracting features such as opening/closing degree and morphological change to capture emotional auxiliary semantics; and, after multi-part feature stitching, reducing dimensionality via XGBoost feature-importance screening and PCA (principal component analysis), so that computational complexity is reduced while key information is retained, forming an 80-dimensional fused feature vector. Further, Step 3 includes constructing an XGBoost classifier, introducing a multi-part feature dynamic weight allocation mechanism that adjusts each part's feature contribution according to sign language type and complexity, and adopting a hyperparameter optimization strategy combining grid search and cross