CN-115798030-B - Rotation-based gesture recognition method and device, electronic equipment and storage medium

Abstract

The embodiment of the invention discloses a rotation-based gesture recognition method and device, electronic equipment and a storage medium. The method detects the first key point position corresponding to each hand key point in continuous frame images and calculates the three-dimensional coordinates of each hand key point in a corresponding first coordinate system, wherein the first coordinate system is a three-dimensional coordinate system determined by the adjacent key points, far away from the fingertips, of the hand key points. The three-dimensional coordinates are input into a preset rotation-based forward kinematic model to obtain a pose transformation set of each frame image corresponding to the hand key points. A Jacobian matrix corresponding to each frame image and a predicted three-dimensional coordinate corresponding to each hand key point are calculated according to the pose transformation set, and a corresponding second key point position is calculated according to the predicted three-dimensional coordinates. The pose transformation set is then optimized based on the key point positions and the Jacobian matrix, and parameters are output based on the optimized pose transformation set. The invention ensures that the hand shape stays consistent between preceding and subsequent frames, thereby improving the accuracy of the processing results.

Inventors

  • SUN LIYE
  • ZHANG TENG

Assignees

  • 广州视源电子科技股份有限公司
  • 广州视源人工智能创新研究院有限公司

Dates

Publication Date
20260512
Application Date
20210909

Claims (14)

  1. A rotation-based gesture recognition method, comprising: acquiring continuous frame images captured by a camera, and detecting hand key points in the continuous frame images to acquire a first key point position corresponding to each hand key point in each frame image; calculating three-dimensional coordinates of each hand key point in a corresponding first coordinate system, wherein the first coordinate system is a three-dimensional coordinate system determined by the adjacent key points, far away from the fingertips, of the hand key points; inputting the three-dimensional coordinates into a preset forward kinematics model to obtain a pose transformation set of each frame image corresponding to the hand key points, wherein the forward kinematics model is a rotation-based kinematic description of a plurality of hand key points connected through rigid bodies, in the forward kinematics model the poses of the hand key points corresponding to the wrist are described in a camera coordinate system, the poses of the hand key points corresponding to the metacarpophalangeal joints are expressed relative to the hand key points corresponding to the wrist by the rotation method, using an exponential product formula and a synthesis rule of rigid body motion, and the pose transformation set is the set of poses corresponding to all the hand key points; calculating, based on the pose transformation set, a Jacobian matrix corresponding to each frame image and a predicted three-dimensional coordinate corresponding to each hand key point, and calculating a corresponding second key point position according to the projection of the predicted three-dimensional coordinate in the image; performing non-linear optimization on the pose transformation set according to the first key point position, the second key point position and the Jacobian matrix; and outputting preset hand key point parameters according to the forward kinematics model and the optimized pose transformation set.
  2. The gesture recognition method according to claim 1, further comprising, after the non-linear optimization of the pose transformation set according to the first key point position, the second key point position and the Jacobian matrix: taking the currently processed image as a reference, acquiring the pose transformation sets of a preset total number of images before and after it, and performing time-domain smoothing on the pose transformation set of the currently processed image based on the acquired pose transformation sets.
  3. The gesture recognition method of claim 2, wherein the time-domain smoothing is performed by low-pass filtering.
  4. The gesture recognition method of claim 1, further comprising, prior to presetting the forward kinematics model: acquiring camera model parameters, an initial gesture and a hand configuration, wherein the hand configuration comprises the rigid body connection lengths between hand key points and the positions of the hand key points in a camera coordinate system; and obtaining the forward kinematics model based on the camera model parameters, the initial gesture and the rigid body connection lengths, using an exponential product formula and a synthesis rule of rigid transformation of the hand key points.
  5. The gesture recognition method of claim 4, further comprising, prior to presetting the forward kinematics model: acquiring the degrees of freedom of each hand key point; wherein the forward kinematics model is provided with kinematic state constraints on the degrees of freedom.
  6. The gesture recognition method of claim 1, wherein the initial value of the pose transformation set of the currently processed image is the pose transformation set corresponding to the previous frame image, or is a pose transformation set initialized based on the currently processed image.
  7. The gesture recognition method of claim 6, wherein the pose transformation set comprises a motion rotation and a rotation angle of the joint corresponding to each hand key point; and when the pose transformation set is initialized, the motion rotation corresponds to the rotational motion of the joint when all other joints are fixed at the position with a rotation angle of 0, and the pose of the hand key point corresponding to the wrist joint is calculated by combining multi-view geometry and structure from motion with the camera model parameters of the camera.
  8. The gesture recognition method of claim 1, wherein the non-linear optimization of the pose transformation set according to the first key point position, the second key point position and the Jacobian matrix comprises: calculating a reprojection error from the first key point position and the second key point position; calculating a time-domain error from the pose transformation sets of the currently processed image and the previous frame image; calculating the sum of the reprojection error and the time-domain error to obtain an error vector; and performing non-linear optimization on the pose transformation set according to the error vector and the Jacobian matrix.
  9. The gesture recognition method of claim 8, wherein the time-domain error is calculated by a formula [formula not preserved in the source text] whose symbols represent, respectively: the time-domain error; the rotation angle and the translation amount corresponding to the pose transformation set of the currently processed image; the rotation angle and the translation amount corresponding to the pose transformation set of the previous frame image; and the square of the L2 norm of a vector.
  10. The gesture recognition method according to claim 8, wherein the non-linear optimization of the pose transformation set according to the error vector and the Jacobian matrix specifically comprises: inputting the error vector and the Jacobian matrix into a preset non-linear optimization model, and outputting an optimized pose transformation set.
  11. The gesture recognition method of claim 1, wherein the preset hand key point parameters include one or more of: the 3D coordinates of the hand key points, the rotation angle and translation amount of the hand key points, and the pose transformation set of the hand key points.
  12. A rotation-based gesture recognition apparatus, comprising: a key point detection unit, used for acquiring continuous frame images captured by a camera, and detecting hand key points in the continuous frame images to acquire a first key point position corresponding to each hand key point in each frame image; a coordinate calculation unit, used for calculating the three-dimensional coordinates of each hand key point in a corresponding first coordinate system, wherein the first coordinate system is a three-dimensional coordinate system determined by the adjacent key points, far away from the fingertips, of the hand key points; a model calculation unit, used for inputting the three-dimensional coordinates into a preset forward kinematics model to obtain a pose transformation set of each frame image corresponding to the hand key points, wherein the forward kinematics model is a rotation-based kinematic description of a plurality of hand key points connected through rigid bodies, in the forward kinematics model the poses of the hand key points corresponding to the wrist are described in a camera coordinate system, the poses of the hand key points corresponding to the metacarpophalangeal joints are expressed relative to the hand key points corresponding to the wrist by the rotation method, using an exponential product formula and a synthesis rule of rigid body motion, and the pose transformation set is the set of poses corresponding to all the hand key points; a position calculation unit, used for calculating, based on the pose transformation set, a Jacobian matrix corresponding to each frame image and a predicted three-dimensional coordinate corresponding to each hand key point, and calculating a corresponding second key point position according to the projection of the predicted three-dimensional coordinate in the image; a non-linear optimization unit, used for performing non-linear optimization on the pose transformation set according to the first key point position, the second key point position and the Jacobian matrix; and a parameter output unit, used for outputting preset hand key point parameters according to the forward kinematics model and the optimized pose transformation set.
  13. An electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the electronic device to implement the rotation-based gesture recognition method of any of claims 1-11.
  14. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the rotation-based gesture recognition method according to any of claims 1-11.
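
The core computation in claims 1, 8 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the "exponential product formula" is the product of exponentials from rigid-body kinematics, that every joint is a pure rotation about a unit axis, and that the camera is an ideal pinhole. The time-domain term is written as the squared L2 norm of the difference between the current and previous parameters, which is also an assumption because the claim 9 formula is not preserved in this text. All function names, the two-joint finger and the camera intrinsics are hypothetical.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 homogeneous matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def twist_exp(axis, point, theta):
    """Rigid transform for a rotation by theta about a unit axis through point."""
    wx, wy, wz = axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    # Rodrigues' rotation formula for a unit axis (wx, wy, wz).
    rot = [
        [c + wx * wx * v, wx * wy * v - wz * s, wx * wz * v + wy * s],
        [wy * wx * v + wz * s, c + wy * wy * v, wy * wz * v - wx * s],
        [wz * wx * v - wy * s, wz * wy * v + wx * s, c + wz * wz * v],
    ]
    # Translation chosen so that `point` (on the axis) stays fixed: t = p - R p.
    trans = [point[i] - sum(rot[i][k] * point[k] for k in range(3))
             for i in range(3)]
    return [rot[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def forward_kinematics(wrist_pose, joints, thetas, rest_point):
    """Key point position in the camera frame: wrist pose, then one exponential per joint."""
    g = wrist_pose
    for (axis, point), theta in zip(joints, thetas):
        g = mat_mul(g, twist_exp(axis, point, theta))
    p = list(rest_point) + [1.0]
    return [sum(g[i][k] * p[k] for k in range(4)) for i in range(3)]

def project(point_3d, fx, fy, cx, cy):
    """Ideal pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_3d
    return [fx * x / z + cx, fy * y / z + cy]

def reprojection_error(observed_2d, predicted_2d):
    """Residual between the detected (first) and projected (second) key point positions."""
    return [o - p for o, p in zip(observed_2d, predicted_2d)]

def temporal_error(params_now, params_prev):
    """Assumed time-domain term: squared L2 norm of the parameter change between frames."""
    return sum((a - b) ** 2 for a, b in zip(params_now, params_prev))

# Hypothetical finger: wrist 5 units in front of the camera, two joints that
# rotate about z, located at x = 0 and x = 1, and a fingertip resting at x = 2.
WRIST_POSE = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 5.0],
              [0.0, 0.0, 0.0, 1.0]]
JOINTS = [((0.0, 0.0, 1.0), (0.0, 0.0, 0.0)),
          ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))]
REST_TIP = (2.0, 0.0, 0.0)
```

For example, forward_kinematics(WRIST_POSE, JOINTS, [math.pi / 2, math.pi / 2], REST_TIP) bends both joints by 90 degrees. Passing the result to project and then to reprojection_error gives the residual that a non-linear solver would try to reduce together with the temporal term.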

Description

Rotation-based gesture recognition method and device, electronic equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of image processing, in particular to a rotation-based gesture recognition method and device, electronic equipment and a storage medium.

Background

In the field of human-computer interaction, gesture interaction has become a research hotspot and a key technology because it is natural and intuitive, and accurate and fast hand pose estimation is one of the important technologies for gesture interaction. Hand pose estimation is widely applied. In the smart home field, household appliances can be controlled by making the corresponding gestures in front of a camera, for example to fast-forward, rewind, pause or play video on a display screen or television. In the AR/VR field, it enables interaction between the user's hands and the real or virtual world, greatly improving the user's immersive experience. In the sign language field, hand gesture recognition can make communication between hearing people and deaf or mute people more convenient and accurate.

Hand pose estimation is built on hand motion data, so it can be said that whether the hand motion data is acquired accurately determines whether the subsequent gesture interaction can succeed. In existing schemes, hand motion data can come from many different types of input devices, such as data gloves, acceleration sensors, touch screens, monocular cameras, multi-view cameras and depth cameras, and each type of input device has its own advantages and drawbacks in acquiring hand motion data.

Monocular and multi-view cameras capture images directly. Image information of the bare hand is analyzed from the images to obtain hand motion data, the type or semantics of the gesture is then identified, and the position of the hand in real three-dimensional space is calculated. In terms of convenience, this technology requires no extra equipment and matches human interaction habits. In actual gesture recognition, the raw data collected by monocular and multi-view cameras is 2D image data, and three-dimensional hand pose estimation is performed, by a deep neural network or the like, after the three-dimensional spatial positions of the hand key points are calculated from the 2D image data. However, this estimation relies entirely on a model trained on image samples, so for the recognition results of consecutive images of the same user, consistency of the hand shape between preceding and subsequent frames cannot be guaranteed.

Disclosure of Invention

The invention provides a rotation-based gesture recognition method and device, electronic equipment and a storage medium, which are used for solving the technical problem that the hand shapes in the gesture recognition results of the same user are inconsistent.

In a first aspect, an embodiment of the present invention provides a rotation-based gesture recognition method, including: acquiring continuous frame images captured by a camera, and detecting hand key points in the continuous frame images to acquire a first key point position corresponding to each hand key point in each frame image; calculating three-dimensional coordinates of each hand key point in a corresponding first coordinate system, wherein the first coordinate system is a three-dimensional coordinate system determined by the adjacent key points, far away from the fingertips, of the hand key points; inputting the three-dimensional coordinates into a preset forward kinematics model to obtain a pose transformation set of each frame image corresponding to the hand key points, wherein the forward kinematics model is a rotation-based kinematic description of a plurality of hand key points connected through rigid bodies, in the forward kinematics model the poses of the hand key points corresponding to the wrist are described in a camera coordinate system, the poses of the hand key points corresponding to the metacarpophalangeal joints are expressed relative to the hand key points corresponding to the wrist by the rotation method, using an exponential product formula and a synthesis rule of rigid body motion, and the pose transformation set is the set of poses corresponding to all the hand key points; calculating, based on the pose transformation set, a Jacobian matrix corresponding to each frame image and a predicted three-dimensional coordinate corresponding to each hand key point, and calculating a corresponding second key point position according to the projection of the predicted three-dimensional coordinate in the image; non-linear optimization is carried out on the po