
US-12620263-B2 - Gesture recognizing method, interactive method, gesture interactive system, electronic device, and storage medium

US 12620263 B2

Abstract

A gesture recognizing method, an interactive method, a gesture interactive system, an electronic device, and a non-transitory computer-readable storage medium are disclosed. The gesture recognizing method includes: acquiring a plurality of groups of images taken respectively at different photographing moments for a gesture action object, wherein each group of images includes at least one pair of corresponding depth map and grayscale map; and according to the plurality of groups of images, obtaining spatial information by using the depth map in each group of images, and obtaining posture information for the gesture action object by using the grayscale map in each group of the images, to recognize a dynamic gesture change of the gesture action object. The gesture recognizing method reduces the overall processing time, can quickly obtain gesture recognition results, reduces system resource occupation, and ensures real-time gesture interaction.

Inventors

  • Meili Wang
  • Yaoyu Lv
  • Lili Chen
  • Xue Dong
  • Hao Zhang
  • Jiabin Wang
  • Yangbing Li
  • Mingdong Wang
  • Lei Wang

Assignees

  • Beijing BOE Technology Development Co., Ltd.
  • BOE Technology Group Co., Ltd.

Dates

Publication Date
2026-05-05
Application Date
2022-04-19

Claims (20)

  1. A gesture recognizing method, comprising: acquiring a plurality of groups of images taken respectively at different photographing moments for a gesture action object, wherein each group of images comprises at least one pair of corresponding depth map and grayscale map; and according to the plurality of groups of images, obtaining spatial information by using the depth map in each group of images, and obtaining posture information for the gesture action object by using the grayscale map in each group of the images, to recognize a dynamic gesture change of the gesture action object, wherein acquiring the plurality of groups of images taken respectively at different photographing moments for the gesture action object, comprises: obtaining a plurality of groups of images respectively corresponding to the different photographing moments, by using at least one photographing apparatus to continuously photograph the gesture action object, wherein each photographing apparatus is configured to synchronously output a pair of corresponding depth map and grayscale map at one photographing moment, wherein obtaining the plurality of groups of images respectively corresponding to the different photographing moments, by using the at least one photographing apparatus to continuously photograph the gesture action object, comprises: by using each photographing apparatus to continuously photograph the gesture action object, obtaining a plurality of pairs of output depth maps and grayscale maps respectively corresponding to the different photographing moments output by the photographing apparatus, wherein each photographing apparatus comprises a first acquiring unit, the first acquiring unit is configured to acquire a grayscale map in every first frame, and acquire a depth map in every N first frames, wherein the depth map is generated based on N grayscale maps acquired in every N consecutive first frames, the N grayscale maps respectively correspond to N different phases, the depth map and one grayscale map among the N grayscale maps are synchronously output from the photographing apparatus, where N is a positive integer greater than 1; by using each photographing apparatus to continuously photograph the gesture action object, obtaining the plurality of pairs of output depth maps and grayscale maps respectively corresponding to the different photographing moments output by the photographing apparatus, comprises: outputting one pair of output depth map and grayscale map corresponding to each other in every first frame by using the photographing apparatus, wherein the output depth map is obtained by smooth trajectory fitting and prediction according to the N grayscale maps and the depth map.
  2. The gesture recognizing method according to claim 1, wherein obtaining the spatial information by using the depth map in each group of images, comprises: determining a gesture region in the depth map according to the depth map, wherein the spatial information comprises the gesture region in the depth map; obtaining the posture information for the gesture action object by using the grayscale map in each group of the images, comprises: determining the posture information for the gesture action object corresponding to each group of images according to the gesture region in the depth map and the grayscale map; and recognizing the dynamic gesture change of the gesture action object, comprises: determining the dynamic gesture change of the gesture action object according to posture information for the gesture action object respectively corresponding to the plurality of groups of images.
  3. The gesture recognizing method according to claim 2, wherein determining the gesture region in the depth map according to the depth map, comprises: traversing the depth map, and counting depth data in the depth map, to build a depth histogram; selecting an adaptive depth threshold corresponding to the depth map, and determining the gesture region in the depth map according to the adaptive depth threshold and the depth histogram.
  4. The gesture recognizing method according to claim 2, wherein the posture information for the gesture action object corresponding to each group of images comprises finger state information and position information, determining the posture information for the gesture action object corresponding to each group of images, according to the grayscale map and the gesture region in the depth map, comprises: applying the gesture region in the depth map to the grayscale map to obtain a gesture analysis region in the grayscale map; performing binary processing on the gesture analysis region to obtain a gesture connected domain; performing convex hull detection on the gesture connected domain to obtain the finger state information, wherein the finger state information comprises whether a finger is stretched out or not, and a count of fingers stretched out; and determining the position information based on the depth map, wherein the position information comprises a coordinate position of the gesture action object in a gesture interactive space.
  5. The gesture recognizing method according to claim 4, wherein determining the dynamic gesture change of the gesture action object according to the posture information for the gesture action object respectively corresponding to the plurality of groups of images, comprises: determining a finger outstretched state change and a position change of the gesture action object during a recognition period composed of the different photographing moments, according to finger state information and position information corresponding to the plurality of groups of images; and determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change corresponding to the plurality of groups of images.
  6. The gesture recognizing method according to claim 5, wherein the coordinate position comprises a depth coordinate, the dynamic gesture change of the gesture action object comprises a gesture action, determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change, comprises: determining that the gesture action is a click gesture, in response to the finger outstretched state change indicating that at least one finger of the gesture action object is in an outstretched state during at least part of time period of the recognition period, and the position change indicating that a depth coordinate of a target recognition point of the gesture action object decreases first and then increases during the at least part of time period.
  7. The gesture recognizing method according to claim 5, wherein the coordinate position comprises a depth coordinate, and the dynamic gesture change of the gesture action object comprises a gesture action, determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change, comprises: determining that the gesture action is a long-press gesture, in response to the finger outstretched state change indicating that at least one finger of the gesture action object is in an outstretched state during at least part of time period of the recognition period, and the position change indicating that a depth coordinate of a target recognition point of the gesture action object decreases first and then is maintained during the at least part of time period, and a time length for the maintenance exceeding a first threshold.
  8. The gesture recognizing method according to claim 5, wherein the dynamic gesture change of the gesture action object comprises a gesture action, determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change, comprises: determining that the gesture action is a slide gesture, in response to the finger outstretched state change indicating that at least one finger of the gesture action object is in an outstretched state during at least part of time period of the recognition period, and the position change indicating that a distance that a target recognition point of the gesture action object slides along a preset direction during the at least part of time period exceeds a second threshold, wherein the distance is calculated based on position information of the target recognition point of the gesture action object in the plurality of groups of images.
  9. The gesture recognizing method according to claim 5, wherein the dynamic gesture change of the gesture action object comprises a gesture action, determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change, comprises: determining that the gesture action is a grab gesture, in response to the finger outstretched state change indicating that the gesture action object transitions from a state where at least one finger is stretched out to a state where no finger is stretched out during the recognition period; and determining that the gesture action is a release gesture, in response to the finger outstretched state change indicating that the gesture action object transitions from a state where no finger is stretched out to a state where at least one finger is stretched out during the recognition period.
  10. The gesture recognizing method according to claim 1, wherein a plurality of photographing apparatuses are provided, each group of images comprises a plurality of pairs of corresponding depth maps and grayscale maps, the plurality of pairs of corresponding depth maps and grayscale maps are obtained by synchronously photographing the gesture action object by the plurality of photographing apparatuses at a same photographing moment, and the plurality of pairs of corresponding depth maps and grayscale maps have different photographing angles.
  11. A gesture recognizing method, comprising: acquiring a plurality of groups of images taken respectively at different photographing moments for a gesture action object, wherein each group of images comprises at least one pair of corresponding depth map and grayscale map; and according to the plurality of groups of images, obtaining spatial information by using the depth map in each group of images, and obtaining posture information for the gesture action object by using the grayscale map in each group of the images, to recognize a dynamic gesture change of the gesture action object, wherein acquiring the plurality of groups of images taken respectively at different photographing moments for the gesture action object, comprises: obtaining a plurality of groups of images respectively corresponding to the different photographing moments, by using at least one photographing apparatus to continuously photograph the gesture action object, wherein each photographing apparatus is configured to synchronously output a pair of corresponding depth map and grayscale map at one photographing moment, wherein obtaining the plurality of groups of images respectively corresponding to the different photographing moments, by using the at least one photographing apparatus to continuously photograph the gesture action object, comprises: by using each photographing apparatus to continuously photograph the gesture action object, obtaining a plurality of pairs of output depth maps and grayscale maps respectively corresponding to the different photographing moments output by the photographing apparatus, wherein each photographing apparatus comprises a first acquiring unit, the first acquiring unit is configured to acquire a grayscale map in every first frame, and acquire a depth map in every N first frames, wherein the depth map is generated based on N grayscale maps acquired in every N consecutive first frames, the N grayscale maps respectively correspond to N different phases, the depth map and one grayscale map among the N grayscale maps are synchronously output from the photographing apparatus, where N is a positive integer greater than 1; by using each photographing apparatus to continuously photograph the gesture action object, obtaining the plurality of pairs of depth maps and grayscale maps respectively corresponding to the different photographing moments output by the photographing apparatus, comprises: outputting a pair of corresponding depth map and grayscale map in at most every N−1 first frames by using the photographing apparatus, wherein for one pair of corresponding depth map and grayscale map output in a same frame, the output depth map is obtained based on grayscale maps of N−1 first frames adjacent to the output grayscale map, and the output grayscale map and the grayscale maps of the N−1 first frames correspond to the N different phases.
  12. The gesture recognizing method according to claim 11, wherein obtaining the spatial information by using the depth map in each group of images, comprises: determining a gesture region in the depth map according to the depth map, wherein the spatial information comprises the gesture region in the depth map; obtaining the posture information for the gesture action object by using the grayscale map in each group of the images, comprises: determining the posture information for the gesture action object corresponding to each group of images according to the gesture region in the depth map and the grayscale map; and recognizing the dynamic gesture change of the gesture action object, comprises: determining the dynamic gesture change of the gesture action object according to posture information for the gesture action object respectively corresponding to the plurality of groups of images.
  13. The gesture recognizing method according to claim 12, wherein determining the gesture region in the depth map according to the depth map, comprises: traversing the depth map, and counting depth data in the depth map, to build a depth histogram; selecting an adaptive depth threshold corresponding to the depth map, and determining the gesture region in the depth map according to the adaptive depth threshold and the depth histogram.
  14. The gesture recognizing method according to claim 12, wherein the posture information for the gesture action object corresponding to each group of images comprises finger state information and position information, determining the posture information for the gesture action object corresponding to each group of images, according to the grayscale map and the gesture region in the depth map, comprises: applying the gesture region in the depth map to the grayscale map to obtain a gesture analysis region in the grayscale map; performing binary processing on the gesture analysis region to obtain a gesture connected domain; performing convex hull detection on the gesture connected domain to obtain the finger state information, wherein the finger state information comprises whether a finger is stretched out or not, and a count of fingers stretched out; and determining the position information based on the depth map, wherein the position information comprises a coordinate position of the gesture action object in a gesture interactive space.
  15. The gesture recognizing method according to claim 14, wherein determining the dynamic gesture change of the gesture action object according to the posture information for the gesture action object respectively corresponding to the plurality of groups of images, comprises: determining a finger outstretched state change and a position change of the gesture action object during a recognition period composed of the different photographing moments, according to finger state information and position information corresponding to the plurality of groups of images; and determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change corresponding to the plurality of groups of images.
  16. A gesture recognizing method, comprising: acquiring a plurality of groups of images taken respectively at different photographing moments for a gesture action object, wherein each group of images comprises at least one pair of corresponding depth map and grayscale map; and according to the plurality of groups of images, obtaining spatial information by using the depth map in each group of images, and obtaining posture information for the gesture action object by using the grayscale map in each group of the images, to recognize a dynamic gesture change of the gesture action object, wherein acquiring the plurality of groups of images taken respectively at different photographing moments for the gesture action object, comprises: obtaining a plurality of groups of images respectively corresponding to the different photographing moments, by using at least one photographing apparatus to continuously photograph the gesture action object, wherein each photographing apparatus is configured to synchronously output a pair of corresponding depth map and grayscale map at one photographing moment, wherein obtaining the plurality of groups of images respectively corresponding to the different photographing moments, by using the at least one photographing apparatus to continuously photograph the gesture action object, comprises: by using each photographing apparatus to continuously photograph the gesture action object, obtaining a plurality of pairs of output depth maps and grayscale maps respectively corresponding to the different photographing moments output by the photographing apparatus, wherein each photographing apparatus comprises a first acquiring unit and a second acquiring unit, the second acquiring unit is configured to output a grayscale map in every second frame, and the first acquiring unit is configured to output a depth map in every M second frames, where M is a positive integer greater than 1, by using each photographing apparatus to continuously photograph the gesture action object, obtaining the plurality of pairs of depth maps and grayscale maps respectively corresponding to the different photographing moments output by the photographing apparatus, comprises: outputting a pair of corresponding depth map and grayscale map in at most every M−1 second frames by using the photographing apparatus, wherein the output depth map comprises a reference depth map, or a depth map obtained by smooth trajectory fitting and prediction based on the reference depth map and at least one grayscale map corresponding to the reference depth map, wherein the reference depth map comprises a depth map output by the first acquiring unit at a current second frame or before the current second frame, the current second frame is a second frame outputting the pair of corresponding depth map and grayscale map, and the at least one grayscale map comprises a grayscale map output by the second acquiring unit between the current second frame and a second frame corresponding to the reference depth map.
  17. The gesture recognizing method according to claim 16, wherein obtaining the spatial information by using the depth map in each group of images, comprises: determining a gesture region in the depth map according to the depth map, wherein the spatial information comprises the gesture region in the depth map; obtaining the posture information for the gesture action object by using the grayscale map in each group of the images, comprises: determining the posture information for the gesture action object corresponding to each group of images according to the gesture region in the depth map and the grayscale map; and recognizing the dynamic gesture change of the gesture action object, comprises: determining the dynamic gesture change of the gesture action object according to posture information for the gesture action object respectively corresponding to the plurality of groups of images.
  18. The gesture recognizing method according to claim 17, wherein determining the gesture region in the depth map according to the depth map, comprises: traversing the depth map, and counting depth data in the depth map, to build a depth histogram; selecting an adaptive depth threshold corresponding to the depth map, and determining the gesture region in the depth map according to the adaptive depth threshold and the depth histogram.
  19. The gesture recognizing method according to claim 17, wherein the posture information for the gesture action object corresponding to each group of images comprises finger state information and position information, determining the posture information for the gesture action object corresponding to each group of images, according to the grayscale map and the gesture region in the depth map, comprises: applying the gesture region in the depth map to the grayscale map to obtain a gesture analysis region in the grayscale map; performing binary processing on the gesture analysis region to obtain a gesture connected domain; performing convex hull detection on the gesture connected domain to obtain the finger state information, wherein the finger state information comprises whether a finger is stretched out or not, and a count of fingers stretched out; and determining the position information based on the depth map, wherein the position information comprises a coordinate position of the gesture action object in a gesture interactive space.
  20. The gesture recognizing method according to claim 19, wherein determining the dynamic gesture change of the gesture action object according to the posture information for the gesture action object respectively corresponding to the plurality of groups of images, comprises: determining a finger outstretched state change and a position change of the gesture action object during a recognition period composed of the different photographing moments, according to finger state information and position information corresponding to the plurality of groups of images; and determining the dynamic gesture change of the gesture action object, according to the finger outstretched state change and the position change corresponding to the plurality of groups of images.
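
Claims 1 and 11 describe a first acquiring unit that captures a grayscale map in every first frame and generates a depth map from N grayscale maps taken at N different phases. That scheme is consistent with indirect time-of-flight (iToF) imaging; the sketch below assumes the common four-phase case (N = 4, phases of 0, 90, 180, and 270 degrees) and an arbitrary modulation frequency, none of which is fixed by the claims.

    import numpy as np

    # Assumed constants: the claims do not specify N, the phase values, or a
    # modulation frequency, so these are illustrative only.
    C = 299_792_458.0      # speed of light, m/s
    F_MOD = 20e6           # hypothetical ToF modulation frequency, Hz

    def depth_from_phase_frames(q0, q90, q180, q270):
        """Generate a depth map from N = 4 grayscale maps captured at four
        modulation phases, using the textbook 4-phase iToF reconstruction."""
        i = q0.astype(np.float64) - q180.astype(np.float64)
        q = q90.astype(np.float64) - q270.astype(np.float64)
        phase = np.mod(np.arctan2(q, i), 2.0 * np.pi)    # wrapped phase per pixel
        depth = C * phase / (4.0 * np.pi * F_MOD)        # metres, within the ambiguity range
        amplitude = 0.5 * np.sqrt(i * i + q * q)         # usable as a grayscale map
        return depth, amplitude

Under this reading, any one of the four phase frames (or the amplitude image) can be the grayscale map output synchronously with the depth map, and the per-frame output of claim 1 would fill the frames between full reconstructions by the smooth trajectory fitting and prediction the claim names.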

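Claims 3, 13, and 18 build a depth histogram and select an adaptive depth threshold to isolate the gesture region in the depth map. A minimal sketch, assuming the hand is the object nearest to the camera and keeping everything within a fixed margin behind the nearest occupied depth band; the bin width and margin are illustrative parameters, not values from the patent.

    import numpy as np

    def gesture_region_from_depth(depth_mm, bin_width_mm=10.0, margin_mm=80.0):
        """Segment a candidate gesture region from a depth map via a depth
        histogram and an adaptive threshold (cf. claims 3/13/18)."""
        valid = depth_mm > 0                                  # zero means no measurement
        if not np.any(valid):
            return np.zeros(depth_mm.shape, dtype=bool)
        bins = np.arange(0.0, depth_mm[valid].max() + bin_width_mm, bin_width_mm)
        hist, edges = np.histogram(depth_mm[valid], bins=bins)
        # Nearest depth band holding a non-negligible share of the valid pixels.
        occupied = np.nonzero(hist > 0.001 * valid.sum())[0]
        if occupied.size == 0:
            return np.zeros(depth_mm.shape, dtype=bool)
        adaptive_threshold = edges[occupied[0]] + margin_mm   # adaptive depth threshold
        return valid & (depth_mm <= adaptive_threshold)       # boolean gesture region mask
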
Description

CROSS-REFERENCE TO RELATED PATENT APPLICATIONS

This application is a U.S. National Stage Application under 35 U.S.C. § 371 of International Patent Application No. PCT/CN2022/087576, filed Apr. 19, 2022, which is incorporated by reference in its entirety.

TECHNICAL FIELD

Embodiments of the present disclosure relate to a gesture recognizing method, an interactive method, a gesture interactive system, an electronic device, and a non-transitory computer-readable storage medium.

BACKGROUND

With the continuous development of naked-eye 3-dimensional (3D) light field display, objects displayed in 3D have to exit and enter the screen in order to achieve a stereoscopic 3D display effect. For viewers to interact immersively with 3D content, an interactive system or method is needed that can obtain depth information and that offers high precision, low delay, and a large field of view, so as to achieve interaction between a user and the displayed 3D content. At present, implementing interactive functions based on gesture recognition technology is a major research hotspot, and it has been applied in many fields such as naked-eye 3D displays, VR/AR/MR, vehicles, game entertainment, smart wearables, and industrial design. The core of implementing interactive functions based on gesture recognition technology is to acquire the user's gesture information through a sensor device such as a camera, recognize gestures through relevant recognition and classification algorithms, and assign different semantic information to different gestures, so as to implement different interactive functions.

SUMMARY

At least one embodiment of the present disclosure provides a gesture recognizing method, comprising: acquiring a plurality of groups of images taken respectively at different photographing moments for a gesture action object, wherein each group of images comprises at least one pair of corresponding depth map and grayscale map; and according to the plurality of groups of images, obtaining spatial information by using the depth map in each group of images, and obtaining posture information for the gesture action object by using the grayscale map in each group of the images, to recognize a dynamic gesture change of the gesture action object.

For example, in the gesture recognizing method provided by at least one embodiment of the present disclosure, obtaining the spatial information by using the depth map in each group of images, comprises: determining a gesture region in the depth map according to the depth map, wherein the spatial information comprises the gesture region in the depth map; obtaining the posture information for the gesture action object by using the grayscale map in each group of the images, comprises: determining the posture information for the gesture action object corresponding to each group of images according to the gesture region in the depth map and the grayscale map; and recognizing the dynamic gesture change of the gesture action object, comprises: determining the dynamic gesture change of the gesture action object according to posture information for the gesture action object respectively corresponding to the plurality of groups of images.
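Claims 4, 14, and 19 obtain the posture information by binarizing the gesture analysis region, extracting the gesture connected domain, and applying convex hull detection to decide whether fingers are stretched out and how many. The sketch below uses OpenCV's convexity defects as one concrete way to realize "convex hull detection"; the Otsu binarization and the defect-depth threshold are assumptions rather than details from the patent, and a single stretched finger (which produces no deep defect) is not covered by this simple rule.

    import cv2
    import numpy as np

    def finger_state(gray, region_mask, defect_depth_px=20.0):
        """Return (any_finger_stretched_out, finger_count) for one group of
        images (cf. claims 4/14/19)."""
        # Apply the gesture region from the depth map to the grayscale map.
        analysis = np.where(region_mask, gray, 0).astype(np.uint8)
        # Binary processing of the gesture analysis region.
        _, binary = cv2.threshold(analysis, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return False, 0
        hand = max(contours, key=cv2.contourArea)        # gesture connected domain
        hull = cv2.convexHull(hand, returnPoints=False)  # convex hull as contour indices
        if hull is None or len(hull) < 4:
            return False, 0
        defects = cv2.convexityDefects(hand, hull)
        if defects is None:
            return False, 0
        # Deep convexity defects approximate the gaps between stretched-out fingers.
        gaps = int(np.sum(defects[:, 0, 3] / 256.0 > defect_depth_px))
        count = gaps + 1 if gaps > 0 else 0
        return count > 0, count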
For example, in the gesture recognizing method provided by at least one embodiment of the present disclosure, determining the gesture region in the depth map according to the depth map, comprises: traversing the depth map, and counting depth data in the depth map, to build a depth histogram; selecting an adaptive depth threshold corresponding to the depth map, and determining the gesture region in the depth map according to the adaptive depth threshold and the depth histogram.

For example, in the gesture recognizing method provided by at least one embodiment of the present disclosure, the posture information for the gesture action object corresponding to each group of images comprises finger state information and position information, determining the posture information for the gesture action object corresponding to each group of images, according to the grayscale map and the gesture region in the depth map, comprises: applying the gesture region in the depth map to the grayscale map to obtain a gesture analysis region in the grayscale map; performing binary processing on the gesture analysis region to obtain a gesture connected domain; performing convex hull detection on the gesture connected domain to obtain the finger state information, wherein the finger state information comprises whether a finger is stretched out or not, and a count of fingers stretched out; and determining the position information based on the depth map, wherein the position information comprises a coordinate position of the gesture action object in a gesture interactive space.

For example, in the gesture recognizing method provided by at least one embodiment of the present disclosure, determining the dynamic gesture change of the gesture action object acco
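Claims 5 through 9 turn the per-group finger state information and position information into a dynamic gesture: click and long-press from a depth coordinate that dips and then rises or holds, slide from displacement along a preset direction, and grab and release from transitions of the finger outstretched state. A minimal sketch; the threshold values, the preset slide direction, and the order of the checks are assumptions rather than values from the patent.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Posture:
        fingers_out: int       # count of stretched-out fingers
        x: float               # position of the target recognition point
        y: float
        z: float               # depth coordinate
        t: float               # photographing moment, seconds

    # Assumed thresholds; the claims only name "a first threshold" and
    # "a second threshold" without giving values.
    PRESS_DEPTH_MM = 15.0      # how far z must dip to count as a press
    HOLD_SECONDS = 0.8         # stands in for claim 7's "first threshold"
    SLIDE_MM = 60.0            # stands in for claim 8's "second threshold"
    SLIDE_DIR = (1.0, 0.0)     # preset slide direction in the x-y plane

    def classify(seq: List[Posture]) -> Optional[str]:
        """Map one recognition period of postures to a dynamic gesture.
        A rough transcription of the rules in claims 6-9."""
        if len(seq) < 2:
            return None
        # Claim 9: transitions of the finger outstretched state.
        if seq[0].fingers_out >= 1 and seq[-1].fingers_out == 0:
            return "grab"
        if seq[0].fingers_out == 0 and seq[-1].fingers_out >= 1:
            return "release"
        if not any(p.fingers_out >= 1 for p in seq):
            return None
        z = [p.z for p in seq]
        dip = min(range(len(z)), key=z.__getitem__)       # closest approach to the screen
        pressed = z[0] - z[dip] > PRESS_DEPTH_MM
        # Claim 6: depth coordinate decreases first and then increases -> click.
        if pressed and z[-1] - z[dip] > PRESS_DEPTH_MM:
            return "click"
        # Claim 7: depth decreases, then is maintained longer than the first threshold.
        if pressed and abs(z[-1] - z[dip]) <= PRESS_DEPTH_MM and seq[-1].t - seq[dip].t > HOLD_SECONDS:
            return "long_press"
        # Claim 8: sliding distance along a preset direction exceeds the second threshold.
        travel = (seq[-1].x - seq[0].x) * SLIDE_DIR[0] + (seq[-1].y - seq[0].y) * SLIDE_DIR[1]
        if abs(travel) > SLIDE_MM:
            return "slide"
        return None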