US-20260126850-A1 - METHODS AND SYSTEMS FOR HAND MICRO-GESTURE RECOGNITION FOR A VISUAL SEE THROUGH DEVICE

Abstract

Methods and systems for hand micro-gesture recognition for a visual see through (VST) device are provided. The system includes a hand velocity estimation module configured to determine a hand velocity of a movement of a hand of a user using hand images being captured by the VST device, a jitter module configured to determine an average jitter associated with the movement of the hand, an upscaling module configured to determine, based on the hand velocity and the average jitter, an upscaling factor for generating high-resolution hand images corresponding to the captured one or more hand images, a key-point module configured to measure, using the generated high-resolution hand images, a movement of a plurality of hand key-points associated with the hand, and a gesture recognition module configured to recognize, based on a comparison of the measured movement of the plurality of hand key-points and the determined average jitter, the movement of the plurality of hand key-points as a hand micro-gesture.

Inventors

  • Vishakha SETTISARA RATNAKAR
  • Green Rosh KUMBALAPARAMBIL SREEDHARAN
  • Pawan Prasad BINDIGAN HARIPRASANNA
  • Meghana Shankar
  • Sungsoo Choi
  • Hyuntaek WOO

Assignees

  • SAMSUNG ELECTRONICS CO., LTD.

Dates

  • Publication Date: 2026-05-07
  • Application Date: 2025-11-03
  • Priority Date: 2024-11-05

Claims (20)

  1. A system for hand micro-gesture recognition for a visual see through (VST) device, the system comprising: a hand velocity estimation module configured to determine a hand velocity of a movement of a hand of a user using one or more hand images being captured by the VST device; a jitter module configured to determine an average jitter associated with the movement of the hand in the one or more hand images; an upscaling module configured to determine, based on the hand velocity and the average jitter, an upscaling factor for generating high-resolution hand images corresponding to the captured one or more hand images; a key-point module configured to measure, using the generated high-resolution hand images, a movement of a plurality of hand key-points associated with the hand; and a gesture recognition module configured to recognize, based on a comparison of the measured movement of the plurality of hand key-points and the determined average jitter, the movement of the plurality of hand key-points as a hand micro-gesture.
  2. The system of claim 1, wherein the gesture recognition module is further configured to recognize the movement of the plurality of hand key-points as the hand micro-gesture when the measured movement of the plurality of hand key-points is more than the determined average jitter.
  3. The system of claim 1, wherein the gesture recognition module is further configured to identify, based upon a comparison with a set of pre-defined recognized hand micro-gestures, the measured movement as a recognized hand micro-gesture.
  4. The system of claim 1, wherein the system comprises a super resolution module configured to increase the resolution, based on the upscaling factor, of only a subset of images from the captured hand images.
  5. The system of claim 4, wherein the system comprises a cropping module configured to crop the captured hand images to generate cropped hand images.
  6. The system of claim 1, wherein the hand velocity estimation module is further configured to: determine, using a machine learning (ML) model capable of predicting locations of the plurality of hand key-points, a displacement for each of the plurality of hand key-points in the captured hand images, correlate, for each of the plurality of hand key-points, the displacement of the hand key-point and a frame-rate of the captured hand images for determining a key-point velocity, and determine an average of the key-point velocities of the plurality of hand key-points as the hand velocity.
  7. The system of claim 1, wherein the jitter module is further configured to: perform a coarse hand key-point estimation, using an ML model, for the plurality of hand key-points for predicting a trajectory of the hand key-points, estimate, for each of the plurality of hand key-points, a velocity and an acceleration by correlating a displacement of the coarse hand key-points in a plurality of frames of the captured hand images and a frame-rate of the captured hand images, estimate a trajectory, for each of the plurality of hand key-points, based on the estimated velocity, the estimated acceleration and the coarse hand key-point estimation, and calculate the average jitter based on the estimated trajectory and the predicted trajectory.
  8. The system of claim 7, wherein the jitter module is further configured to detect a coarse hand key-point location for each of the plurality of hand key-points using an ML model.
  9. The system of claim 7, wherein the jitter module is further configured to correlate: the displacement of the coarse hand key-points between a current frame and a previous frame of the plurality of frames; and a time elapsed between the capturing of the current frame and the previous frame by the VST device.
  10. The system of claim 7, wherein the jitter module is further configured to: calculate, for each of the plurality of hand key-points, a trajectory difference value by comparing the estimated trajectory for each key-point and the predicted trajectory for each key-point, and calculate an average of the trajectory difference values, for the plurality of hand key-points, as the average jitter.
  11. The system of claim 1, wherein the upscaling module is further configured to take a minima from a set consisting of: a first value obtained by correlating the average jitter and the hand velocity; and a second value obtained by correlating a predefined input size of the captured hand images to an ML model and a size of cropped hand images of the captured hand images.
  12. The system of claim 1, wherein the upscaling module is further configured to generate the high-resolution hand images based on a comparison between a current upscaling factor for a current frame and a previous upscaling factor of a previous frame of the captured hand images.
  13. The system of claim 1, wherein the upscaling module is further configured to: compare a current upscaling factor of a current frame of the captured hand images with a previous upscaling factor of a previous frame of the captured hand images, perform, based upon the comparison, image subtraction between the current frame and the previous frame to obtain a difference portion image, apply the previous upscaling factor to the difference portion image for increasing a resolution of the difference portion image using super resolution, and blend the increased resolution difference portion image with the previous frame to generate the high-resolution hand image.
  14. A method for hand micro-gesture recognition for a visual see through (VST) device, the method comprising: determining a hand velocity of a movement of a hand of a user using one or more hand images being captured by the VST device; determining an average jitter associated with the movement of the hand in the one or more hand images; determining, based on the hand velocity and the average jitter, an upscaling factor to generate high-resolution hand images corresponding to the captured one or more hand images; generating the high-resolution hand images by increasing, as per the upscaling factor, a resolution of the corresponding captured one or more hand images; measuring, using the generated high-resolution hand images, a movement of a plurality of hand key-points associated with the hand; and recognizing, based on a comparison of the measured movement of the plurality of hand key-points and the determined average jitter, the movement of the plurality of hand key-points as a hand micro-gesture.
  15. The method of claim 14, further comprising: recognizing the movement of the plurality of hand key-points as the hand micro-gesture when the measured movement of the plurality of hand key-points is more than the determined average jitter.
  16. The method of claim 14, further comprising: identifying, based upon a comparison with a set of pre-defined recognized hand micro-gestures, the measured movement as a recognized hand micro-gesture.
  17. The method of claim 14, further comprising: increasing the resolution, based on the upscaling factor, of only a subset of images from the captured hand images.
  18. The method of claim 17, further comprising: cropping the captured hand images to generate cropped hand images.
  19. One or more non-transitory computer readable storage media storing one or more computer programs including computer-executable instructions that, when executed individually or collectively by a processor of a visual see through (VST) device for hand micro-gesture recognition, cause the VST device to perform operations, the operations comprising: determining a hand velocity of a movement of a hand of a user using one or more hand images being captured by the VST device; determining an average jitter associated with the movement of the hand in the one or more hand images; determining, based on the hand velocity and the average jitter, an upscaling factor to generate high-resolution hand images corresponding to the captured one or more hand images; generating the high-resolution hand images by increasing, as per the upscaling factor, a resolution of the corresponding captured one or more hand images; measuring, using the generated high-resolution hand images, a movement of a plurality of hand key-points associated with the hand; and recognizing, based on a comparison of the measured movement of the plurality of hand key-points and the determined average jitter, the movement of the plurality of hand key-points as a hand micro-gesture.
  20. The one or more non-transitory computer-readable storage media of claim 19, the operations further comprising: recognizing the movement of the plurality of hand key-points as the hand micro-gesture when the measured movement of the plurality of hand key-points is more than the determined average jitter.
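
The computations recited in claims 2, 6, 7, 10 and 11 can be sketched in a few lines of Python. The sketch is illustrative only and is not taken from the disclosure: the function names, the use of 2-D pixel coordinates, the four-frame window, and the constant-acceleration extrapolation used as the "estimated trajectory" are all assumptions. The claims do not state how the average jitter and the hand velocity are "correlated" into the first candidate of claim 11, so that value is passed in as an opaque input (`jitter_velocity_term`, a hypothetical name), and the second candidate is assumed to be the ratio of the model input size to the crop size.

```python
from math import hypot
from statistics import mean

Point = tuple[float, float]
Frame = list[Point]  # one (x, y) per hand key-point, in pixels (assumed layout)


def displacements(prev: Frame, curr: Frame) -> list[float]:
    """Euclidean displacement of each key-point between two frames."""
    return [hypot(c[0] - p[0], c[1] - p[1]) for p, c in zip(prev, curr)]


def hand_velocity(prev: Frame, curr: Frame, fps: float) -> float:
    """Claim 6: displacement times frame rate per key-point, then the average."""
    return mean(d * fps for d in displacements(prev, curr))


def extrapolate(f0: Frame, f1: Frame, f2: Frame, fps: float) -> Frame:
    """Assumed kinematic model: predict the next frame from finite differences."""
    dt = 1.0 / fps
    out: Frame = []
    for p0, p1, p2 in zip(f0, f1, f2):
        point = []
        for a0, a1, a2 in zip(p0, p1, p2):
            v = (a2 - a1) * fps                        # velocity from the last two frames
            acc = ((a2 - a1) - (a1 - a0)) * fps * fps  # acceleration from three frames
            point.append(a2 + v * dt + 0.5 * acc * dt * dt)
        out.append((point[0], point[1]))
    return out


def average_jitter(frames: list[Frame], fps: float) -> float:
    """Claims 7 and 10: mean gap between the kinematic estimate and the
    key-points reported for the latest frame. Needs at least four frames."""
    f0, f1, f2, f3 = frames[-4:]
    return mean(displacements(extrapolate(f0, f1, f2, fps), f3))


def upscaling_factor(jitter_velocity_term: float, model_input: int, crop: int) -> float:
    """Claim 11: minimum of two candidates. The first is supplied by the caller;
    the ratio form of the second is an assumption."""
    return min(jitter_velocity_term, model_input / crop)


def is_micro_gesture(movement: float, avg_jitter: float) -> bool:
    """Claim 2: accept the movement only when it exceeds the average jitter."""
    return movement > avg_jitter
```

Under these assumptions, a 64-pixel hand crop feeding a model with a 256-pixel input caps the upscaling factor at 4, and a key-point movement is accepted as a micro-gesture only if it exceeds the jitter computed over the four most recent frames.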

Description

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a continuation application, claiming priority under 35 U.S.C. § 365(c), of an International application No. PCT/KR2025/013012, filed on Aug. 26, 2025, which is based on and claims the benefit of an Indian Patent Application number 202441084666, filed on Nov. 5, 2024, in the Indian Patent Office, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND

1. Field

The disclosure relates to the field of Visual See Through (VST) devices. More particularly, the disclosure relates to a method and a system for micro-gesture recognition in VST devices.

2. Description of Related Art

A visual see through (VST) device is an electronic display device that allows the user to see what is shown on the screen while still being able to see through the screen. Examples of VST devices include head-up displays, augmented reality systems, and the like. The VST device may be a head mounted display (HMD) device. The VST device may be mounted on a user's forehead covering the eyes of the user. The VST device includes a display/digital screen between the real world and the eyes of the user. The screen is a see-through screen and may be placed very close to the eyes of the user as shown in FIG. 1, according to the related art.

FIG. 1 illustrates a scenario depicting a real-world scene being captured using a visual see through (VST) device according to the related art.

Referring to FIG. 1, a scenario 100 depicting a real-world scene 100S being captured using a visual see through (VST) device 150 is illustrated. The real-world scene 100S may be captured in the form of an image, a series of images, or the like, and rendered on a screen of the device 150. The VST device 150 gives viewers a more immersive viewing experience via a pass-through mode of the VST device 150. In the pass-through mode, the user is able to see the real world in real time while wearing the VST device 150.

For a delightful user experience, the pass-through mode of the VST device 150 should mimic a pair of human eyes as closely as possible. To realize the pass-through mode, the VST device 150 has a transparent display and includes a pair of cameras, one corresponding to each eye of the user. The two cameras capture a scene of the real world and project the scene on the transparent display of the VST device 150 in real time. The pass-through mode of the VST device 150 may be enabled in various scenarios, such as a mixed reality scenario. In the mixed reality scenario, the attention of the user is more focused on the virtual content. The pass-through mode may also be enabled during an augmented reality (AR) scenario, wherein the user has his/her full attention on the AR content.

To interact with the VST device 150, the user may need to input certain commands into the VST device 150 and may use hand gestures to do so. Hand gestures are the primary mode of interaction while using HMDs. Hand gestures with minimal hand movements are called micro-gestures. Examples of micro-gestures include pinching/closing fingers, rotating fingers clockwise, snapping fingers, opening fingers, rotating fingers anti-clockwise, and the like.

FIG. 2 illustrates micro-gestures according to the related art.

Referring to FIG. 2, the HMDs should be able to detect and recognize these micro-gestures with accuracy. However, the range of motion for the micro-gestures is very small. Typically, jitter in the hand or in the tracking device/system hinders accurate detection of the micro-gesture. There have been attempts to overcome the problem of jitter while detecting micro-gestures by comparing a movement of the hand in the images of the hand to a predefined movement within a fixed period of time.
Another current method includes cropping the images of the hand and applying fixed tolerances to segregate jitter from actual micro-gestures. Such methods use a fixed time frame and a fixed movement of the hand for comparison and are not able to detect the micro-gestures accurately.

The above information is presented as background information only to assist with an understanding of the disclosure. No determination has been made, and no assertion is made, as to whether any of the above might be applicable as prior art with regard to the disclosure.

SUMMARY

Aspects of the disclosure are to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the disclosure is to provide a method and a system for micro-gesture recognition in VST devices.

Additional aspects will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the presented embodiments.

In accordance with an aspect of the disclosure, a method for hand micro-gesture recognition for a visual see through (VST) device is prov