
CN-121997045-A - Virtual data generation and classification method for few-sample ultrasonic gesture recognition


Abstract

The invention discloses a virtual data generation and classification method for few-sample ultrasonic gesture recognition. By deeply fusing ultrasonic channel sensing with a conditional diffusion generation model, the invention achieves stable, reliable, and well-generalized gesture recognition without large-scale collection of real user data. It effectively alleviates the dependence of existing ultrasonic gesture recognition technology on large numbers of user samples, markedly improves the robustness and adaptability of the model under complex practical conditions such as cross-user, cross-gesture-speed, and cross-interaction-distance operation, and enhances the practicality and scalability of ultrasonic interaction on mobile devices, giving the method strong engineering application value and industrial potential. The method significantly reduces data collection time and computational cost, improves user experience, enables rapid adaptation to new users, new gestures, and new usage scenarios, and provides strong support for the large-scale deployment of ultrasonic gesture recognition on mobile terminals.

Inventors

  • CHEN JIATONG
  • LIU JIANWEI
  • YAO XINWEI
  • HAN JINSONG

Assignees

  • Zhejiang University (浙江大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-06

Claims (9)

  1. A virtual data generation and classification method for few-sample ultrasonic gesture recognition, comprising the steps of:
     Customization stage:
     Signal collection: a user performs a specific gesture within a certain range of the data collection device, and ultrasonic signals containing the user's gesture characteristics are collected with the device's speaker and microphone;
     Signal preprocessing: based on the ultrasonic signal containing the user gesture, training-set differential channel impulse response features are extracted through channel impulse response feature extraction, range truncation, and differential calculation;
     Condition encoding: based on ordered pairs of training-set differential channel impulse response features, training-set multidimensional condition vectors are computed by the speed-encoding and distance-encoding algorithms;
     Diffusion model training: the conditional diffusion model CIR-ShiftNet, designed specifically for ultrasonic signals, is trained on the ordered pairs of training-set differential channel impulse response features and the training-set multidimensional condition vectors;
     Data enhancement stage:
     Signal collection: a user performs a specific gesture within a certain range of the data collection device, and ultrasonic signals containing the user's gesture characteristics are collected with the device's speaker and microphone;
     Signal preprocessing: based on the ultrasonic signal containing the user gesture, the user's differential channel impulse response features are extracted through channel impulse response feature extraction, range truncation, and differential calculation;
     Virtual data generation: virtual differential channel impulse response feature data are generated from the training-set multidimensional condition vectors, the conditional diffusion model CIR-ShiftNet, and the user's differential channel impulse response features;
     Gesture recognition: based on the user's differential channel impulse response features and the virtual differential channel impulse response features, a model such as a convolutional neural network or a long short-term memory (LSTM) network is selected according to user requirements, and a classification model is trained for the specific gesture recognition task.
  2. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 1, wherein the channel impulse response feature extraction in the signal preprocessing step of the customization stage and the data enhancement stage uses a low-pass filter to control the response distance, with a cut-off frequency of 150-250 Hz.
  3. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 1, wherein the range truncation in the signal preprocessing step of the customization stage and the data enhancement stage retains an effective distance interval, specifically implemented as retaining the 10th to 40th range bins.
  4. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 1, 2, or 3, wherein the differential calculation in the signal preprocessing step of the customization stage and the data enhancement stage computes the frame-by-frame difference of the channel impulse response feature vectors, yielding differential channel impulse response features that describe the fine-grained path dynamics caused by gestures.
  5. The method according to claim 4, wherein the speed-encoding algorithm in the condition encoding step of the customization stage is the frame-by-frame Wasserstein distance over the ordered pair of training-set differential channel impulse response features, and the distance-encoding algorithm is the frame-by-frame centroid difference over the same ordered pair.
  6. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 5, wherein the conditional diffusion model CIR-ShiftNet is a complex-domain differentiable model, and condition injection and backpropagation training are realized through a complex-valued attention mechanism.
  7. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 6, wherein the conditional diffusion model CIR-ShiftNet is trained by using the ordered pairs of training-set differential channel impulse response features as the start point and end point of the diffusion process, respectively, and using the multidimensional condition vectors corresponding to those ordered pairs as the condition vectors of the diffusion process.
  8. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 1, 2, 3, 5, 6, or 7, wherein the virtual data generation step comprises: adding noise disturbance in the forward direction to the user's differential channel impulse response features to obtain noise-disturbed user differential channel impulse response features; and, based on the noise-disturbed features, the training-set multidimensional condition vectors, and the conditional diffusion model CIR-ShiftNet, generating virtual differential channel impulse response features through the reverse denoising process of the diffusion model.
  9. The virtual data generation and classification method for few-sample ultrasonic gesture recognition according to claim 8, wherein the gesture recognition model training in the gesture recognition step comprises: performing initialization learning with the user's differential channel impulse response feature samples to obtain an initialized gesture recognition model; and, based on the initialized model and the virtual differential channel impulse response features, selecting a lightweight or deep model according to the computing power of the user's device and expanding the training distribution with the generated virtual samples.
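The preprocessing chain of claims 2-4 can be sketched in a few lines. This is a minimal illustration with assumed details: the claims give only the cut-off frequency (150-250 Hz) and the retained bins (10th-40th), so the first-order IIR filter and the frame layout below are illustrative choices, not the patent's implementation.

```python
import math

def lowpass(samples, fs, fc=200.0):
    """First-order IIR low-pass filter; fc in Hz (claim 2 specifies 150-250 Hz).
    The filter order is not given in the claims, so first-order is an assumption."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * fc)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def truncate_bins(cir_frames, lo=10, hi=40):
    """Range truncation (claim 3): keep the 10th-40th range bins of each frame."""
    return [frame[lo:hi + 1] for frame in cir_frames]

def diff_cir(cir_frames):
    """Differential calculation (claim 4): frame-by-frame difference of CIR vectors."""
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(cir_frames, cir_frames[1:])]
```

Here each `cir_frames[t]` is one channel impulse response frame; applying `truncate_bins` and then `diff_cir` yields the diffCIR features used throughout the claims.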

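The condition encoding of claim 5 names its two ingredients, a frame-by-frame Wasserstein distance (speed code) and a frame-by-frame centroid difference (distance code), without fixing their exact form. The sketch below assumes the closed-form 1-D Wasserstein distance over sorted samples, an energy-weighted centroid, and simple concatenation into the multidimensional condition vector; all three details are illustrative assumptions.

```python
def wasserstein_1d(p, q):
    """W1 distance between two equal-length 1-D sample sets via sorted matching."""
    return sum(abs(a - b) for a, b in zip(sorted(p), sorted(q))) / len(p)

def centroid(frame):
    """Energy-weighted mean range-bin index of one diffCIR frame."""
    total = sum(abs(v) for v in frame)
    return sum(i * abs(v) for i, v in enumerate(frame)) / total if total else 0.0

def encode_pair(frames_a, frames_b):
    """Speed code: frame-by-frame Wasserstein distance over the ordered pair.
    Distance code: frame-by-frame centroid difference over the ordered pair.
    Concatenation into one condition vector is an assumed design choice."""
    speed = [wasserstein_1d(a, b) for a, b in zip(frames_a, frames_b)]
    dist = [centroid(b) - centroid(a) for a, b in zip(frames_a, frames_b)]
    return speed + dist
```

Given an ordered pair of diffCIR sequences from the training set, `encode_pair` produces one condition vector per pair, which is what claim 7 feeds into the diffusion process as its conditioning signal.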
Description

Virtual data generation and classification method for few-sample ultrasonic gesture recognition

Technical Field

The invention belongs to the field of user authentication, and in particular relates to a reliable few-sample gesture recognition method based on ultrasonic signals, which collects ultrasonic signals with a device's speaker and microphone and generates virtual samples with a conditional diffusion model.

Background

Gesture interaction technology has in recent years been widely applied to smartphones, wearable devices, smart homes, and other scenarios, supporting interaction modes such as interface control, auxiliary input, and touchless operation. Existing gesture recognition methods can be broadly divided into vision-based and wireless-signal-based approaches, each with its own limitations. Vision-based gesture recognition typically relies on a camera to capture images or video of the user's movements and classifies gestures with a convolutional neural network, a Transformer, or another visual model. Such methods are convenient to use but have notable drawbacks: cameras are strongly affected by illumination and occlusion and struggle to work reliably in dark or cluttered environments, and visual data readily leaks user privacy, making them unsuitable for privacy-sensitive or low-power scenarios. Wireless-signal-based gesture recognition senses gesture motion with WiFi, millimeter-wave, radar, or ultrasonic signals and classifies gestures by analyzing changes in signal reflections. Wireless sensing is independent of illumination and offers stronger privacy protection.
However, many wireless technologies have high hardware cost and power consumption, or are difficult to deploy on mobile devices, which limits their wide application. To achieve low-power gesture recognition that can be deployed on mobile devices while providing strong privacy protection, researchers have turned to gesture sensing with the ultrasonic hardware already present in smartphones and headsets. Existing ultrasonic gesture recognition methods generally require large amounts of user data for training in order to maintain good recognition performance across different users, speeds, and distances. In practice, however, collecting large numbers of user samples is time-consuming and impractical, so recognition models adapt slowly to new users and generalize poorly. In addition, ultrasonic multipath reflections produce unstable channel fluctuations as gesture speed and hand position change, limiting recognition accuracy under low-sample conditions. Meanwhile, with the continued development of mobile hardware, current smartphones and headsets are widely equipped with high-quality speakers and microphones that can stably transmit and receive high-frequency ultrasonic signals. Recent deep generative models (such as diffusion models) can also produce virtual training data that obey physical laws even under low-sample conditions, offering a new route to improving the generalization of gesture recognition models. Combining the existing ultrasonic hardware of mobile devices with deep generative models to automatically expand user training samples therefore makes possible a gesture recognition technology that achieves high accuracy without large amounts of user data.
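The forward noising and reverse denoising described in claim 8 follow the standard diffusion-model recipe. The sketch below is a minimal illustration assuming a DDPM-style linear beta schedule and a deterministic (DDIM-like) reverse step, neither of which is specified in the patent; a trained CIR-ShiftNet would supply the predicted noise `eps_hat`, which is left here as an input.

```python
import math
import random

# Assumed linear beta schedule (the patent does not give one).
T = 50
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def forward_noise(x0, t, rng):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for v in x0]

def reverse_step(x_t, t, eps_hat):
    """One deterministic reverse denoising step given predicted noise eps_hat
    (in the patent this prediction would come from CIR-ShiftNet, conditioned
    on the multidimensional condition vector)."""
    ab = alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
    x0_hat = [(x - math.sqrt(1.0 - ab) * e) / math.sqrt(ab)
              for x, e in zip(x_t, eps_hat)]
    return [math.sqrt(ab_prev) * v + math.sqrt(1.0 - ab_prev) * e
            for v, e in zip(x0_hat, eps_hat)]
```

Starting from a noised user diffCIR sample, repeatedly applying `reverse_step` from `t = T-1` down to `t = 0` yields a virtual diffCIR sample; with the true noise as `eps_hat`, the loop recovers the original features exactly, which is a useful sanity check.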
The invention uses a device speaker to emit a constant ultrasonic signal and collects the ultrasonic multipath changes caused by gestures through a microphone, producing differential channel impulse responses (diffCIR) as stable gesture dynamics features. On this basis, condition vectors are formed through speed encoding and distance encoding, and a conditional diffusion model generates virtual samples consistent with real physical characteristics, improving gesture recognition performance under few-sample conditions. The invention provides a reliable few-sample gesture recognition method based on ultrasonic signals that enables natural, low-power, privacy-friendly gesture interaction on mobile and wearable devices.

Disclosure of Invention

The beneficial effects of the invention are as follows: the invention provides, at the system level, a complete technical scheme for ultrasonic gesture recognition in few-sample scenarios, and through the deep fusion of ultrasonic channel perception and a conditional diffusion generation model achieves stable and reliable gesture recognition with