CN-121996065-A - Gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information
Abstract
The application belongs to the technical field of gesture somatosensory interaction, and in particular relates to a gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information. The method combines edge detection with a three-frame difference algorithm, so that the obtained image has good contour information and the "hole" phenomenon is partly compensated; at the same time, the improved three-frame difference method is supplemented by fusing a Gaussian-mixture background-difference algorithm, so that the target image has a continuous contour and complete internal information, the gesture recognition rate is improved, and the false detection rate is reduced.
Inventors
- XU PEIMING
- CHEN JIE
- TIAN XUEWEN
- SUN WEI
- MAO MIN
Assignees
- Shandong Sport University (山东体育学院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-19
Claims (7)
- 1. A gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information, characterized by comprising the following steps: step one, acquiring video image sequence data of the hand; step two, processing the acquired video image sequence data to obtain a target image; and step three, performing gesture recognition and classification on the obtained target image based on a generative adversarial network model, sending the recognition and classification result to a human-machine interaction module, where the human-machine interaction interface displays the received video image sequence data, displays text based on the received recognition result, and synthesizes speech.
- 2. The gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information according to claim 1, wherein the second step comprises: obtaining the (K-1)-th, K-th and (K+1)-th frames of video images in the video image sequence data, and performing graying and median filtering on them to obtain the corresponding pre-processed video images; applying a Gaussian-mixture background-difference model to the (K+1)-th frame to obtain a foreground target image; differencing the pre-processed (K-1)-th and K-th frame images to obtain a first difference image, and at the same time differencing the pre-processed K-th and (K+1)-th frame images to obtain a second difference image; performing Canny edge detection on the difference images to obtain an edge image, and at the same time performing a logical AND operation on the two difference images to obtain an intermediate image; performing a logical OR operation on the edge image and the intermediate image; and performing a logical AND operation between the result and the foreground target image to obtain a target image res with a clear contour and complete internal information.
- 3. The method of claim 1, wherein the generative adversarial network model comprises an encoder, a decoder, a discriminator and a classifier; the encoder compresses and maps the input target image into a hidden space to extract an original feature z of the target image; the decoder takes the concatenation of the original feature z of the target image and the one-hot code of the label as input, and decodes it to output a decoded image; the discriminator takes the target image and the decoded image as input, discriminates between them, and outputs a probability value between 0 and 1 through a Sigmoid layer; the classifier takes the target image as input, its output represents the probability of each class, the Argmax function indexes the class of the maximum output value, and the corresponding class can thereby be determined.
- 4. The gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information according to claim 3, wherein the encoder comprises four 3×3 convolution layers with a stride of 2, two InceptionV structures and one fully connected layer, and a CBN layer and a Mish activation function are arranged in the convolution layers to normalize the output and enhance the expressive capability of the network.
- 5. The method of claim 3, wherein the decoder comprises a fully connected layer, a four-layer transposed convolutional network and a two-layer InceptionV-trans network, wherein three of the four transposed convolution layers use the Mish activation function and one uses the Tanh activation function, and CBN processing is performed before the activation functions to prevent overfitting.
- 6. The gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information according to claim 3, wherein the discriminator comprises four convolution layers and two fully connected layers, the convolution layers use the Mish activation function, and the fully connected layers use Sigmoid.
- 7. A computer-readable storage medium comprising a stored program, wherein, when the program runs, a device on which the computer-readable storage medium is located is controlled to execute the gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information according to any one of claims 1 to 6.
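Purely as an illustration of claim 2 (and of step two in the description that follows), the frame-processing pipeline could be sketched as below. This is a minimal, non-authoritative sketch: OpenCV's MOG2 model stands in for the Gaussian-mixture background difference, and the thresholds, the median-filter kernel size and the exact operands of the Canny step (the claim's formula images did not survive extraction) are assumptions.

```python
# Hedged sketch of the claim-2 pipeline: improved three-frame difference
# fused with a Gaussian-mixture background difference.
import cv2

bg_model = cv2.createBackgroundSubtractorMOG2()  # stand-in for the Gaussian mixture model

def preprocess(frame):
    """Graying followed by median filtering, as in step two (kernel size assumed)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.medianBlur(gray, 5)

def fuse_target_image(frame_km1, frame_k, frame_kp1):
    """Return a binary target image from frames K-1, K and K+1."""
    p_km1, p_k, p_kp1 = (preprocess(f) for f in (frame_km1, frame_k, frame_kp1))

    # Gaussian-mixture background difference on the (K+1)-th frame -> foreground mask.
    foreground = bg_model.apply(frame_kp1)
    _, foreground = cv2.threshold(foreground, 200, 255, cv2.THRESH_BINARY)

    # Two inter-frame difference images (thresholds assumed).
    diff1 = cv2.absdiff(p_k, p_km1)
    diff2 = cv2.absdiff(p_kp1, p_k)
    _, d1 = cv2.threshold(diff1, 25, 255, cv2.THRESH_BINARY)
    _, d2 = cv2.threshold(diff2, 25, 255, cv2.THRESH_BINARY)

    # Edge information from the difference images, and their logical AND.
    edges = cv2.Canny(cv2.bitwise_or(diff1, diff2), 50, 150)
    both_moving = cv2.bitwise_and(d1, d2)

    # OR the edge image with the AND image, then AND with the foreground mask.
    combined = cv2.bitwise_or(edges, both_moving)
    return cv2.bitwise_and(combined, foreground)
```

The final AND with the foreground mask corresponds to what the abstract calls supplementing the improved three-frame difference with the Gaussian-mixture background-difference algorithm.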
Description
Gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information

Technical Field

The application belongs to the technical field of gesture somatosensory interaction, and in particular relates to a gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information.

Background

The integration and development of new-generation information technology, artificial intelligence and advanced manufacturing technology have made human-machine interaction an important component in the development of intelligent equipment and intelligent products. In recent years, natural human-machine interaction technologies such as voice, gesture, posture and expression have given computing systems strong perceptibility, multi-channel capability and naturalness; they replace the mouse and keyboard and have become the interaction media for communication, decision-making and execution between people and machines. In human-machine interaction scenarios, interaction mechanisms built on gesture recognition bring a more flexible and efficient experience, but in the gesture recognition process the contour information of the obtained image is often lost and "holes" appear, so that the recognition rate of gestures is low and the false detection rate is high.

Disclosure of the Invention

In order to solve the technical problems in the background art, the invention provides a gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information, so that images with continuous contours and complete internal information can be obtained, the gesture recognition rate is improved, and the false detection rate is reduced.

In a first aspect, the present invention provides a gesture somatosensory interaction method based on fusion of visual, inertial-tactile and multi-touch information, comprising: step one, acquiring video image sequence data of the hand; step two, processing the acquired video image sequence data to obtain a target image; and step three, performing gesture recognition and classification on the obtained target image based on a generative adversarial network model, sending the recognition and classification result to a human-machine interaction module, where the human-machine interaction interface displays the received video image sequence data, displays text based on the received recognition result, and synthesizes speech.
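As a rough illustration of how the three steps fit together, one possible control loop is sketched below. The detector, classifier and hci objects and their methods (fuse_target_image, predict, show_video, show_text, speak) are hypothetical placeholders for this sketch, not an API defined by the application.

```python
# Hedged sketch of the step one / two / three flow described above.
import cv2

def interaction_loop(detector, classifier, hci):
    cap = cv2.VideoCapture(0)            # step one: acquire hand video frames
    frames = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frames = (frames + [frame])[-3:]  # keep only frames K-1, K, K+1
        if len(frames) < 3:
            continue
        target = detector.fuse_target_image(*frames)  # step two: target image
        label = classifier.predict(target)            # step three: GAN-based recognition
        hci.show_video(frame)                         # display the video stream
        hci.show_text(label)                          # text display of the result
        hci.speak(label)                              # speech synthesis
    cap.release()
```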
Further, the second step comprises: obtaining the (K-1)-th, K-th and (K+1)-th frames of video images in the video image sequence data, and performing graying and median filtering on them to obtain the corresponding pre-processed video images; applying a Gaussian-mixture background-difference model to the (K+1)-th frame to obtain a foreground target image; differencing the pre-processed (K-1)-th and K-th frame images to obtain a first difference image, and at the same time differencing the pre-processed K-th and (K+1)-th frame images to obtain a second difference image; performing Canny edge detection on the difference images to obtain an edge image, and at the same time performing a logical AND operation on the two difference images to obtain an intermediate image; performing a logical OR operation on the edge image and the intermediate image; and performing a logical AND operation between the result and the foreground target image to obtain a target image res with a clear contour and complete internal information. Further, the generative adversarial network model comprises an encoder, a decoder, a discriminator and a classifier; the encoder compresses and maps the input target image into a hidden space to extract an original feature z of the target image; the decoder takes the concatenation of the original feature z of the target image and the one-hot code of the label as input, and decodes it to output a decoded image; the discriminator takes the target image and the decoded image as input, discriminates between them, and outputs a probability value between 0 and 1 through a Sigmoid layer; the classifier takes the target image as input, its output represents the probability of each class, the Argmax function indexes the class of the maximum output value, and the corresponding class can thereby be determined. Further, the encoder comprises four 3×3 convolution layers with a stride of 2, two InceptionV structures and one fully connected layer, and a CBN layer and a Mish activation function are arranged in the convolution layers to normalize the output and enhance the expressive capability of the network. Further, the decoder comprises a fully connected layer, a four-layer transposed convolutional network and a two-layer InceptionV-trans network, wherein three of the four transposed convolution layers use the Mish activation function and one uses the Tanh activation function, and CBN processing is performed before the activation functions to prevent overfitting.
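The adversarial network described above (and in claims 3 to 6) could be laid out roughly as follows. This is a structural sketch only: the channel widths, latent dimension, input resolution (assumed 64×64 grayscale) and number of gesture classes are assumptions, the InceptionV / InceptionV-trans blocks are omitted for brevity, and plain batch normalization stands in for the CBN layer mentioned in the text.

```python
# Hedged PyTorch sketch of the encoder / decoder / discriminator / classifier.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 convolution with stride 2, normalisation ("CBN" stand-in) and Mish activation.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.Mish())

class Encoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.convs = nn.Sequential(conv_block(1, 32), conv_block(32, 64),
                                   conv_block(64, 128), conv_block(128, 256))
        self.fc = nn.Linear(256 * 4 * 4, latent_dim)      # assumes 64x64 input
    def forward(self, x):
        return self.fc(self.convs(x).flatten(1))          # original feature z

class Decoder(nn.Module):
    def __init__(self, latent_dim=128, n_classes=10):
        super().__init__()
        self.fc = nn.Linear(latent_dim + n_classes, 256 * 4 * 4)
        def up(c_in, c_out, act):
            # transposed convolution with normalisation before the activation
            return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                                 nn.BatchNorm2d(c_out), act)
        self.ups = nn.Sequential(up(256, 128, nn.Mish()), up(128, 64, nn.Mish()),
                                 up(64, 32, nn.Mish()), up(32, 1, nn.Tanh()))
    def forward(self, z, one_hot):
        # concatenate the original feature z with the one-hot label code
        h = self.fc(torch.cat([z, one_hot], dim=1)).view(-1, 256, 4, 4)
        return self.ups(h)                                 # decoded image

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(conv_block(1, 32), conv_block(32, 64),
                                   conv_block(64, 128), conv_block(128, 256))
        self.fcs = nn.Sequential(nn.Flatten(), nn.Linear(256 * 4 * 4, 64), nn.Sigmoid(),
                                 nn.Linear(64, 1), nn.Sigmoid())
    def forward(self, x):
        return self.fcs(self.convs(x))                     # real/fake probability in [0, 1]

class Classifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.backbone = Encoder(latent_dim=n_classes)      # reuses the encoder layout
    def forward(self, x):
        probs = torch.softmax(self.backbone(x), dim=1)     # probability of each class
        return probs.argmax(dim=1)                         # Argmax indexes the predicted class
```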