CN-121999778-A - Silent voice interaction system based on multimodal biological signal perception

Abstract

The invention discloses a silent voice interaction system based on multimodal biological signal perception, relating to the technical field of silent voice interaction. The system establishes an interaction word database; collects electromyographic signals and vibration signals of a plurality of testers while they silently recite the interaction words; constructs corresponding word recognition models to obtain general word recognition models for each gender; sets test words and collects electromyographic signals and vibration signals of a user while the user silently recites the test words; screens, from among the testers, those similar to the user to obtain the corresponding word signal sample data; trains a personalized word recognition model; and recognizes the words silently recited by the user during silent voice interaction.

Inventors

ZHANG FABAO
ZHANG YUZHENG

Assignees

梅斯有限责任公司

Dates

Publication Date: 20260508
Application Date: 20260324

Claims (10)

1. A silent voice interaction system based on multimodal biological signal perception, characterized by comprising a sample acquisition module, a general construction module, a similarity analysis module and a personalized interaction module; the sample acquisition module is used for establishing an interaction word database and acquiring electromyographic signals and vibration signals of a plurality of testers while the testers silently recite the interaction words, to obtain word signal sample data for each gender; the general construction module constructs corresponding word recognition models from the word signal sample data of each gender to obtain general word recognition models for each gender; the similarity analysis module comprises a signal acquisition unit and a similarity screening unit, wherein the signal acquisition unit is used for setting test words and acquiring electromyographic signals and vibration signals of a user while the user silently recites the test words, and the similarity screening unit is used for screening, from among the testers, testers similar to the user and acquiring their corresponding word signal sample data to obtain personalized signal sample data; the personalized interaction module trains on the personalized signal sample data to obtain a personalized word recognition model and, based on the personalized word recognition model, recognizes the words silently recited by the user during silent voice interaction.
2. The silent voice interaction system based on multimodal biological signal perception according to claim 1, wherein the sample acquisition module is configured with a sample acquisition strategy comprising: setting the words required for silent voice interaction, establishing a corresponding word set recorded as the interaction word database, and recording the words in the interaction word database as interaction words; selecting a plurality of male testers and a plurality of female testers, and marking any one tester as a first tester; integrating an electromyographic signal sensor and a vibration signal sensor on the same patch, recorded as a signal acquisition patch; and selecting one site, at the lower part of the throat or at the lower jaw, as the signal acquisition site.
3. The silent voice interaction system based on multimodal biological signal perception according to claim 2, wherein the sample acquisition strategy further comprises: for the first tester, attaching the signal acquisition patch tightly to the signal acquisition site of the first tester, with the first tester placed in a quiet, interference-free environment; having the first tester silently recite each interaction word in turn while synchronously acquiring, at the same sampling frequency, the electromyographic signal and vibration signal corresponding to each interaction word; and repeating the above under the same environment to acquire the word signal data of all testers, then dividing and merging the data by tester gender to obtain the word signal sample data of men and the word signal sample data of women respectively.
4. The silent voice interaction system based on multimodal biological signal perception according to claim 3, wherein the general construction module is configured with a general construction strategy comprising: for any one pair of synchronously acquired electromyographic and vibration signals in the first sample data (the word signal sample data of men), performing baseline correction on each signal, followed by filtering and normalization, to obtain the corresponding electromyographic standard signal and vibration standard signal; repeating this processing for all synchronously acquired electromyographic and vibration signals in the first sample data to obtain the male word signal standard data; and obtaining the female word signal standard data in the same way.
5. The silent voice interaction system based on multimodal biological signal perception according to claim 4, wherein the general construction strategy further comprises: recording a transducer model as an initial recognition model, setting the input of the initial recognition model to the synchronously acquired electromyographic standard signal and vibration standard signal, and setting its output to the corresponding interaction word; and training the initial recognition model separately on the female word signal standard data and on the male word signal standard data, to obtain a female general word recognition model and a male general word recognition model respectively once training is complete.
6. The silent voice interaction system based on multimodal biological signal perception according to claim 5, wherein the signal acquisition unit is configured with a signal acquisition strategy comprising: selecting a plurality of words from the interaction word database and recording them as test words to obtain a test word set; in a quiet, interference-free environment, having a first user silently recite each test word in turn, acquiring the corresponding electromyographic signals and vibration signals with the signal acquisition patch, and processing them into the corresponding electromyographic standard signals and vibration standard signals, to obtain the test signal data of the first user; and, for each tester of the same gender as the user, obtaining from the word signal standard data of the corresponding gender the electromyographic standard signals and vibration standard signals recorded when that tester silently recited each test word, and grouping them by tester to obtain the test signal data of each same-gender tester.
7. The silent voice interaction system based on multimodal biological signal perception according to claim 6, wherein the similarity screening unit is configured with a similarity screening strategy comprising: marking any same-gender tester as a first tester and any test word as a first test word, and marking the electromyographic standard signal and the vibration standard signal of the first test word as a first electromyographic signal and a first vibration signal respectively; dividing the first electromyographic signal equally into k1 consecutive time windows, calculating the root mean square energy of the first electromyographic signal in each time window, recorded as the window energy, to obtain a root mean square energy sequence, then calculating in order the change rate between adjacent window energies to obtain k1-1 energy change rates, and combining the k1-1 energy change rates into a vector, recorded as the change rate vector and marked as feature 1, wherein k1 is a set number; and performing a fast Fourier transform on the first electromyographic signal to obtain the corresponding power spectrum, dividing its frequency band into three sub-bands, calculating the proportion of the total energy contained in each of the three sub-bands, recorded as the energy ratio, and combining the three energy ratios into a vector, recorded as the energy distribution vector and marked as feature 2.
8. The silent voice interaction system based on multimodal biological signal perception according to claim 7, wherein the similarity screening strategy further comprises: performing a Hilbert transform on the first vibration signal and extracting a smoothed signal envelope; obtaining the number of peaks A1 of the signal envelope, the standard deviation A2 of the intervals between adjacent peaks, and the ratio A3 of the average slope of the rising edges of the envelope to the average slope of its falling edges; combining A1, A2 and A3 into a vector, recorded as the envelope shape vector and marked as feature 3; performing a short-time Fourier transform on the first vibration signal to obtain the corresponding spectrogram, finding on the spectrogram the frequencies of the two formants with the greatest energy, and marking them in descending order as first formant F1 and second formant F2; calculating the frequency ratio of F2 to F1, recorded as B1, and the energy ratio of F2 to F1, recorded as B2; combining B1 and B2 into a vector, recorded as the formant feature vector and marked as feature 4; and obtaining in this way features 1 to 4 corresponding to the first test word of the first tester.
9. The silent voice interaction system based on multimodal biological signal perception according to claim 8, wherein the similarity screening strategy further comprises: for the first user and the first tester, calculating the cosine similarity of their corresponding feature 1, feature 2 and feature 3, recorded in order as feature similarities E1, E2 and E3, and calculating the normalized Euclidean distance of their corresponding feature 4, recorded as feature similarity E4; calculating the mean of E1, E2, E3 and E4, recorded as the word similarity of the first user and the first tester for the first test word; repeating the calculation for all test words to obtain the similarity set of the first tester, and calculating the mean of the similarity set, recorded as the average similarity; repeating the above to obtain the similarity sets and corresponding average similarities of all same-gender testers; for the first test word, sorting the word similarities of all same-gender testers for the first test word in descending order and taking the first k2 same-gender testers, recorded as the front tester set of the first test word, wherein k2 is a set number; counting the number of times the first tester appears in the front tester sets of all test words, recorded as the front count, and dividing it by the total number of test words, recorded as the front overlap rate of the first tester; repeating the above to obtain the front overlap rates of all same-gender testers, normalizing the front overlap rates and the average similarities of all same-gender testers separately, and then calculating, for each same-gender tester, the mean of the normalized front overlap rate and the normalized average similarity to obtain the comprehensive similarity score of each same-gender tester; and extracting the portions corresponding to the similar testers from the word signal sample data of the corresponding gender, recorded as the personalized signal sample data, wherein k3 is a set number.
10. The silent voice interaction system based on multimodal biological signal perception according to claim 9, wherein the personalized interaction module is configured with a personalized interaction strategy comprising: for the first user, training the general word recognition model of the first user's gender on the personalized signal sample data to obtain the personalized word recognition model of the first user; and, when the first user performs silent voice interaction, using the signal acquisition patch to acquire and process the electromyographic signals and vibration signals generated by the first user's silent recitation, obtaining the corresponding electromyographic standard signals and vibration standard signals, inputting them into the corresponding personalized word recognition model to obtain the words silently recited by the user, and completing the interaction according to those words.
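
The screening procedure recited in claims 7 and 9 can be illustrated by the following minimal Python sketch. It is not the applicant's implementation. It assumes that the "change rate" of feature 1 is the relative change between adjacent window energies, that the normalization in claim 9 is min-max scaling, and that the similar testers are the k3 same-gender testers with the highest comprehensive similarity score. All function names and the nested-dictionary input format are hypothetical.

```python
import math


def rms_change_rate_vector(signal, k1):
    """Feature 1 sketch: RMS energy in k1 equal windows, then k1 - 1 change rates.

    Assumption: change rate = (next - current) / current.
    Trailing samples that do not fill a whole window are dropped.
    """
    size = len(signal) // k1
    rms = [
        math.sqrt(sum(x * x for x in signal[i * size:(i + 1) * size]) / size)
        for i in range(k1)
    ]
    eps = 1e-12  # guards against division by zero in a silent window
    return [(b - a) / (a + eps) for a, b in zip(rms, rms[1:])]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Assumed normalization: min-max scaling of a {tester: value} mapping."""
    lo, hi = min(values.values()), max(values.values())
    return {t: (v - lo) / (hi - lo) if hi > lo else 0.0 for t, v in values.items()}


def screen_similar_testers(word_sim, k2, k3):
    """Rank same-gender testers by comprehensive similarity score.

    word_sim[tester][word] is the word similarity (mean of E1..E4) between
    the user and that tester for one test word.
    Returns (top-k3 testers, {tester: comprehensive score}).
    """
    testers = list(word_sim)
    words = list(next(iter(word_sim.values())))

    # Average similarity of each tester over all test words.
    average = {t: sum(word_sim[t][w] for w in words) / len(words) for t in testers}

    # Front overlap rate: share of test words for which the tester is in the top k2.
    front_count = {t: 0 for t in testers}
    for w in words:
        ranked = sorted(testers, key=lambda t: word_sim[t][w], reverse=True)
        for t in ranked[:k2]:
            front_count[t] += 1
    overlap = {t: front_count[t] / len(words) for t in testers}

    norm_avg, norm_overlap = min_max(average), min_max(overlap)
    score = {t: (norm_avg[t] + norm_overlap[t]) / 2 for t in testers}

    # Assumption: the "similar testers" are the k3 highest-scoring testers.
    selected = sorted(testers, key=score.get, reverse=True)[:k3]
    return selected, score
```

For example, with three hypothetical testers whose word similarities are `{"A": {"w1": 0.9, "w2": 0.8}, "B": {"w1": 0.5, "w2": 0.6}, "C": {"w1": 0.7, "w2": 0.9}}`, calling `screen_similar_testers(..., k2=2, k3=1)` selects tester A, who has both the highest average similarity and a front overlap rate of 1.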

Description

Silent voice interaction system based on multimodal biological signal perception

Technical Field

The invention relates to the technical field of silent voice interaction, in particular to a silent voice interaction system based on multimodal biological signal perception.

Background

Silent voice interaction technology is a novel intelligent interaction technology that does not depend on audible acoustic speech signals at all. It captures the physiological motion signals directly related to language production that arise when a person silently recites or sub-vocalizes, decodes them by algorithm into text, control instructions or synthesized speech, and thereby realizes human-machine interaction and interpersonal communication without sound production or acoustic leakage.

When capturing the words a user intends to express while silently reciting, existing silent voice interaction technology usually collects in advance the physiological motion signals generated by a plurality of testers silently reciting words, trains a machine learning model on those signals to obtain a general recognition model, and then inputs the physiological motion signals generated by the user's silent recitation into the general model to obtain the words the user intends to express. However, conventional speech is a sound wave transmitted through air: after propagating through air, the pronunciations of different people retain a large number of common acoustic features, and individual differences are fine adjustments on a common basis. The physiological signals of silent speech have no such standardized propagation process, and the differences between individuals are fundamental. The training objective of a general model is to fit the average features of all testers in the training set, so it preferentially covers the common features of the largest user groups, while unique and uncommon pronunciation features may be filtered out by the model. As a result, the general model achieves a high recognition rate on the testers who participated in training but generalizes poorly to new users, and recognition accuracy can drop greatly when a user uses it directly. Existing technology cannot screen similar sample signals according to the signal characteristics of the user's silent recitation and construct a personalized silent voice recognition model.

Disclosure of Invention

The invention aims to solve, at least to some extent, one of the technical problems in the prior art: when existing silent voice interaction technology captures the words a user intends to express while silently reciting, it cannot screen similar sample signals according to the signal characteristics of the user's silent recitation and construct a personalized silent voice recognition model. The invention addresses this by setting test words, collecting the electromyographic signals and vibration signals of the user while the user silently recites the test words, screening testers similar to the user from among the testers, and constructing a personalized silent voice recognition model.

In order to achieve the above purpose, the application provides a silent voice interaction system based on multimodal biological signal perception, which comprises a sample acquisition module, a general construction module, a similarity analysis module and a personalized interaction module; the sample acquisition module is used for establishing an interaction word database and acquiring electromyographic signals and vibration signals of a plurality of testers while the testers silently recite the interaction words, to obtain word signal sample data for each gender; the general construction module constructs corresponding word recognition models from the word signal sample data of each gender to obtain general word recognition models for each gender; the similarity analysis module comprises a signal acquisition unit and a similarity screening unit, wherein the signal acquisition unit is used for setting test words and acquiring electromyographic signals and vibration signals of a user while the user silently recites the test words, and the similarity screening unit is used for screening