
CN-122029600-A - Internal speech production performance measurement and quantification

CN122029600A

Abstract

Methods and systems for measuring and quantifying internal speech production are disclosed. Methods and systems present instructions to a user to generate internal speech and collect Electromyography (EMG) data corresponding to the internal speech. Methods and systems process EMG data through a machine learning model to generate predictions that include words or phrases corresponding to internal speech, and concurrently generate an assessment of internal speech generation and detection.

Inventors

  • Elizabeth Izhaksson
  • Mel Meishulam
  • Hajar Yaming
  • Asif Zifu

Assignees

  • Snap Inc.

Dates

Publication Date
2026-05-12
Application Date
2024-10-04
Priority Date
2023-10-16

Claims (20)

  1. A method, comprising: presenting instructions to a user to generate internal speech; collecting electromyography (EMG) data corresponding to the internal speech; classifying the EMG data by a machine learning model to generate a prediction comprising words or phrases corresponding to the internal speech; and concurrently generating an assessment of internal speech production and detection.
  2. The method of claim 1, wherein the assessment represents an ability of the user to generate internal speech that matches the instructions and an accuracy of the machine learning model in correctly predicting the words or phrases corresponding to the internal speech.
  3. The method of any of claims 1-2, wherein the assessment quantifies machine learning model performance and user performance simultaneously, and measures a degree of convergence between predictions made by the machine learning model and content produced by the user as the internal speech.
  4. The method of any of claims 1-3, further comprising: presenting, as the instructions, target words or phrases in a same set of target-specific phonemes, phonemic sounds, words, or phrases.
  5. The method of claim 4, further comprising: determining whether the prediction generated by the machine learning model matches the target word or phrase.
  6. The method of any of claims 1-5, further comprising: presenting the same set of target-specific phonemes, phonemic sounds, words, or phrases in a first region of a graphical user interface; and visually distinguishing a first subset of the set from a second subset of the set based on whether members of the set are correctly classified by the machine learning model.
  7. The method of claim 6, wherein the first subset of the set of target-specific phonemes, phonemic sounds, words, or phrases is presented in a first font type or color, and wherein the second subset is presented in a second font type or color.
  8. The method of any of claims 1-7, wherein the first font type or color indicates a correctly classified target-specific phoneme, phonemic sound, word, or phrase, and wherein the second font type or color indicates a misclassified target-specific phoneme, phonemic sound, word, or phrase.
  9. The method of any of claims 1-8, further comprising: presenting, in a second region of the graphical user interface alongside the first region, a quantification of machine learning model predictions for the same set of target-specific phonemes, phonemic sounds, words, or phrases.
  10. The method of claim 9, further comprising: generating a matrix comprising a plurality of columns representing different predictions made by the machine learning model and a plurality of rows representing the same set of target-specific phonemes, phonemic sounds, words, or phrases as truth information.
  11. The method of any of claims 1-10, wherein a first row of the matrix corresponds to a first truth target word or phrase, wherein a first column of the plurality of columns corresponds to a first target word prediction, and wherein an intersection between the first row and the first column includes a first indication of whether the first target word prediction matches the first truth target word or phrase.
  12. The method of claim 11, wherein a second column of the plurality of columns corresponds to a second target word prediction, and wherein an intersection between the first row and the second column includes a second indication of whether the second target word prediction matches the first truth target word or phrase.
  13. The method of any of claims 1-12, wherein the first indication represents a number of times the machine learning model classifies a set of EMG data as the first target word prediction when the user is instructed to produce internal speech corresponding to the first truth target word or phrase.
  14. The method of any of claims 1-13, further comprising: generating a total value as part of the matrix, the total value representing a total number of times a particular word or phrase was predicted by the machine learning model when the user was instructed to produce internal speech corresponding to the same set of target-specific phonemes, phonemic sounds, words, or phrases of the truth information.
  15. The method of any of claims 1-14, further comprising: establishing a secure connection between a mobile device and an EMG device, the mobile device executing an interactive application; and receiving, by the mobile device from the EMG device over the secure connection, the EMG data, wherein the mobile device detects a presence of the internal speech.
  16. The method of any of claims 1-15, further comprising: determining whether the assessment meets a criterion.
  17. The method of claim 16, further comprising: in response to determining that the assessment fails to meet the criterion, performing at least one of updating one or more parameters of the machine learning model or presenting additional instructions to the user to generate internal speech; and concurrently generating an additional assessment of internal speech production and detection in response to the at least one of updating the one or more parameters of the machine learning model or presenting the additional instructions to the user to generate internal speech.
  18. The method of any of claims 1-17, further comprising: in response to determining that the assessment meets the criterion, terminating a training operation and generating real-time predictions of internal speech using the machine learning model to control operation of a user system.
  19. A system, comprising: at least one processor; and at least one memory component having instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to perform operations comprising: presenting instructions to a user to generate internal speech; collecting electromyography (EMG) data corresponding to the internal speech; classifying the EMG data by a machine learning model to generate a prediction comprising words or phrases corresponding to the internal speech; and concurrently generating an assessment of internal speech production and detection.
  20. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform operations comprising: presenting instructions to a user to generate internal speech; collecting electromyography (EMG) data corresponding to the internal speech; classifying the EMG data by a machine learning model to generate a prediction comprising words or phrases corresponding to the internal speech; and concurrently generating an assessment of internal speech production and detection.
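The quantification in claims 9-14 is a confusion-style matrix (rows are the truth target words or phrases, columns are the model's predictions, and cell values count how often each prediction occurred for each truth word), with an overall assessment compared against a criterion as in claims 16-18. The following Python sketch is not part of the patent; it is a minimal illustration of that bookkeeping, and every name in it (the functions, the sample three-word vocabulary, and the 0.8 criterion) is hypothetical.

```python
def build_prediction_matrix(targets, truths, predictions):
    """Confusion-style matrix: one row per truth target word, one column
    per predicted word; each cell counts how often the model produced
    that prediction when the user was instructed to internally produce
    the row's truth word."""
    matrix = {t: {p: 0 for p in targets} for t in targets}
    for truth, pred in zip(truths, predictions):
        matrix[truth][pred] += 1
    return matrix


def assessment(matrix):
    """Overall accuracy: the fraction of trials whose prediction matches
    the truth word (the diagonal of the matrix)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[t][t] for t in matrix)
    return correct / total if total else 0.0


# Hypothetical session: three target words, six internal-speech trials.
targets = ["yes", "no", "stop"]
truths = ["yes", "yes", "no", "stop", "stop", "no"]
predictions = ["yes", "no", "no", "stop", "yes", "no"]

m = build_prediction_matrix(targets, truths, predictions)
acc = assessment(m)

# Column totals (claim 14): total number of times each word was
# predicted across all trials, regardless of the truth word.
totals = {p: sum(m[t][p] for t in targets) for p in targets}

# Compare the assessment against a criterion (claims 16-18); failing it
# would trigger a model parameter update or additional user instructions.
CRITERION = 0.8
meets_criterion = acc >= CRITERION
```

Note that a single number like `acc` conflates the two quantities the claims keep separate (user ability and model accuracy); the matrix itself is what lets an operator distinguish a user who produced the wrong word from a model that misclassified the right one.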

Description

Internal speech production performance measurement and quantification

Priority Statement

The present application claims priority from U.S. patent application Ser. No. 18/487,683, filed on Oct. 16, 2023, which is incorporated herein by reference in its entirety.

Technical Field

The present disclosure relates to Electromyography (EMG) speech systems and interactive applications and/or extended reality (XR) devices, such as Augmented Reality (AR) and/or Virtual Reality (VR) devices.

Background

Some electronically enabled devices include various input interfaces that enable a user to communicate with other users. Such input interfaces include a voice message interface that enables a user to send verbal messages to other people. Other input interfaces accept text typed by the user to enter a desired message. These types of input interfaces require user movements, such as moving facial muscles to produce the speech of a verbal message or moving fingers to select different keys on a keyboard.

Drawings

In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. To facilitate the identification of the discussion of any particular element or act, the most significant digit or digits in a reference numeral refer to the figure number in which that element was first introduced. Some non-limiting examples are shown in the figures of the accompanying drawings, in which:

FIG. 1 is a diagrammatic representation of a networked environment in which the present disclosure may be deployed, according to some examples.
FIG. 2 is a diagrammatic representation of a messaging system having both client-side and server-side functions, according to some examples.
FIG. 3 is a diagrammatic representation of a data structure maintained in a database, according to some examples.
FIG. 4 is a diagrammatic representation of a message, according to some examples.
FIG. 5 is a diagrammatic representation of a user wearing an EMG communication device, according to some examples.
FIG. 6 is a diagrammatic representation of an EMG speech detection system, according to some examples.
FIG. 7 is an illustrative output of an EMG speech detection system, according to some examples.
FIG. 8 is a flowchart illustrating example operations of an EMG speech detection system, according to some examples.
FIG. 9 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, according to some examples.
FIG. 10 is a block diagram illustrating a software architecture in which an example may be implemented.
FIG. 11 illustrates a system in which a headset may be implemented, according to some examples.

Detailed Description

The following description includes systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative examples of the present disclosure. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of the various examples. It will be apparent, however, to one skilled in the art that the examples may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.

Some conventional non-invasive brain-computer interfaces (BCIs) use electroencephalogram (EEG) sensors. Such systems detect neural signals in the brain of a user and decode the neural signals into various operations. These systems can be cumbersome to deploy and difficult to place accurately on the user's head. Other non-invasive computer interfaces utilize Electromyography (EMG) electrodes that detect electrical signals associated with muscle activity. Such systems rely on the measurement of muscle activity (as captured by the EMG signal). Of particular interest for BCIs is the use of surface EMG to distinguish and identify virtually inaudible speech signals that are produced with relatively little or no acoustic output. Speech-related EMG signals can be measured at various locations on the face and neck, including on the sides of the throat, near the throat, and under the chin of a subject. Speaking is a motor event associated with overt muscle movement, but thinking about speaking is not. Internal speech or imagined speech refers to silently saying something to oneself, such as vividly imagining speaking, while the tongue, mouth, and/or facial muscles move little or not at all and the speech is not intended to be understood by another person. Specifically, when a person intends to speak a word or phrase, the person's brain generates a neural signal and supplies the neural signal to the corresponding speech-producing muscles, such as the throat, tongue, etc. Subthreshold muscle activation, also known as subthreshold muscle