EP-4381475-B1 - DETECTION OF SILENT SPEECH


Inventors

MAIZELS, AVIAD
BARLIYA, Avi
KORNBLAU, GIORA
WEXLER, YONATAN

Dates

Publication Date: 2026-05-13
Application Date: 2022-05-16

Claims (14)

1. A sensing device (20, 60), comprising: a bracket (22) configured to fit an ear of a user (24) of the device; an optical sensing head (28) held by the bracket in a location in proximity to a face of the user and configured to sense light reflected from the face and to output a signal in response to the detected light; and processing circuitry (70, 75) configured to process the signal to generate a speech output; characterized in that the optical sensing head comprises an emitter configured to direct coherent light toward the face and an array of sensors configured to sense a secondary speckle pattern due to reflection of the coherent light from the face.
2. The device according to claim 1, wherein the bracket comprises an ear clip or a spectacle frame.
3. The device according to claim 1, wherein the optical sensing head is configured to sense the light reflected from a cheek of the user.
4. The device according to claim 1, wherein the emitter is configured to direct multiple beams of the coherent light toward different, respective locations on the face, and the array of sensors is configured to sense the secondary speckle pattern reflected from the locations, wherein the locations illuminated by the beams and sensed by the array of sensors extend over a field of view having an angular width of at least 60° and an area of at least 1 cm².
5. The device according to claim 1, wherein the processing circuitry is configured to detect changes in the sensed secondary speckle pattern and to generate the speech output responsively to the detected changes.
6. The device according to claim 1, wherein the processing circuitry is configured to operate the array of sensors at a first frame rate, to sense, responsively to the signal while operating at the first frame rate, a movement of the face, and to increase the frame rate responsively to the sensed movement to a second frame rate, greater than the first frame rate, for generating the speech output.
7. The device according to any of claims 1-6, wherein the processing circuitry is configured to generate the speech output responsively to changes in the signal output by the optical sensing head due to movements of a skin surface of the user without any utterance of sounds by the user.
8. The device according to any of claims 1-6, wherein the optical sensing head is held by the bracket in a position that is at least 5 mm away from a skin surface of the user.
9. The device according to any of claims 1-6, and comprising one or more electrodes configured to contact a skin surface of the user, wherein the processing circuitry is configured to generate the speech output responsively to the electrical activity sensed by the one or more electrodes together with the signal output by the optical sensing head.
10. The device according to any of claims 1-6, and comprising a microphone configured to sense sounds uttered by the user.
11. The device according to any of claims 1-6, and comprising a wireless communication interface, wherein the processing circuitry is configured to encode the signal for transmission over the communication interface to a processing device, which processes the encoded signals to generate the speech output.
12. The device according to any of claims 1-6, and comprising a user control, which is connected to the bracket and configured to sense a gesture made by the user, wherein the processing circuitry is configured to change an operational state of the device responsively to the sensed gesture.
13. The device according to any of claims 1-6, and comprising a speaker configured to fit in the ear of the user, wherein the processing circuitry is configured to synthesize an audio signal corresponding to the speech output for playback by the speaker.
14. A method for sensing, comprising: sensing a movement of skin on a face of a human subject in response to words articulated by the subject without vocalization of the words by the subject and without contacting the skin; and responsively to the sensed movement, generating a speech output including the articulated words; characterized in that sensing the movement comprises directing coherent light toward the skin and sensing a secondary speckle pattern due to reflection of the coherent light from the skin.
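By way of illustration only, the frame-rate behaviour recited in the claims (operate the sensor array at a first frame rate, then raise it to a second, greater frame rate once facial movement is sensed) might be sketched as a small controller. The class name, the numeric rates, the motion threshold, and the rule for dropping back to the lower rate after several quiet frames are all assumptions made for this sketch; none of them is specified in the text above.

```python
from dataclasses import dataclass


@dataclass
class FrameRateController:
    """Hypothetical two-rate controller; all default values are illustrative."""

    idle_fps: float = 50.0          # assumed first (lower) frame rate
    active_fps: float = 500.0       # assumed second (higher) frame rate
    motion_threshold: float = 0.2   # assumed motion score that counts as movement
    hold_frames: int = 3            # assumed quiet frames before dropping back

    def __post_init__(self) -> None:
        if self.active_fps <= self.idle_fps:
            raise ValueError("second frame rate must exceed the first")
        self.fps = self.idle_fps    # start at the first frame rate
        self._quiet = 0             # consecutive frames without movement

    def update(self, motion_score: float) -> float:
        """Feed one motion score and return the frame rate to use next."""
        if motion_score >= self.motion_threshold:
            # Movement sensed: switch to (or stay at) the higher rate.
            self.fps = self.active_fps
            self._quiet = 0
        elif self.fps == self.active_fps:
            # No movement while at the higher rate: count quiet frames
            # and fall back once enough have passed (an assumption here).
            self._quiet += 1
            if self._quiet >= self.hold_frames:
                self.fps = self.idle_fps
        return self.fps


ctrl = FrameRateController()
rates = [ctrl.update(score) for score in [0.0, 0.5, 0.1, 0.1, 0.1, 0.0]]
```

Starting at the lower rate keeps the sensor array lightly loaded until there is something to decode, which is presumably the motivation for the two-rate scheme; the fall-back rule is added only so that the sketch is complete.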

Description

The present invention relates generally to physiological sensing, and particularly to methods and apparatus for sensing human speech.
The process of speech activates nerves and muscles in the chest, neck, and face. Thus, for example, electromyography (EMG) has been used to capture muscle impulses for purposes of speech sensing.
Secondary speckle patterns have been used for monitoring movement of skin on the human body. Secondary speckle typically occurs in diffuse reflections of a laser beam from a rough surface, such as the skin. By tracking both temporal and amplitude changes of secondary speckle produced by reflection from human skin when illuminated by a laser beam, investigators have measured blood pulse pressure and other vital signs. For example, U.S. Patent 10,398,314 describes a method for monitoring conditions of a subject's body using image data that is indicative of a sequence of speckle patterns generated by the body.
US2020370879 discloses wearable devices, their configurations, and methods of operation that use self-mixing interferometry signals of a self-mixing interferometry sensor to recognize user inputs. WO2019050881 discloses a system that is able to detect silent, internal articulation of words by a human user, by measuring low-voltage electrical signals at electrodes positioned on a user's skin.
Embodiments of the present invention that are described hereinbelow provide novel methods and devices for sensing human speech. In accordance with an aspect of the present invention, there is provided a sensing device as set out in the first of the appended independent claims. In accordance with another aspect of the present invention, there is provided a method for sensing as set out in the second of the appended independent claims. Features of various embodiments are set out in the appended dependent claims.
There are also disclosed herein examples of a sensing device, including a bracket configured to fit an ear of a user of the device and an optical sensing head held by the bracket in a location in proximity to a face of the user and configured to sense light reflected from the face and to output a signal in response to the detected light. Processing circuitry is configured to process the signal to generate a speech output. In one example, the bracket includes an ear clip. Alternatively, the bracket includes a spectacle frame. In a disclosed example, the optical sensing head is configured to sense the light reflected from a cheek of the user.
The optical sensing head includes an emitter configured to direct coherent light toward the face and an array of sensors configured to sense a secondary speckle pattern due to reflection of the coherent light from the face. The emitter is configured to direct multiple beams of the coherent light toward different, respective locations on the face, and the array of sensors is configured to sense the secondary speckle pattern reflected from the locations. Additionally or alternatively, the locations illuminated by the beams and sensed by the array of sensors extend over an area of at least 1 cm². Further additionally or alternatively, the optical sensing head includes multiple emitters, which are configured to generate respective groups of the beams covering different, respective areas of the face, and the processing circuitry is configured to select and actuate a subset of the emitters without actuating all the emitters. In a disclosed example, the processing circuitry is configured to detect changes in the sensed secondary speckle pattern and to generate the speech output responsively to the detected changes.
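As a minimal sketch of what detecting changes in a sensed speckle pattern could look like, the following computes a change score between consecutive sensor frames as one minus their Pearson correlation. This is an illustrative technique chosen for the example, not the method of the disclosure; the frame layout (a 2D grid of intensities), the function names, and the handling of flat frames are assumptions.

```python
from math import sqrt
from typing import List, Sequence

# Hypothetical frame layout: rows of per-sensor intensity values.
Frame = Sequence[Sequence[float]]


def frame_correlation(a: Frame, b: Frame) -> float:
    """Pearson correlation between two equally sized intensity frames."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("frames must be non-empty and the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat frame is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def speckle_changes(frames: Sequence[Frame]) -> List[float]:
    """Change score (1 - correlation) for each consecutive pair of frames."""
    return [1.0 - frame_correlation(p, q) for p, q in zip(frames, frames[1:])]


frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 2.0], [3.0, 4.0]],   # identical to the previous frame
    [[4.0, 3.0], [2.0, 1.0]],   # intensities reversed
]
changes = speckle_changes(frames)
```

A score near 0 means the pattern is essentially unchanged between frames, while larger scores indicate that the pattern has shifted or decorrelated, as would be expected when the illuminated skin moves.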
Alternatively or additionally, the processing circuitry is configured to operate the array of sensors at a first frame rate, to sense, responsively to the signal while operating at the first frame rate, a movement of the face, and to increase the frame rate responsively to the sensed movement to a second frame rate, greater than the first frame rate, for generating the speech output. In a disclosed example, the processing circuitry is configured to generate the speech output responsively to changes in the signal output by the optical sensing head due to movements of a skin surface of the user without any utterance of sounds by the user. Typically, the optical sensing head is held by the bracket in a position that is at least 5 mm away from a skin surface of the user. In one example, the device includes one or more electrodes configured to contact a skin surface of the user, wherein the processing circuitry is configured to generate the speech output responsively to the electrical activity sensed by the one or more electrodes together with the signal output by the optical sensing head. Additionally or alternatively, the device includes a microphone configured to sense sounds uttered by the user. In one example, the processing circuitry is configured to compare the signal output by the optical sensing head to the sounds sensed by the microphone in or