EP-4086894-B1 - SEMANTIC RECOGNITION REJECTION METHOD, SEMANTIC RECOGNITION REJECTION APPARATUS, TRANSPORTATION MEANS, AND MEDIUM

Inventors

HAN, Chuanyu
YI, Hui
WENG, ZHIWEI

Dates

Publication Date: 20260506
Application Date: 20211026

Claims (8)

1. A semantic recognition rejection method, comprising: acquiring (01) text of a plurality of speech requests, and a plurality of phrase output confidence values corresponding to the text of the speech requests; generating (02) a plurality of confidence features based on the text of the speech requests and the phrase output confidence values corresponding to the text of the speech requests, wherein the text is generated within a predetermined time period, wherein each confidence feature comprises the text of the corresponding speech request and word segmentation confidence values corresponding to the text, wherein each word segmentation confidence value is a confidence value corresponding to each word in the text, and wherein each word segmentation confidence value is the same as the corresponding phrase output confidence value; merging (03) the plurality of confidence features corresponding to a context to generate a target confidence feature; and using (04) a trained semantic recognition rejection model to perform a prediction for the target confidence feature to obtain a recognition rejection result, wherein the semantic recognition rejection model is trained based on a preset multimodal model; characterised in that: said merging the plurality of confidence features corresponding to a context to generate a target confidence feature comprises: sequencing (031) corresponding confidence features based on device identifications of the speech requests and speech acquisition time of the speech requests, wherein device identification means identification of the device that acquired the corresponding speech request; and merging (032) adjacent confidence features to generate the target confidence feature, the adjacent confidence features having the same device identification and being included within a preset listening duration.
2. The semantic recognition rejection method according to claim 1, wherein said generating a plurality of confidence features based on the text and the phrase output confidence values corresponding to the text comprises: normalizing (022) the word segmentation confidence values to construct a confidence vocabulary; and generating (023) the confidence feature based on the text and the confidence vocabulary.
3. The semantic recognition rejection method according to claim 1, wherein said using (04) a trained semantic recognition rejection model to perform a prediction for the target confidence feature to obtain a recognition rejection result comprises: determining (041) a word segmentation feature vector, a sentence segmentation feature vector, a position feature vector and a confidence feature vector based on the target confidence feature; extracting (042) text encoding information based on the word segmentation feature vector, the sentence segmentation feature vector and the position feature vector; extracting (043) confidence encoding information based on the confidence feature vector; splicing (044) the text encoding information and the confidence encoding information to perform self-attention feature fusion; and processing (045) a result of the self-attention feature fusion using an activation function to obtain the recognition rejection result.
4. The semantic recognition rejection method according to claim 3, wherein: said extracting text encoding information based on the word segmentation feature vector, the sentence segmentation feature vector and the position feature vector comprises: performing feature extraction on the word segmentation feature vector, the sentence segmentation feature vector and the position feature vector using a bidirectional encoder representation from transformers Encoder, BERT-Encoder, model to obtain the text encoding information, wherein the BERT-Encoder model comprises a plurality of multi-head-attention layers, a dense layer and a layer normalization, layer_norm, layer; and said extracting confidence encoding information based on the confidence feature vector comprises: performing single-layer bidirectional long short-term memory, LSTM, feature extraction on the target confidence feature to obtain the confidence encoding information.
5. The semantic recognition rejection method according to claim 3, further comprising: acquiring a training text of a training speech request and a training phrase output confidence value corresponding to the training text; generating a training confidence feature based on the training text and the corresponding training phrase output confidence value, wherein the training confidence feature comprises the training text and training word segmentation confidence values corresponding to the training text; merging training confidence features corresponding to a context to generate a target training confidence feature; determining a training recognition result for the target training confidence feature; and training the preset multimodal model using the target training confidence feature and the training recognition result to obtain the trained semantic recognition rejection model.
6. A semantic recognition rejection apparatus (100), comprising: an acquisition module (110), configured to acquire text of a plurality of speech requests and a plurality of phrase output confidence values corresponding to the text of the speech requests; a generation module (120), configured to generate a plurality of confidence features based on the text of the speech requests and the phrase output confidence values corresponding to the text of the speech requests, wherein the text is generated within a predetermined time period, wherein each confidence feature comprises the text of the corresponding speech request and word segmentation confidence values corresponding to the text, wherein each word segmentation confidence value is a confidence value corresponding to each word in the text, and wherein each word segmentation confidence value is the same as the corresponding phrase output confidence value; a merging module (130), configured to merge the plurality of confidence features corresponding to a context to generate a target confidence feature; and a processing module (140), configured to use a trained semantic recognition rejection model to perform a prediction for the target confidence feature to obtain a recognition rejection result, wherein the semantic recognition rejection model is trained based on a preset multimodal model; characterised in that said merging the plurality of confidence features corresponding to a context to generate a target confidence feature comprises: sequencing corresponding confidence features based on device identifications of the speech requests and speech acquisition time of the speech requests, wherein device identification means identification of the device that acquired the corresponding speech request; and merging adjacent confidence features to generate the target confidence feature, the adjacent confidence features having the same device identification and being included within a preset listening duration.
7. A transportation device, comprising memory and a processor, wherein the memory comprises a computer program stored thereon which, when executed by the processor, implements the semantic recognition rejection method according to any one of claims 1 to 5.
8. A non-volatile computer-readable storage medium having a computer program stored thereon which, when executed by one or more processors, implements the semantic recognition rejection method according to any one of claims 1 to 5.
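
As an illustration only, the feature generation and merging steps recited in the method claim (steps 02, 031 and 032) might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `ConfidenceFeature`, `make_feature` and `merge_features` are hypothetical, whitespace splitting stands in for a real word segmenter, times are assumed to be in seconds, and "included within a preset listening duration" is read here as "the gap to the previous request in the group does not exceed the duration".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConfidenceFeature:
    """Hypothetical container for one confidence feature."""
    device_id: str            # identification of the device that acquired the request
    acquired_at: float        # speech acquisition time (seconds, assumed unit)
    words: List[str]          # word-segmented text of the request
    confidences: List[float]  # one word segmentation confidence value per word


def make_feature(device_id: str, acquired_at: float, text: str,
                 phrase_confidence: float) -> ConfidenceFeature:
    # Assumption: whitespace splitting replaces a real word segmenter.
    words = text.split()
    # Each word takes the same value as the phrase output confidence value.
    return ConfidenceFeature(device_id, acquired_at, words,
                             [phrase_confidence] * len(words))


def merge_features(features: List[ConfidenceFeature],
                   listening_duration: float) -> List[ConfidenceFeature]:
    # Sequence by device identification, then by acquisition time.
    ordered = sorted(features, key=lambda f: (f.device_id, f.acquired_at))
    merged: List[ConfidenceFeature] = []
    for feat in ordered:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.device_id == feat.device_id
                and feat.acquired_at - prev.acquired_at <= listening_duration):
            # Same device and close enough in time: concatenate into one feature.
            # The merged feature carries the time of its latest request
            # (an illustrative choice, not taken from the source).
            merged[-1] = ConfidenceFeature(
                device_id=prev.device_id,
                acquired_at=feat.acquired_at,
                words=prev.words + feat.words,
                confidences=prev.confidences + feat.confidences,
            )
        else:
            # Start a new group; copy the lists so the inputs stay untouched.
            merged.append(ConfidenceFeature(feat.device_id, feat.acquired_at,
                                            list(feat.words),
                                            list(feat.confidences)))
    return merged


if __name__ == "__main__":
    requests = [
        make_feature("driver", 12.0, "the window", 0.4),
        make_feature("driver", 10.0, "open", 0.9),
        make_feature("passenger", 11.0, "hello there", 0.7),
        make_feature("driver", 40.0, "play music", 0.8),
    ]
    for target in merge_features(requests, listening_duration=5.0):
        print(target.device_id, target.words, target.confidences)
```

In this sketch the two "driver" requests at 10 s and 12 s are merged into one target confidence feature, while the request at 40 s falls outside the assumed 5 s listening duration and starts a new one. Each merged feature would then be passed to the trained rejection model, which is outside the scope of this sketch.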

Description

Field of Invention

The present application relates to the field of transportation, and in particular, to a semantic recognition rejection method, a semantic recognition rejection apparatus, a transportation means, and a computer-readable storage medium.

Background

As transportation means become more intelligent, interaction between users and applications for transportation means is becoming more frequent. At present, in a scenario where an in-vehicle voice assistant keeps listening, the actual interactive environment is complex and changeable, so noisy speech is often input during speech interaction, which leads to wrong responses from the in-vehicle voice assistant.

In related technologies, the in-vehicle voice assistant can use a semantic recognition rejection model to refuse to recognize some speech based on the semantics of the input speech, so as to increase the recognition rate of the in-vehicle voice assistant. The error rate of the semantic recognition rejection model therefore directly affects whether an instruction is correctly understood and executed, and how to increase the speech recognition rejection rate has become an urgent problem to be solved.

US 2020/320985 A1 teaches a method of enhancing an automated speech recognition confidence classifier.

Summary of Invention

In view of the above-mentioned problems, embodiments of the present disclosure provide a semantic recognition rejection method, a semantic recognition rejection apparatus, a transportation means, and a computer-readable storage medium. In accordance with the present invention, there are provided a semantic recognition rejection method as recited by claim 1, and a semantic recognition rejection apparatus as recited by claim 6. Preferred features are set out in the dependent claims.
In the present disclosure, confidence features are generated from the text of a speech request and the corresponding phrase output confidence value, and the confidence features corresponding to a context are merged to generate a target confidence feature, thereby linking the confidence values to the context. A trained semantic recognition rejection model is then used to perform a prediction for the target confidence feature to obtain a recognition rejection result. In this manner, accuracy of semantic recognition rejection is improved.

Additional aspects and advantages of the embodiments of the present disclosure will be given in the following description, and some parts will become apparent from the following description, or from the practice of the present disclosure.

Brief Description of Drawings

The above and/or additional aspects and advantages of the present disclosure will become apparent and easily understood from the description of the embodiments in combination with the accompanying drawings below, in which:
Fig. 1 is a schematic flowchart of a semantic recognition rejection method according to the present disclosure;
Fig. 2 is a schematic block diagram of a semantic recognition rejection apparatus according to the present disclosure;
Fig. 3 is a schematic flowchart of a semantic recognition rejection method according to the present disclosure;
Fig. 4 is a schematic flowchart of a semantic recognition rejection method according to the present disclosure;
Fig. 5 is a schematic flowchart of a semantic recognition rejection method according to the present disclosure;
Fig. 6 is a schematic scenario diagram of a semantic recognition rejection method according to the present disclosure; and
Fig. 7 is a schematic flowchart of a semantic recognition rejection method according to the present disclosure.
Detailed Description

The embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings. In the drawings, the same or similar reference numerals throughout the text indicate the same or similar elements or elements having the same or similar functions. The embodiments described below by reference to the accompanying drawings are exemplary and are merely intended to explain the embodiments of the present disclosure and are not to be construed as limiting the embodiments of the present disclosure.

Referring to Fig. 1, the present disclosure provides a semantic recognition rejection method, including the following steps.
In step 01, text of a speech request and a phrase output confidence value corresponding to the text are acquired.
In step 02, a confidence feature is generated based on the text and the corresponding phrase output confidence value. The confidence feature includes the text and word segmentation confidence values corresponding to the text.
In step 03, confidence features corresponding to a context are merged to generate a target confidence feature.
In step 04, a trained semantic recognition rejection model is used to perform a prediction for the target confidence feature to obtain a recognition rejection resul