US-12626699-B2 - Voice recognition device, voice recognition method, and non-transitory computer readable recording medium


Abstract

A voice recognition device includes an estimation unit that compares a plurality of pieces of registration voice data stored in a database with input voice data uttered by a speaker who gets on a mobile body to estimate a registration command corresponding to the input command, a presentation unit that presents an estimation result, a second acquisition unit that acquires an error instruction indicating that the estimation result is an error, a determination unit that, in a case where the error instruction is acquired, determines a correct command corresponding to the input command based on an operation by the speaker, and a database management unit that stores the correct command and the input voice data in the database in association with each other.
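
The interplay between the estimation unit and the database management unit can be pictured with a toy model. The following is a minimal sketch, not the patented implementation: the three-element feature vectors, the cosine similarity measure, and the names `VoiceDatabase`, `estimate`, and `store_correction` are all assumptions introduced for illustration. It shows an input distorted by noise being matched to the wrong registration command, and the same input being matched correctly once it has been stored in association with the correct command.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class VoiceDatabase:
    # command name -> list of feature vectors stored for that command
    entries: dict = field(default_factory=dict)

    def register(self, command, features):
        """Store one piece of registration voice data for a command."""
        self.entries.setdefault(command, []).append(list(features))

    def estimate(self, features):
        """Return the command whose stored voice data is most similar to the input."""
        return max(
            self.entries,
            key=lambda cmd: max(cosine(features, ref) for ref in self.entries[cmd]),
        )

    def store_correction(self, correct_command, features):
        """Associate the input voice data with the correct command."""
        self.register(correct_command, features)


db = VoiceDatabase()
db.register("open window", [1.0, 0.0, 0.2])   # registered in a quiet cabin
db.register("close window", [0.0, 1.0, 0.2])

noisy_input = [0.45, 0.55, 0.9]               # "open window" uttered with road noise
first_guess = db.estimate(noisy_input)        # misrecognized as "close window"

db.store_correction("open window", noisy_input)  # speaker indicated the correct command
second_guess = db.estimate(noisy_input)          # now matches the stored noisy sample
```

Keeping several samples per command, rather than overwriting the original registration, is a design choice of this sketch: it lets the quiet and noisy variants of the same command coexist in the database.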

Inventors

Takahiro Kamai
Katsunori DAIMO
Misaki DOI
Kousuke ITAKURA

Assignees

PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA

Dates

Publication Date: 2026-05-12
Application Date: 2023-12-04
Priority Date: 2021-06-07

Claims (13)

1. A voice recognition device that performs voice recognition on a command of a mobile body, the device comprising: a database storing a plurality of pieces of registration voice data of a plurality of registration commands uttered by a speaker in advance; and a processor that performs operations including: acquiring input voice data of an input command uttered by the speaker who gets on the mobile body; comparing the plurality of pieces of registration voice data with the input voice data to estimate a registration command corresponding to the input command; presenting an estimation result; acquiring an error instruction indicating that the estimation result is an error; determining, in a case where the error instruction is acquired, a correct command corresponding to the input command based on an operation by the speaker; controlling the mobile body based on the correct command; storing, in the database, the correct command and the input voice data in the database in association with each other, to update the database; and repeating at least the acquiring input voice data, and the comparing using the updated database, wherein, in the determining, the processor estimates a correct command based on a monitoring result of an operation on the mobile body after input of the error instruction, and presents, in a case where stopping of the mobile body is detected, the estimated correct command, acquires a check instruction for the estimated correct command, and determines the correct command based on the check instruction.
2. The voice recognition device according to claim 1, wherein, in the determining, the processor presents a plurality of correct candidate commands, acquires a selection instruction for selecting one correct candidate command from among the plurality of correct candidate commands, and determines the one correct candidate command as the correct command.
3. The voice recognition device according to claim 2, wherein the processor presents the plurality of registration commands sorted in descending order of similarity between the input voice data and the plurality of pieces of registration voice data, as the plurality of correct candidate commands.
4. The voice recognition device according to claim 1, wherein the processor monitors, after input of the error instruction, an operation input to the mobile body and determines the correct command based on a monitoring result.
5. The voice recognition device according to claim 1, wherein, in the determining, the processor holds, in a memory, the input voice data corresponding to the input command to which the error instruction is input, and presents, in the case where the stopping of the mobile body is detected, the estimated correct command in conjunction with reproduction of the input voice data held in the memory.
6. The voice recognition device according to claim 5, wherein the processor presents, in a case where the check instruction indicating that the estimated correct command is an error is acquired, the plurality of registration commands as correct candidate commands, acquires a selection instruction for selecting one correct candidate command from among the plurality of registration commands, and determines the one correct candidate command as the correct command.
7. The voice recognition device according to claim 6, wherein the processor presents the plurality of registration commands sorted in descending order of similarity between the plurality of pieces of registration voice data and the input voice data, as the correct candidate command.
8. The voice recognition device according to claim 1, wherein the processor compares a feature amount between the plurality of pieces of registration voice data and the input voice data to estimate the registration command corresponding to the input command.
9. The voice recognition device according to claim 1, wherein the processor compares a feature amount between the input voice data and voices of a plurality of registration speakers to identify a registration speaker corresponding to the speaker, and compares registration voice data of the plurality of registration commands with the input voice data for the identified registration speaker to estimate the registration command corresponding to the input command.
10. The voice recognition device according to claim 1, wherein the processor presents a message for prompting input of the error instruction only in a case where the estimation result is an error.
11. The voice recognition device according to claim 10, wherein the processor determines the correct command based on the operation by the speaker in a case where the error instruction is acquired within a predetermined timeout period, and determines that the estimation result is correct in a case where the error instruction is not acquired within the predetermined timeout period.
12. A voice recognition method in a voice recognition device that performs voice recognition on a command of a mobile body, the voice recognition method comprising: acquiring input voice data of an input command uttered by a speaker who gets on the mobile body; acquiring a plurality of registration commands uttered by the speaker in advance from a database; comparing the plurality of registration commands with the input voice data to estimate a registration command corresponding to the input command; presenting an estimation result; acquiring an error instruction indicating that the estimation result is an error; determining, in a case where the error instruction is acquired, a correct command corresponding to the input command based on an operation by the speaker; controlling the mobile body based on the correct command; storing, in the database, the correct command and the input voice data in association with each other, to update the database; and repeating at least the acquiring input voice data, the acquiring a plurality of registration commands from the updated database and the comparing using the updated database, wherein the determining includes: estimating a correct command based on a monitoring result of an operation on the mobile body after input of the error instruction, and presenting, in a case where stopping of the mobile body is detected, the estimated correct command, acquiring a check instruction for the estimated correct command, and determining the correct command based on the check instruction.
13. A non-transitory computer readable recording medium storing a voice recognition program for causing a computer to function as a voice recognition device that performs voice recognition on a command of a mobile body, the program for causing the computer to perform: acquiring input voice data of an input command uttered by a speaker who gets on the mobile body; acquiring registration voice data of a plurality of registration commands uttered by the speaker in advance from a database; comparing each piece of the registration voice data with the input voice data to estimate a registration command corresponding to the input command; presenting an estimation result; acquiring an error instruction indicating that the estimation result is an error; determining, in a case where the error instruction is acquired, a correct command corresponding to the input command based on an operation by the speaker; controlling the mobile body based on the correct command; storing, in the database, the correct command and the input voice data in association with each other, to update the database; and repeating at least the acquiring input voice data, the acquiring a plurality of registration commands from the updated database and the comparing using the updated database, wherein the determining includes: estimating a correct command based on a monitoring result of an operation on the mobile body after input of the error instruction, and presenting, in a case where stopping of the mobile body is detected, the estimated correct command, acquiring a check instruction for the estimated correct command, and determining the correct command based on the check instruction.
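
The determining step shared by claims 1, 12, and 13, together with the fallback of claims 6 and 7, can be read as a deferred-confirmation flow: after the error instruction, the device watches which operation the speaker performs on the mobile body, waits until stopping is detected, and only then presents its estimate and asks for a check instruction. The following is a minimal sketch under stated assumptions. The class `PendingCorrection`, the operation names, the mapping `OPERATION_TO_COMMAND`, and the callback shapes `confirm` and `choose` are hypothetical; the claims do not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical mapping from a monitored manual operation to a registration command.
OPERATION_TO_COMMAND = {
    "window_switch_down": "open window",
    "window_switch_up": "close window",
}


@dataclass
class PendingCorrection:
    """State kept between the error instruction and the check instruction."""

    input_voice: List[float]          # held input voice data
    estimated: Optional[str] = None   # correct command inferred from monitoring

    def observe_operation(self, operation: str) -> None:
        # The first monitored operation that maps to a command sets the estimate.
        if self.estimated is None and operation in OPERATION_TO_COMMAND:
            self.estimated = OPERATION_TO_COMMAND[operation]

    def resolve(
        self,
        stopped: bool,
        confirm: Callable[[str], bool],
        choose: Callable[[List[str]], str],
        ranked: List[str],
    ) -> Optional[str]:
        """Return the determined correct command, or None while still moving."""
        if not stopped:
            return None  # nothing is presented until stopping is detected
        if self.estimated is not None and confirm(self.estimated):
            return self.estimated  # check instruction accepted the estimate
        return choose(ranked)  # otherwise pick from the ranked candidate list


ranked = ["close window", "open window"]  # assumed similarity-sorted candidates

pending = PendingCorrection(input_voice=[0.45, 0.55, 0.9])
pending.observe_operation("wiper_on")            # not in the mapping: ignored
pending.observe_operation("window_switch_down")  # estimate becomes "open window"

while_moving = pending.resolve(False, lambda c: True, lambda r: r[0], ranked)
after_stop = pending.resolve(True, lambda c: True, lambda r: r[0], ranked)
rejected = pending.resolve(True, lambda c: False, lambda r: r[0], ranked)
```

In this sketch `while_moving` is `None` because nothing is presented before the mobile body stops, `after_stop` is the estimate confirmed by the speaker, and `rejected` shows the fallback in which the speaker selects from the candidate list instead.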

Description

TECHNICAL FIELD

The present disclosure relates to a technique for performing voice recognition on a command of a mobile body.

BACKGROUND ART

Patent Literature 1 discloses a voice recognition device that, in a case where a voice input by a user of a vehicle cannot be recognized, stores the voice as an unrecognized word in association with a travelling situation of the vehicle, selects a plurality of synonyms for the unrecognized word from a voice recognition dictionary based on the travelling situation of the vehicle, presents the plurality of selected synonyms to the user, and stores a synonym selected by the user from among the plurality of presented synonyms in association with the unrecognized word.

However, Patent Literature 1 does not consider the case where voice recognition fails because the noise differs between registration of a registration command and utterance of an input command, and thus a further improvement is required.

CITATION LIST

Patent Literature

Patent Literature 1: JP 2004-233542 A

SUMMARY OF INVENTION

The present disclosure has been made to solve such a problem, and an object of the present disclosure is to provide a technique capable of correctly specifying a correct command for an input command even in a case where the noise differs between registration of a registration command and utterance of an input command.
A voice recognition device according to one aspect of the present disclosure is a voice recognition device that performs voice recognition on a command of a mobile body, the voice recognition device including a first acquisition unit that acquires input voice data of an input command uttered by a speaker who gets on the mobile body, a database storing registration voice data of a plurality of registration commands uttered by the speaker in advance, an estimation unit that compares the plurality of pieces of registration voice data with the input voice data to estimate a registration command corresponding to the input command, a presentation unit that presents an estimation result, a second acquisition unit that acquires an error instruction indicating that the estimation result is an error, a determination unit that, in a case where the error instruction is acquired, determines a correct command corresponding to the input command based on an operation by the speaker, and a database management unit that stores the correct command and the input voice data in the database in association with each other.

According to the present disclosure, even in a case where a noise sound is different between registration of a registration command and utterance of an input command, a correct command for an input command can be correctly specified.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example of a configuration of a voice recognition device according to a first embodiment of the present disclosure.
FIG. 2 is a diagram illustrating an example of a data structure of a database.
FIG. 3 is a flowchart illustrating an example of processing of the voice recognition device according to the first embodiment.
FIG. 4 is a diagram illustrating an example of a scene where an input command is uttered.
FIG. 5 is a diagram illustrating an example of a check screen indicating a check message.
FIG. 6 is a diagram illustrating an example of a scene where an error button is operated.
FIG. 7 is a diagram illustrating an example of a list screen of correct candidate commands.
FIG. 8 is a diagram illustrating an example of a scene where the correct command is selected.
FIG. 9 is a diagram illustrating an example of a completion screen indicating a completion message.
FIG. 10 is a block diagram illustrating an example of a configuration of a voice recognition device according to a second embodiment.
FIG. 11 is a flowchart illustrating an example of processing of the voice recognition device according to the second embodiment.
FIG. 12 is a diagram illustrating a cancel screen presenting a cancel message.
FIG. 13 is a diagram illustrating an example of a monitoring scene.
FIG. 14 is a block diagram illustrating an example of a configuration of a voice recognition device according to a third embodiment.
FIG. 15 is a flowchart illustrating an example of processing of the voice recognition device 1B according to the third embodiment.
FIG. 16 is a flowchart following FIG. 15.
FIG. 17 is a diagram illustrating an example of a presentation screen of an estimation correct command.
FIG. 18 is a diagram illustrating an example of a list screen of correct candidate commands.
FIG. 19 is a diagram illustrating a presentation screen according to a modification.
FIG. 20 is a flowchart in the modification of the first embodiment.
FIG. 21 is a diagram illustrating an example of a check screen indicating a check message in the modification of the first embodiment.
FIG. 22 is a flowchart in a modification of the second embodiment.
FIG. 23 is a flowchart in a modi