CN-122024718-A - Composite robot voice instruction recognition method and system

CN122024718A

Abstract

The invention discloses a voice command recognition method and system for a composite robot. Voice signals are acquired and preprocessed to obtain independent voice command signals corresponding to different voice sources; voice features of the independent voice command signals are extracted and command texts are generated; the command semantics of the command texts are analyzed and matched against pre-stored action-task priority information to obtain the priority information of the action task corresponding to the command semantics; the corresponding confidence levels are evaluated respectively, and a multi-source priority information fusion score is calculated by dynamic confidence-weighted fusion. An action task is then selected and executed according to the multi-source priority information fusion score. The method and system enable the robot not only to understand the literal meaning of an instruction but also to perceive the real urgency behind it, so that decisions matching the actual situation can be made quickly and accurately in emergencies, ensuring operational safety and task continuity.

Inventors

  • FENG YUYAN
  • WEI CHANGZHOU
  • TANG XIA
  • ZHANG ZISHUAI
  • WANG QIWEI

Assignees

  • 无锡职业技术大学

Dates

Publication Date
2026-05-12
Application Date
2026-01-28

Claims (10)

  1. A voice command recognition method for a composite robot, wherein the composite robot is configured to receive a plurality of voice signals and execute an action task corresponding to the voice signals, the method comprising: Step S1, acquiring the voice signals and preprocessing them to obtain independent voice command signals corresponding to different voice sources; Step S2, extracting voice features of the independent voice command signals and calibrating the voice features in combination with environmental acoustic features to obtain calibrated voice emotion feature information; Step S3, performing voice recognition on the independent voice command signals to generate command texts, analyzing the command semantics of the command texts, and matching the command semantics against pre-stored action-task priority information to obtain the priority information of the action task corresponding to the command semantics; Step S4, evaluating the corresponding confidence levels respectively based on the calibrated voice emotion feature information, the command semantic information, the action-task priority information and the environmental safety perception information, and calculating a multi-source priority information fusion score by dynamic confidence-weighted fusion; and Step S5, saving the task state of the current action task and the physical state of the composite robot, and selecting and executing an action task according to the multi-source priority information fusion score.
  2. The method of claim 1, wherein the environmental acoustic features comprise interfering acoustic features, such as high-frequency noise or mechanical noise, that affect speech emotion recognition accuracy, and wherein the calibration comprises dynamically adjusting a judgment threshold or feature weight of the emotion features according to the intensity of the interfering acoustic features.
  3. The method of claim 1, wherein evaluating the corresponding confidence levels in step S4 comprises: evaluating the voice emotion feature confidence, the command semantic confidence, the action-task priority confidence and the environmental perception result confidence respectively, based on the emotion priority information corresponding to the voice emotion features, the semantic priority information corresponding to the command semantics, the action-task priority information, and the safety early-warning priority information triggered by the environmental perception information.
  4. The method of claim 3, wherein each confidence level is dynamically adjusted based on at least one of the following: the voice emotion feature confidence is adjusted according to the environmental interference intensity; the command semantic confidence is adjusted according to the accuracy of command semantic identification and the clarity of keyword matching; the action-task priority confidence is adjusted according to the stability of the action-task priority source; and the environmental perception result confidence is adjusted according to whether the accuracy of physical safety risk identification exceeds a set threshold.
  5. The method of claim 1, wherein the environmental perception information includes generating a self-triggered safety early warning when a potential physical safety risk is predicted without a voice command being received, and incorporating the self-triggered safety early warning as the highest-priority information in the multi-source priority information fusion.
  6. The method of claim 1, wherein the saved task state includes current task execution progress information, the saved physical state includes the robot pose, actuator states and the carried-load state, and the task state and physical state are marked as a recoverable breakpoint state.
  7. The method according to any one of claims 1 to 6, wherein step S5 further comprises: when the command information corresponding to the multi-source priority information fusion score is incomplete, entering a to-be-clarified state, actively detecting potential risk areas while sending a clarification query; and resuming the interrupted action task from the recoverable breakpoint state according to the received clarification command or the autonomous perception result.
  8. A composite robot voice command recognition system, the system comprising: a voice acquisition and preprocessing module, configured to acquire the voice signals and preprocess them to obtain independent voice command signals corresponding to different voice sources; an emotion feature perception and calibration module, configured to extract voice features of the independent voice command signals and calibrate the voice features in combination with environmental acoustic features to obtain calibrated voice emotion feature information; a semantic analysis and task mapping module, configured to perform voice recognition on the independent voice command signals to generate command texts, analyze the command semantics of the command texts, and match the command semantics against pre-stored action-task priority information to obtain the priority information of the action task corresponding to the command semantics; a dynamic confidence-weighted fusion module, configured to evaluate the corresponding confidence levels respectively based on the calibrated voice emotion feature information, the command semantic information, the action-task priority information and the environmental safety perception information, and to calculate a multi-source priority information fusion score by dynamic confidence-weighted fusion; and an execution and interaction module, configured to save the task state of the current action task and the physical state of the composite robot, and to select and execute an action task according to the multi-source priority information fusion score.
  9. The system of claim 8, wherein the execution and interaction module is further configured to enter a to-be-clarified state when the command information corresponding to the multi-source priority information fusion score is incomplete, actively detect potential risk areas, send a clarification query, and resume the interrupted action task from the recoverable breakpoint state according to the clarification command or the autonomous perception result.
  10. A composite robot comprising the composite robot voice command recognition system of claim 8.

Description

Composite robot voice instruction recognition method and system

Technical Field

The application relates to the technical field of industrial control, and in particular to a voice command recognition method and system for a composite robot.

Background

In modern intelligent manufacturing environments, composite robots are critical devices that accomplish a variety of tasks, such as material handling, component assembly, and equipment inspection. To work efficiently and flexibly, these robots are often equipped with a voice command recognition system that allows on-site operators to schedule tasks via verbal commands. However, the actual production environment is complex and variable: noise from running equipment, accent differences among operators, and semantic ambiguity in the instructions themselves all pose significant challenges for accurate voice command recognition. In particular, such systems often have difficulty identifying critical motion parameters contained in complex instructions, such as the target location, the force of an operation, or the urgency of execution. Traditional voice recognition methods lack sufficient capability to parse unstructured spoken expressions, which easily leads to misjudged or delayed instructions and in turn affects the robot's operating efficiency and production safety.

Disclosure of the Invention

In view of these defects, the invention provides a voice command recognition method and system for a composite robot, which aim not only to understand the literal meaning of a command but also to perceive the real urgency behind it, so that decisions matching the actual situation can be made quickly and accurately in emergencies, ensuring operational safety and task continuity.
The first aspect of the invention provides a voice command recognition method for a composite robot, wherein the composite robot is configured to receive a plurality of voice signals and execute an action task corresponding to the voice signals, and the method includes: Step S1, acquiring the voice signals and preprocessing them to obtain independent voice command signals corresponding to different voice sources. Step S2, extracting voice features of the independent voice command signals and calibrating the voice features in combination with environmental acoustic features to obtain calibrated voice emotion feature information. Step S3, performing voice recognition on the independent voice command signals to generate command texts, analyzing the command semantics of the command texts, and matching the command semantics against pre-stored action-task priority information to obtain the priority information of the action task corresponding to the command semantics. Step S4, evaluating the corresponding confidence levels respectively based on the calibrated voice emotion feature information, the command semantic information, the action-task priority information and the environmental safety perception information, and calculating a multi-source priority information fusion score by dynamic confidence-weighted fusion. Step S5, saving the task state of the current action task and the physical state of the composite robot, and selecting and executing an action task according to the multi-source priority information fusion score.
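The dynamic confidence-weighted fusion of step S4 can be illustrated with a minimal sketch. The function name, the four example sources, the numeric values, and the normalisation scheme below are illustrative assumptions, not the patent's disclosed implementation:

```python
def fusion_score(sources):
    """Combine priority values from multiple information sources,
    each weighted by its dynamically evaluated confidence.

    sources: list of (priority, confidence) pairs, one per source
    (e.g. speech emotion, command semantics, action-task priority,
    environmental safety perception), each value in [0, 1].
    """
    total_conf = sum(conf for _, conf in sources)
    if total_conf == 0:
        return 0.0
    # Normalise by the total confidence so the score stays comparable
    # when some sources are unreliable and contribute little weight.
    return sum(prio * conf for prio, conf in sources) / total_conf

# Example (values assumed): emotion, semantics, task priority,
# environment perception, each as (priority, confidence).
score = fusion_score([(0.8, 0.6), (0.7, 0.9), (0.5, 0.8), (1.0, 0.4)])
```

A highly urgent source (high priority) with low confidence thus contributes less to the final score than a moderately urgent source the system is confident about, which matches the patent's goal of weighing urgency against reliability.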
According to one embodiment of the invention, the environmental acoustic features include interfering acoustic features, such as high-frequency noise or mechanical noise, that affect speech emotion recognition accuracy, and the calibration includes dynamically adjusting a judgment threshold or feature weight of the emotion features according to the intensity of the interfering acoustic features. According to one embodiment of the invention, evaluating the corresponding confidence levels in step S4 comprises evaluating the voice emotion feature confidence, the command semantic confidence, the action-task priority confidence and the environmental perception result confidence respectively, based on the emotion priority information corresponding to the voice emotion features, the semantic priority information corresponding to the command semantics, the action-task priority information, and the safety early-warning priority information triggered by the environmental perception information. According to one embodiment of the invention, each confidence level is dynamically adjusted based on at least one of the following: the voice emotion feature confidence is adjusted according to the environmental interference intensity; the command semantic confidence is adjusted according to the accuracy of command semantic identification and the clarity of keyword matching.
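The environment-dependent adjustment of the emotion-feature confidence described above can be sketched as follows. The linear attenuation model and all names are assumptions for illustration; the patent does not specify the exact adjustment rule:

```python
def adjust_emotion_confidence(base_conf, interference):
    """Attenuate the speech-emotion confidence as the measured
    environmental interference intensity grows (e.g. high-frequency
    or mechanical noise, normalised to [0, 1]).

    A simple linear model is assumed here for illustration.
    """
    adjusted = base_conf * (1.0 - interference)
    # Clamp to a valid confidence range before fusion.
    return max(0.0, min(1.0, adjusted))

# In a quiet cell the confidence passes through nearly unchanged;
# under heavy mechanical noise it is strongly attenuated, so the
# fusion step relies more on semantics and environment perception.
```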