CN-122024729-A - Multi-robot voice instruction response method and system


Abstract

The application discloses a multi-robot voice instruction response method and system. In the method, a first robot responds to a user voice instruction by collecting a first environment image and determining a first user position. It creates a first broadcast data packet from the first user position, the local state information of the first robot, and the first environment image, and receives second broadcast data packets from other robots to obtain a plurality of broadcast data packets. It then obtains a target robot number from the plurality of broadcast data packets, the user voice instruction, the historical dialogue record between the first robot and the first user in the current dialogue period, and a home robot judgment model. When the target robot number is detected to be consistent with the first robot's own device number, the first robot continues to respond to the service requirement indicated by the user voice instruction. The method effectively avoids competing responses among multiple robots and improves the accuracy and intelligence of multi-robot voice instruction response.

Inventors

DOU KAI
WANG TAO

Assignees

深圳创盈芯实业有限公司

Dates

Publication Date: 20260512
Application Date: 20260410

Claims (10)

1. A multi-robot voice instruction response method, characterized in that the method is applied to a system-on-chip mainboard, having integrated training and inference capability, of a first robot in a multi-robot service system, the method comprising: in response to a user voice instruction, collecting a first environment image and determining a first user position; creating a first broadcast data packet of the first robot according to the first user position, local state information of the first robot, and the first environment image; receiving second broadcast data packets from other robots to obtain a plurality of broadcast data packets including the first broadcast data packet; obtaining a target robot number for responding to the user voice instruction according to the plurality of broadcast data packets, the user voice instruction, a historical dialogue record between the first robot and a first user in a current dialogue period, and a pre-trained home robot judgment model; and when the target robot number is detected to be consistent with the device number of the first robot, continuing to respond to the service requirement indicated by the user voice instruction.
2. The method of claim 1, wherein the obtaining a target robot number for responding to the user voice instruction according to the plurality of broadcast data packets, the user voice instruction, the historical dialogue record between the first robot and the first user in the current dialogue period, and the pre-trained home robot judgment model comprises: parsing the second broadcast data packets to obtain second environment images collected by the other robots in response to the user voice instruction and the local state information of the other robots, wherein the local state information of a robot comprises a timestamp, a spatial position, a device number, and a computing power resource occupancy rate for the current processing time window; performing fusion processing on the first environment image and the second environment images to obtain a multi-machine vision fusion image; determining a robot call detection result according to the user voice instruction and the historical dialogue record between the first robot and the first user in the current dialogue period, wherein the robot call detection result comprises a device number of a specific robot or a preset abnormal value, and the preset abnormal value indicates that no specific robot exists in the current dialogue period; and inputting the robot call detection result, the multi-machine vision fusion image, the first user position, and a plurality of pieces of local state information of the plurality of robots, in one-to-one correspondence with the plurality of broadcast data packets, into the home robot judgment model to obtain the target robot number output by the home robot judgment model.
3. The method of claim 2, wherein the determining a robot call detection result according to the user voice instruction and the historical dialogue record between the first robot and the first user in the current dialogue period comprises: determining voice text content according to the user voice instruction; identifying whether the voice text content contains a preset robot call feature word, wherein the robot call feature word comprises a robot device number, a robot exclusive name, and a robot position identifier; if so, generating the robot call detection result according to the device number of the specific robot indicated by the robot call feature word; if not, querying the historical dialogue record between the first robot and the first user in the current dialogue period and identifying whether it contains the robot call feature word; if so, generating the robot call detection result according to the device number of the specific robot indicated by the robot call feature word; and if not, generating the robot call detection result according to the preset abnormal value.
4. The method of claim 3, wherein the home robot judgment model is configured to perform the following operations: if the robot call detection result is detected to comprise the device number of the specific robot, determining the device number of the specific robot as the target robot number; if the robot call detection result is detected to comprise the preset abnormal value, determining a reference position range mapped by the multi-machine vision fusion image, wherein the reference position range represents the spatial area commonly observable by the plurality of robots in the current environment; determining a spatial occupancy state of the first user position relative to the reference position range, wherein an empty spatial occupancy state indicates that the first user is outside the fields of view of the plurality of robots, and a non-empty spatial occupancy state indicates that the first user is within the field of view of at least one robot; and determining the target robot number according to the spatial occupancy state, the plurality of pieces of local state information of the plurality of robots, and the first user position.
5. The method of claim 4, wherein the determining the spatial occupancy state of the first user position relative to the reference position range comprises: determining a spatial boundary of the reference position range in a global coordinate system; judging whether the first user position falls within the spatial boundary of the reference position range; if the first user position is judged to fall within the spatial boundary of the reference position range, determining that the spatial occupancy state is non-empty; and if the first user position is judged to fall outside the spatial boundary of the reference position range, determining that the spatial occupancy state is empty.
6. The method of claim 4, wherein the determining the target robot number according to the spatial occupancy state, the plurality of pieces of local state information of the plurality of robots, and the first user position comprises: if the spatial occupancy state is judged to be empty, determining the target robot number according to the plurality of pieces of local state information of the plurality of robots and the first user position; if the spatial occupancy state is judged to be non-empty, determining an occupied-area image of the first user in the multi-machine vision fusion image according to the first user position, and detecting, according to the occupied-area image, whether a robot exists within a preset range corresponding to the face orientation of the first user; if a robot is detected, determining the device number of the detected robot as the target robot number; and if no robot is detected, determining the target robot number according to the plurality of pieces of local state information of the plurality of robots and the first user position.
7. The method of claim 6, wherein the plurality of pieces of local state information of the plurality of robots comprise a plurality of spatial positions and a plurality of computing power resource occupancy rates in one-to-one correspondence with the plurality of robots, and wherein the determining the target robot number according to the plurality of pieces of local state information of the plurality of robots and the first user position comprises: determining the relative distance between each of the plurality of spatial positions and the first user position to obtain a plurality of relative distances in one-to-one correspondence with the plurality of robots; scoring the relative distance of each of the plurality of robots according to a first preset scoring rule to obtain a plurality of first scores, wherein each first score is negatively correlated with the corresponding relative distance; scoring the computing power resource occupancy rate of each of the plurality of robots according to a second preset scoring rule to obtain a plurality of second scores, wherein each second score is negatively correlated with the corresponding computing power resource occupancy rate; performing, for each robot, a weighted summation of the corresponding first score and second score according to a first preset weight and a second preset weight to obtain a plurality of composite scores; and determining the device number of the robot with the highest composite score among the plurality of robots as the target robot number.
8. The method of any of claims 1-7, wherein the continuing to respond to the service requirement indicated by the user voice instruction comprises: performing intention recognition on the user voice instruction according to a historical dialogue state data set corresponding to the first user to obtain a target intention, wherein the historical dialogue state data set comprises historical dialogue data, historical states of events in which the user participated, state data of articles associated with the user, and user portrait data representing the activity habits of the user; generating a subtask sequence for realizing the target intention according to the historical dialogue state data set; and executing the subtask sequence.
9. The method of claim 8, wherein the performing intention recognition on the user voice instruction according to the historical dialogue state data set corresponding to the first user to obtain a target intention comprises: performing intention recognition on the voice text content corresponding to the user voice instruction to obtain an initial intention; performing voiceprint recognition on the user voice instruction to determine a first voiceprint feature set; querying a pre-stored historical dialogue state database according to the first voiceprint feature set to obtain the historical dialogue state data set corresponding to the first user, wherein the historical dialogue state database pre-stores a plurality of voiceprint feature sets and a plurality of historical dialogue state data sets in one-to-one correspondence with a plurality of users; and adjusting the initial intention according to the historical dialogue state data set to obtain the target intention.
10. A multi-robot service system, characterized in that the system comprises a plurality of robots including a first robot, wherein each single robot comprises a system-on-chip mainboard having integrated training and inference capability, the system-on-chip mainboard comprises a main controller, a main computing card and an extended computing card in communication with the main controller, and a memory in communication with the main controller, and wherein the system-on-chip mainboard having integrated training and inference capability of the first robot is configured to perform the steps of the method according to any of claims 1-9.
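
For illustration only, the selection cascade of claims 4, 6 and 7 and the consistency check of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The names `LocalState`, `select_by_score`, `select_target` and `should_respond`, the linear scoring rules, the 10-unit range, the 0.6/0.4 weights, the use of `None` for the preset abnormal value, and the tie-break on device number are all hypothetical. Image fusion, face-orientation detection and the pre-trained judgment model are represented only by their outputs.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LocalState:
    """Local state fields listed in claim 2 (field names are assumptions)."""
    timestamp: float
    position: Tuple[float, float]  # spatial position in a shared global frame
    device_number: str
    compute_occupancy: float       # 0.0 (idle) to 1.0 (fully loaded)


def distance_score(distance: float, max_range: float = 10.0) -> float:
    """Assumed first scoring rule: score falls linearly as distance grows."""
    return max(0.0, 1.0 - distance / max_range)


def occupancy_score(occupancy: float) -> float:
    """Assumed second scoring rule: score falls as occupancy grows."""
    return 1.0 - min(max(occupancy, 0.0), 1.0)


def select_by_score(states: Sequence[LocalState],
                    user_pos: Tuple[float, float],
                    w_distance: float = 0.6,
                    w_occupancy: float = 0.4) -> str:
    """Claim 7: weighted sum of both scores; highest composite score wins."""
    def composite(s: LocalState) -> float:
        return (w_distance * distance_score(dist(s.position, user_pos))
                + w_occupancy * occupancy_score(s.compute_occupancy))
    # Sort by device number first so ties resolve the same way everywhere
    # (an assumption; the claims do not specify a tie-break).
    ordered = sorted(states, key=lambda s: s.device_number)
    return max(ordered, key=composite).device_number


def select_target(states: Sequence[LocalState],
                  user_pos: Tuple[float, float],
                  called_device: Optional[str] = None,
                  user_in_shared_view: bool = False,
                  faced_device: Optional[str] = None) -> str:
    """Decision cascade of claims 4 and 6, with claim 7 as the fallback."""
    if called_device is not None:           # explicit call (claim 4)
        return called_device
    if user_in_shared_view and faced_device is not None:
        return faced_device                 # robot the user is facing (claim 6)
    return select_by_score(states, user_pos)


def should_respond(own_device_number: str, target_number: str) -> bool:
    """Claim 1: continue responding only if this robot is the target."""
    return own_device_number == target_number


# Hypothetical example data: R1 is closer but heavily loaded, R2 is idle.
states = [
    LocalState(0.0, (0.0, 0.0), "R1", 0.9),
    LocalState(0.0, (4.0, 0.0), "R2", 0.1),
]
user = (1.0, 0.0)
target = select_target(states, user)
print(target, should_respond("R1", target))
```

The deterministic tie-break is included so that robots evaluating the same set of broadcast data packets would reach the same target number; whether the claimed judgment model behaves this way is not stated in the source.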

Description

Multi-robot voice instruction response method and system

Technical Field

The application relates to the technical field of general control or regulation systems, the technical field of voice processing, or the technical field of computer systems based on specific computational models, and in particular to a multi-robot voice instruction response method and system.

Background

At present, embodied intelligent robots adopt a one-robot, one-board hardware design. The end-side model on the mainboard must be trained in the cloud and then deployed locally, so model inference adapts poorly to the hardware's computing power. The voice interaction schemes of conventional embodied intelligent robots generally adopt a working mode of independent sensing, local recognition, and autonomous response: each robot collects environmental voice signals through its own sound pickup device and autonomously executes a response action after completing voice recognition and intention analysis locally or in the cloud. Because this mode is designed for a single robot working independently, it is difficult to adapt to complex service environments in which multiple robots coexist and multiple users interact concurrently.
In an actual service scenario with multiple people and multiple robots, competitive response among the robots is a problem. A voice instruction spoken by a single user can be picked up simultaneously by several robots within effective pickup range, and each robot begins acting after independently judging the instruction to be valid. Execution conflicts, in which several robots respond to the same instruction at once, therefore arise easily. Such schemes cannot meet the need for multi-user, multi-task parallel collaborative service in a multi-robot service scenario. They waste robot computing power and lower service efficiency, and the simultaneous operation of several robots can even create safety hazards such as collision and interference. They are therefore difficult to adapt to practical application scenarios, such as elderly care, that place higher demands on service accuracy and collaborative performance.

Disclosure of Invention

In view of the above, embodiments of the application provide a multi-robot voice instruction response method and system, applied to the system-on-chip mainboard with integrated training and inference capability of a robot in a multi-robot service system. The robot responds to a user voice instruction, collects an environment image, determines the user position, and generates a robot broadcast data packet in combination with its local state information. The data packet information of the multiple robots is exchanged and integrated, and a unique target robot for responding to the instruction is determined in combination with the historical dialogue record and a pre-trained home robot judgment model. This achieves a unique determination of the responding robot where multiple robots coexist and effectively avoids conflicts caused by multi-robot race responses. Relying on the integrated training and inference hardware architecture, the robot completes model inference and task processing on the end side, adapting to complex service environments in which multiple robots coexist and multiple users interact concurrently.

In a first aspect, an embodiment of the application provides a multi-robot voice instruction response method, applied to a system-on-chip mainboard, having integrated training and inference capability, of a first robot in a multi-robot service system, where the method includes: in response to a user voice instruction, collecting a first environment image and determining a first user position; creating a first broadcast data packet of the first robot according to the first user position, local state information of the first robot, and the first environment image; receiving second broadcast data packets from other robots to obtain a plurality of broadcast data packets including the first broadcast data packet; obtaining a target robot number for responding to the user voice instruction according to the plurality of broadcast data packets, the user voice instruction, the historical dialogue record between the first robot and the first user in the current dialogue period, and a pre-trained home robot judgment model; and when the target robot number is detected to be consistent with the device number of the first robot, continuing to respond to the service requirement indicated by the user voice instruction.

In a second aspect, the application further provides a multi-robot service system, where the system includes a plurality of robots, where the plurality of robots includes a first robot, and a single robot includes a syste