CN-122008217-A - Operation control method and system based on natural language and electronic equipment
Abstract
The application provides a natural language based operation control method, system and electronic device. The method comprises: identifying the target object of a voice instruction expressed in natural language, together with its attribute information; detecting image data of a workstation based on the target object and its attribute information, and determining an operation object in the workstation that matches the target object, together with its spatial pose information; identifying the operation intention corresponding to the operation object in the voice instruction, and generating an operation instruction sequence for the robotic arm according to the operation intention and the spatial pose information of the operation object; and driving the robotic arm to execute the operation based on the operation instruction sequence. With this technical scheme, a user can command the robot to complete complex tasks merely by issuing natural language instructions, which lowers the technical threshold for operating the robot and enhances its adaptability and generalization capability in dynamic, unstructured industrial scenes.
Inventors
- Ding Zhe
- Tang Anhua
- Huang Zuohao
Assignees
- 西门子工业自动化产品(成都)有限公司 (Siemens Industrial Automation Products (Chengdu) Co., Ltd.)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-03-05
Claims (19)
- 1. A natural language based operation control method applied to a workstation configured with a robotic arm, the method (100) comprising: identifying a target object of a voice instruction and attribute information (102) of the target object, wherein the voice instruction is expressed in natural language form; detecting image data of the workstation based on the target object and the attribute information of the target object, and determining an operation object in the workstation that matches the target object and spatial pose information (104) of the operation object; identifying an operation intention corresponding to the operation object in the voice instruction, and generating an operation instruction sequence (106) for the robotic arm according to the operation intention and the spatial pose information of the operation object; and driving the robotic arm to execute an operation (108) on the operation object based on the operation instruction sequence.
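The four-step flow of claim 1 (steps 102–108) can be sketched as a minimal control loop; the function names, data shapes and stub stages below are illustrative assumptions, not part of the claimed method:

```python
from dataclasses import dataclass, field

@dataclass
class OperationRequest:
    target_object: str    # e.g. "red bolt", parsed from the voice instruction
    attributes: dict      # appearance / spatial / semantic attributes
    pose: tuple = ()      # spatial pose of the matched operation object
    instructions: list = field(default_factory=list)

def control_pipeline(parsed_instruction, perceive, plan, execute):
    """NL instruction -> perception -> planning -> execution (steps 102-108)."""
    # Step 102: target object and its attribute information
    req = OperationRequest(target_object=parsed_instruction["object"],
                           attributes=parsed_instruction.get("attributes", {}))
    # Step 104: detect the matching operation object and its spatial pose
    req.pose = perceive(req.target_object, req.attributes)
    # Step 106: generate the operation instruction sequence from intent + pose
    req.instructions = plan(parsed_instruction["intent"], req.pose)
    # Step 108: drive the arm through the sequence
    return [execute(ins) for ins in req.instructions]
```

With real perception, planning and execution stages substituted for the three callables, this loop is the end-to-end closed loop the description refers to.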
- 2. The operation control method according to claim 1, wherein identifying the target object of the voice instruction and the attribute information of the target object includes: acquiring a voice instruction and converting it into a text instruction; extracting and splicing keywords of the text instruction to obtain prompt information for the text instruction; and performing prediction on the prompt information of the text instruction to obtain the target object and its attribute information, wherein the attribute information comprises at least one of appearance attribute information, spatial attribute information and semantic attribute information.
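The keyword extraction and splicing step of claim 2 amounts to assembling prompt information for a language model to predict on; a minimal sketch, assuming a plain string template (the real prompt format is not specified in the claim):

```python
def build_prompt(text_instruction, candidate_keywords):
    """Splice the keywords found in the text instruction into prompt
    information that a language model can perform prediction on."""
    spliced = ", ".join(k for k in candidate_keywords if k in text_instruction)
    return (f"Instruction: {text_instruction}\n"
            f"Keywords: {spliced}\n"
            "Return the target object and its appearance, spatial and "
            "semantic attributes.")
```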
- 3. The operation control method according to claim 1, wherein the workstation is further configured with an imaging device, and detecting the image data of the workstation based on the target object and the attribute information of the target object, and determining the operation object in the workstation that matches the target object and the spatial pose information of the operation object, includes: controlling the imaging device to acquire image data of the workstation in response to acquiring the target object and its attribute information, wherein the image data comprises depth information; performing visual detection processing on the image data according to the target object and its attribute information, and determining an operation object matching the target object from at least one candidate object in the image data; and calculating the spatial pose information of the operation object according to the depth information of the operation object and the image data.
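Computing spatial pose from depth, as in claim 3, typically relies on back-projecting the detected object's pixel coordinates through the camera intrinsics; a standard pinhole-model sketch (the intrinsic parameters fx, fy, cx, cy come from camera calibration and are not specified by the claim):

```python
def pixel_to_camera_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with its depth value (metres) to a 3-D
    point in the camera frame using the pinhole model; a full 6-D pose
    would additionally estimate the object's orientation."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```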
- 4. The operation control method according to claim 1, wherein identifying the operation intention corresponding to the operation object in the voice instruction and generating the operation instruction sequence of the robotic arm based on the operation intention and the spatial pose information of the operation object comprises: identifying the operation intention corresponding to the operation object in the voice instruction; determining a movement path of the operation object in the workstation based on the operation intention and the spatial pose information of the operation object; determining a plurality of movement track points of the robotic arm, and a movement sequence of those track points, based on the movement path and a preset movement rule of the robotic arm; and performing instruction editing based on the plurality of movement track points and their movement sequence to generate the operation instruction sequence containing a plurality of operation instructions.
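Determining track points from a movement path and a preset movement rule, as in claim 4, can be sketched as discretizing the path at a maximum step length; the straight-line path and the 5 cm default step below are assumptions standing in for the unspecified movement rule:

```python
import math

def interpolate_waypoints(start, goal, max_step=0.05):
    """Split a straight-line move from start to goal (x, y, z) into an
    ordered sequence of track points spaced at most max_step apart."""
    n = max(1, math.ceil(math.dist(start, goal) / max_step))
    return [tuple(s + (g - s) * i / n for s, g in zip(start, goal))
            for i in range(n + 1)]
```

Each returned waypoint would then be edited into one operation instruction, preserving the list order as the movement sequence.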
- 5. The method of claim 1, wherein driving the robotic arm to perform an operation on the operation object based on the operation instruction sequence comprises: acquiring one operation instruction at a time, in order, from the operation instruction sequence; driving the robotic arm to execute the operation on the operation object based on that operation instruction; acquiring operation feedback data of the robotic arm to detect the execution progress of the operation instruction; and, if the operation instruction has not finished executing, returning to the step of acquiring the operation feedback data of the robotic arm, or, if it has finished, returning to the step of acquiring one operation instruction from the operation instruction sequence, until all operation instructions in the operation instruction sequence have been executed.
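The feedback loop of claim 5 — fetch an instruction, drive the arm, poll progress until completion, then fetch the next instruction — can be sketched as follows (drive and poll_progress are assumed callables standing in for the arm interface):

```python
def execute_sequence(instructions, drive, poll_progress):
    """Execute each operation instruction in order, re-reading the arm's
    feedback data until the instruction completes before moving on."""
    completed = []
    for ins in instructions:
        drive(ins)                     # drive the arm for this instruction
        while not poll_progress(ins):  # re-acquire feedback until done
            pass
        completed.append(ins)
    return completed
```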
- 6. A natural language based operation control system (200) communicatively connected to a robotic arm (22) and a prediction model (24), the robotic arm (22) being configured in a workstation (20), the system comprising: an instruction recognition module (210) connected to the prediction model (24) and configured to obtain a voice instruction, expressed in natural language form, and to invoke the prediction model (24) to recognize a target object in the voice instruction and attribute information of the target object; a vision processing module (220) connected to the instruction recognition module (210) and the prediction model (24) and configured to invoke the prediction model (24) to detect image data of the workstation (20) according to the target object and its attribute information, and to determine an operation object in the workstation (20) that matches the target object and spatial pose information of the operation object; an instruction generation module (230) connected to the instruction recognition module (210), the vision processing module (220) and the prediction model (24) and configured to invoke the prediction model (24) to predict a movement path of the robotic arm (22) according to the voice instruction and the spatial pose information of the operation object, and to generate an operation instruction sequence for the robotic arm (22) based on the movement path; and an instruction execution module (240) connected to the instruction generation module (230) and the robotic arm (22) and configured to drive the robotic arm (22) to perform an operation on the operation object based on the operation instruction sequence.
- 7. The system of claim 6, wherein the prediction model (24) is deployed locally at the workstation (20) and/or in the cloud.
- 8. The system of claim 6, wherein the instruction recognition module (210) includes: an interaction unit (212) configured to obtain the voice instruction; a conversion unit (214) connected to the interaction unit (212) and configured to convert the voice instruction into a text instruction; and a recognition unit (216) connected to the conversion unit (214) and the prediction model (24), the recognition unit (216) being configured to invoke a first language prediction module in the prediction model (24) to extract and splice keywords of the text instruction to obtain prompt information for the text instruction, and to invoke a second language prediction module in the prediction model (24) to perform prediction based on the prompt information to obtain the target object and its attribute information, wherein the attribute information comprises at least one of appearance attribute information, spatial attribute information and semantic attribute information.
- 9. The system of claim 8, wherein the conversion unit (214) is further configured to: output the text instruction to the interaction unit (212); and either update the text instruction based on instruction modification information returned by the interaction unit (212) and re-output the updated text instruction to the interaction unit (212), or send the text instruction to the recognition unit (216) in response to instruction confirmation information returned by the interaction unit (212).
- 10. The system of claim 8, wherein the first language prediction module and the second language prediction module are the same large language model, or different large language models, invoked via API calls.
- 11. The system according to claim 6, wherein the workstation (20) is further provided with an imaging device (26), and the vision processing module (220) is further connected to the imaging device (26) and is configured to: control the imaging device (26) to acquire image data of the workstation (20) in response to acquiring the target object and its attribute information from the instruction recognition module (210), the image data including depth information; invoke the prediction model (24) to perform visual detection processing on the image data of the workstation (20) based on the target object and its attribute information, determining an operation object matching the target object from at least one candidate object in the image data; and calculate the spatial pose information of the operation object according to the depth information of the operation object and the image data.
- 12. The system as recited in claim 11, wherein the imaging device (26) comprises any one of a structured light camera, a binocular stereo vision camera, or a time-of-flight camera.
- 13. The system of claim 6, wherein the instruction generation module (230) is configured to: acquire the voice instruction from the instruction recognition module (210) and the spatial pose information of the operation object from the vision processing module (220); invoke the prediction model (24) to identify the operation intention corresponding to the operation object in the voice instruction, and predict a movement path of the robotic arm (22) according to the operation intention and the spatial pose information of the operation object; determine a plurality of movement track points of the robotic arm (22), and a movement sequence of those track points, based on the movement path and a preset movement rule of the robotic arm (22); and perform instruction editing based on the plurality of movement track points and their movement sequence to generate the operation instruction sequence containing a plurality of operation instructions.
- 14. The system of claim 6, wherein the instruction execution module (240) includes: an instruction acquisition unit (242) connected to the instruction generation module (230) and configured to acquire the operation instruction sequence from the instruction generation module (230) and to sequentially extract one operation instruction at a time from the sequence; an instruction execution unit (244) connected to the instruction acquisition unit (242) and the robotic arm (22) and configured to drive the robotic arm (22) to perform an operation on the operation object based on the operation instruction extracted by the instruction acquisition unit (242); and an execution detection unit (246) connected to the instruction acquisition unit (242) and the instruction execution unit (244) and configured to acquire operation feedback data of the robotic arm (22) to detect the execution progress of the operation instruction, wherein if the operation instruction is detected to have finished executing, the next operation instruction is acquired from the sequence through the instruction acquisition unit (242), until all operation instructions in the operation instruction sequence have been executed, and if the operation instruction has not finished executing, the operation feedback data of the robotic arm (22) is reacquired to continue detecting the execution progress while the instruction execution unit (244) continues driving the robotic arm (22).
- 15. The system of claim 14, wherein the instruction execution unit (244) is further configured to: convert the operation instruction into a control instruction via an RTDE interface of the robotic arm (22) and send the control instruction to the robotic arm (22).
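Claim 15's conversion step maps a high-level operation instruction onto the arm's RTDE control interface (Real-Time Data Exchange, used by Universal Robots arms, where a linear move takes a 6-D pose plus speed and acceleration). The dict layout and default limits below are illustrative assumptions, not the claimed format:

```python
def to_rtde_command(op_instruction):
    """Convert one operation instruction into an RTDE-style motion command:
    a 6-D pose [x, y, z, rx, ry, rz] with speed/acceleration limits."""
    return {"function": "moveL",
            "pose": list(op_instruction["pose"]),
            "speed": op_instruction.get("speed", 0.25),        # m/s
            "acceleration": op_instruction.get("accel", 0.5)}  # m/s^2
```

With a real arm, this dict would instead become a call such as ur_rtde's `RTDEControlInterface.moveL(pose, speed, acceleration)` on a connection to the arm's controller.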
- 16. The system of claim 14, wherein the execution detection unit (246) is further coupled to the vision processing module (220) and configured to: acquire image data of the robotic arm (22) through the vision processing module (220) to obtain estimated pose information of the robotic arm (22); determine actual pose information of the robotic arm (22) based on the estimated pose information and/or the operation feedback data of the robotic arm (22); and compare the actual pose information of the robotic arm (22) with the target pose information in the operation instruction to obtain the execution progress of the operation instruction.
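The pose comparison in claim 16 can be sketched as a tolerance check between actual and target pose; the 1 mm position and 0.01 rad rotation tolerances are illustrative assumptions:

```python
import math

def execution_progress(actual_pose, target_pose, pos_tol=1e-3, rot_tol=1e-2):
    """Compare actual vs. target pose ([x, y, z, rx, ry, rz]) and report
    whether the operation instruction has finished executing."""
    pos_err = math.dist(actual_pose[:3], target_pose[:3])
    rot_err = max(abs(a - t) for a, t in zip(actual_pose[3:], target_pose[3:]))
    return {"pos_err": pos_err, "rot_err": rot_err,
            "done": pos_err <= pos_tol and rot_err <= rot_tol}
```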
- 17. An electronic device (800) comprising a processor (802), a communication interface (804), a memory (806) and a communication bus (808), wherein the processor (802), the communication interface (804) and the memory (806) communicate with one another via the communication bus (808); the memory (806) is configured to store at least one executable instruction that causes the processor (802) to perform operations corresponding to the natural language based operation control method according to any one of claims 1 to 5.
- 18. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the natural language based operation control method of any one of claims 1 to 5.
- 19. A computer program product comprising computer instructions that instruct a computing device to perform operations corresponding to the natural language based operation control method as claimed in any one of claims 1 to 5.
Description
Operation control method and system based on natural language and electronic equipment

Technical Field

The application relates to the field of industrial automation control, and in particular to a natural language based operation control method, system, electronic device, storage medium and computer program product.

Background

Robots are widely used to assist production on modern industrial lines. However, existing robot deployment and operation often require specialized engineers to write complex programs, with high technical thresholds, long deployment cycles and poor flexibility. An operator must have specialized programming knowledge to instruct the robot to perform a particular task, which greatly limits the application of robots in unstructured environments or in situations where tasks change frequently. In addition, although some methods exist in the related art for controlling robots by integrating visual detection and voice recognition technologies, most of their underlying logic is limited to rule-based matching mechanisms or closed-vocabulary command sets; they cannot achieve deep coupling of visual perception and language commands, and it is therefore difficult to dynamically generate an adaptive control strategy in response to real-time scene changes. Reducing the technical difficulty of robot operation, so that an ordinary operator can directly drive a robot to complete an automated task through intuitive natural language instructions, is thus an urgent technical problem to be solved in the industry.
Disclosure of Invention

In view of the above, the present application aims to provide a natural language based operation control method and system that integrate speech recognition, visual detection and path planning technologies to construct an end-to-end automatic control closed loop from natural language instruction to physical execution by a robotic arm, thereby implementing autonomous decision-making and accurate operation under complex tasks.

According to a first aspect of the application, a natural language based operation control method is provided, applied to a workstation provided with a robotic arm. The method comprises: identifying a target object of a voice instruction and attribute information of the target object, wherein the voice instruction is expressed in natural language form; detecting image data of the workstation based on the target object and its attribute information, and determining an operation object in the workstation that matches the target object and spatial pose information of the operation object; identifying the operation intention corresponding to the operation object in the voice instruction, and generating an operation instruction sequence for the robotic arm according to the operation intention and the spatial pose information of the operation object; and driving the robotic arm to execute an operation on the operation object based on the operation instruction sequence.
In some embodiments, identifying the target object of the voice instruction and the attribute information of the target object includes: obtaining a voice instruction and converting it into a text instruction; extracting and splicing keywords of the text instruction to obtain prompt information for the text instruction; and performing prediction on the prompt information to obtain the target object and its attribute information, wherein the attribute information comprises at least one of appearance attribute information, spatial attribute information and semantic attribute information.

In some embodiments, the workstation is further configured with an imaging device, and detecting the image data of the workstation based on the target object and its attribute information, and determining the operation object in the workstation that matches the target object and the spatial pose information of the operation object, comprises: controlling the imaging device to acquire image data of the workstation, wherein the image data comprises depth information; performing visual detection processing on the image data according to the target object and its attribute information, and determining the operation object matching the target object from at least one candidate object in the image data; and calculating the spatial pose information of the operation object according to the depth information of the operation object and the image data.

In some embodiments, identifying the operation intention corresponding to the operation object in the voice instruction and generating the operation instruction sequence of the robotic arm according to the operation intention and the spatial pose information of the operation object comprises identifying the operation intention corresponding to the operati