
CN-121996232-A - Human-machine interaction method and system for real-time task adjustment input and visual feedback based on a VLA (vision-language-action) model

CN121996232A

Abstract

The invention provides a human-machine interaction method and system for real-time task adjustment input and visual feedback based on VLA, belonging to the technical field at the intersection of human-machine interaction, intelligent agents, and industrial robot control. The method comprises: receiving a natural language instruction input by a user; generating a robot task plan by using a VLA model; parsing the robot task plan into a high-level semantic action unit sequence; generating structured task logic data; constructing a draggable, editable graphical task flow chart; and generating initial task flow chart data. By converting the robot task plan generated by the VLA into a draggable, editable graphical flow chart and presenting it visually in AR/3D space, the transparency and controllability of human-machine interaction are greatly improved.
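As a concrete illustration of the parsing step the abstract describes, the sketch below turns an ordered task plan into linked flow chart node data. The patent specifies no data formats, so every name and structure here (`ActionNode`, `plan_to_flowchart`, the pick/move/place units) is a hypothetical assumption, not the patented representation:

```python
from dataclasses import dataclass, field

@dataclass
class ActionNode:
    """One high-level semantic action unit in the task flow chart (illustrative)."""
    node_id: str
    action: str                       # e.g. "pick", "move", "place"
    params: dict = field(default_factory=dict)
    next_ids: list = field(default_factory=list)

def plan_to_flowchart(plan):
    """Convert an ordered task plan (list of (action, params) pairs)
    into linked flow chart node data keyed by node id."""
    nodes = []
    for i, (action, params) in enumerate(plan):
        node = ActionNode(node_id=f"n{i}", action=action, params=dict(params))
        if nodes:                     # link the previous node to this one
            nodes[-1].next_ids.append(node.node_id)
        nodes.append(node)
    return {n.node_id: n for n in nodes}

flow = plan_to_flowchart([("pick", {"object": "bolt"}),
                          ("move", {"to": [0.4, 0.1, 0.2]}),
                          ("place", {"target": "tray"})])
```

A structure of this kind is what a drag-editing front end would reorder and relink before the plan is re-serialized for execution.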

Inventors

  • YANG YIMING
  • LIU WEI

Assignees

  • 深圳墨影科技有限公司

Dates

Publication Date
20260508
Application Date
20260130

Claims (10)

  1. A human-machine interaction method for real-time task adjustment input and visual feedback based on VLA, characterized by comprising the following steps: S1, receiving a natural language instruction input by a user, generating a robot task plan by using a VLA model, parsing the robot task plan into a high-level semantic action unit sequence, generating structured task logic data, constructing a draggable, editable graphical task flow chart, and generating initial task flow chart data; S2, performing three-dimensional spatial alignment mapping between the initial task flow chart data and the real physical environment through AR/3D spatial visualization technology to generate spatially visualized task flow chart data, overlaying AR spatial labels, and generating visualized task flow chart data with spatial labels; S3, generating task flow chart modification instruction data according to the user's drag-editing operations on the visualized task flow chart data with spatial labels, parsing the task flow chart modification instruction data, converting it into corresponding structured task logic adjustment data, updating the initial task flow chart data, and generating optimized task flow chart data; S4, extracting the action parameter data required for task execution from the optimized task flow chart data, performing collision detection analysis between the action parameter data and the real physical environment to generate collision detection result data, and further correcting the optimized task flow chart data to generate finally confirmed task flow chart data; and S5, generating robot execution instruction data according to the finally confirmed task flow chart data, sending the robot execution instruction data to a robot control system, driving the robot to execute safely according to the optimized and confirmed task logic, simultaneously feeding back execution state data in real time, dynamically updating the execution progress of the task flow chart in an AR/3D spatial visualization interface, and generating real-time execution feedback visualization data.
  2. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 1, wherein S1 comprises: S11, receiving a natural language instruction input by a user, and performing intention recognition and context association analysis through a multi-modal semantic understanding module of the VLA model to generate original task description data; S12, invoking a task planning engine of the VLA model based on the original task description data, and generating a robot task plan by combining a preset action unit library and a logic rule library, to generate structured task plan data; S13, according to the structured task plan data, automatically parsing the high-level task logic into a high-level semantic action unit sequence by using a natural-language-to-action-unit mapping algorithm to generate structured task logic data; S14, based on the structured task logic data, invoking a graphics rendering engine to construct a visual task flow chart framework, defining the display rules and interaction logic of action nodes, judgment nodes, and branch nodes, and generating initial task flow chart prototype data; and S15, carrying out syntax checking and logic integrity verification on the initial task flow chart prototype data to generate the initial task flow chart data.
  3. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 1, wherein S2 comprises: S21, acquiring the initial task flow chart data, acquiring spatial information of the real physical environment through an AR spatial positioning module, and generating high-precision physical environment spatial model data; S22, based on the physical environment spatial model data and the initial task flow chart data, carrying out coordinate system integration, logical position matching, and scale calibration by using a three-dimensional spatial alignment algorithm to generate spatially visualized task flow chart base data; S23, overlaying AR spatial annotation information on the spatially visualized task flow chart base data, defining the annotations' display styles and interaction trigger conditions, and generating visualized task flow chart data with preliminary spatial annotations; S24, adjusting the transparency, size, and position of the AR spatial annotations through user viewpoint tracking and a scene-adaptive algorithm to generate visualized task flow chart data with accurate spatial annotations; and S25, projecting the visualized task flow chart data with accurate spatial annotations to the AR/3D display device in real time, and generating scene interaction preview data.
  4. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 3, wherein S22 comprises: obtaining the physical environment spatial model data and the initial task flow chart data, and carrying out format standardization processing on the two types of data to generate dual-source data in a unified format; based on the dual-source data in the unified format, carrying out a coordinate system unification operation by adopting a three-dimensional coordinate conversion algorithm to generate coordinate-system-aligned dual-source data; based on the coordinate-system-aligned dual-source data, carrying out an association matching operation between task nodes and physical environment positions, and generating position matching mapping data; based on the position matching mapping data, performing a scale proportion calibration operation, eliminating the scale difference between the two types of data, and generating scale-calibrated virtual-real fusion data; and based on the scale-calibrated virtual-real fusion data, carrying out a fusion integrity check operation, correcting matching deviations, and generating the spatially visualized task flow chart base data.
  5. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 1, wherein S3 comprises: S31, capturing in real time, through an interaction perception module, the user's operations on the visualized task flow chart data with accurate spatial annotations, recording the operation information, and generating task flow chart modification operation data; S32, based on the task flow chart modification operation data, converting the user's visual operations into structured task logic adjustment instructions by using an operation instruction parsing engine, determining the core requirements, and generating task flow chart modification instruction data; S33, invoking a task logic update algorithm to adjust, in a targeted manner, the action unit sequence, logical dependency relationships, and parameter configuration information in the initial task flow chart data, and generating temporarily optimized task flow chart data; S34, performing logic conflict detection and consistency verification on the temporarily optimized task flow chart data to generate optimized task flow chart data; and S35, synchronizing the optimized task flow chart data to the AR/3D visualization interface, refreshing the flow chart display state in real time, allowing the user to confirm the adjustment effect a second time, and generating adjustment confirmation feedback data.
  6. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 5, wherein S33 comprises: acquiring the task flow chart modification instruction data and the initial task flow chart data, performing association mapping processing on the two types of data, and generating associated task data to be adjusted; extracting the action unit sequence adjustment requirements based on the associated task data to be adjusted, executing an action unit sequence rearrangement operation, and generating adjusted action unit sequence data; analyzing the logical dependency adjustment requirements based on the adjusted action unit sequence data, carrying out a logical dependency link reconstruction operation, and generating reconstructed logical dependency data; based on the reconstructed logical dependency data, matching parameter configuration adjustment rules, executing a parameter update operation, and generating parameter-updated flow chart intermediate data; and carrying out a data integration operation based on the parameter-updated flow chart intermediate data, correcting data association deviations, and generating the temporarily optimized task flow chart data.
  7. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 1, wherein S4 comprises: S41, extracting the key action parameters corresponding to each action unit from the optimized task flow chart data, and generating a standardized action parameter data set; S42, associating the standardized action parameter data set with the physical environment spatial model data, and simulating, through a three-dimensional collision detection engine, collision risks with environmental obstacles, the robot's own joints, and other devices during the robot's execution, to generate raw collision detection result data; S43, carrying out risk grading and collision cause tracing on the raw collision detection result data, determining the time nodes, position coordinates, and trigger parameters of collision occurrences, and generating collision detection analysis report data; S44, adjusting the action parameters with collision risk according to the collision detection analysis report data in combination with an action parameter optimization algorithm, and generating corrected task flow chart data; and S45, performing secondary collision detection and execution feasibility verification on the corrected task flow chart data, and generating the finally confirmed task flow chart data.
  8. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 7, wherein S42 comprises: acquiring the standardized action parameter data set and the physical environment spatial model data, and executing data association fusion processing to generate an associated fusion data set; constructing a collision detection scene model framework based on the associated fusion data set, and generating scene model base data; importing the motion-related parameters in the standardized action parameter data set into the scene model base data to generate motion trajectory simulation data; based on the motion trajectory simulation data and the environmental element data in the scene model base data, starting a three-dimensional collision detection operation to generate preliminary collision detection data; and performing redundancy elimination and validity checking on the preliminary collision detection data, and integrating it to generate the raw collision detection result data.
  9. The human-machine interaction method of real-time task adjustment input and visual feedback based on VLA of claim 1, wherein S5 comprises: S51, based on the finally confirmed task flow chart data, converting the structured task logic, through an instruction generation engine, into low-level execution instructions recognizable by the robot control system, and generating robot execution instruction data; S52, sending the robot execution instruction data to the robot control system in real time through a high-speed communication protocol, establishing a bidirectional data transmission channel for instruction issuing and state feedback, driving the robot to start executing according to the optimized and confirmed task logic, and generating instruction issuing confirmation data; S53, acquiring the robot's execution state data in real time through robot body sensors and environment sensors during execution, and acquiring and uploading the data at millisecond intervals; S54, based on the execution state data, invoking a visual feedback engine to dynamically update the execution node states, progress bar display, and abnormality warning indicators of the task flow chart in the AR/3D spatial visualization interface, synchronously displaying the robot's real-time position and pose in the physical environment, and generating real-time execution feedback visualization data; and S55, supporting emergency intervention by the user based on the real-time execution feedback visualization data, synchronizing the intervention instruction to the robot control system in real time after parsing, forming closed-loop interaction, and generating closed-loop human-machine interaction data.
  10. A system for implementing the VLA-based real-time task adjustment input and visual feedback human-machine interaction method of claim 1, the system comprising: an instruction receiving module, configured to receive a natural language instruction input by a user, generate a robot task plan by using a VLA model, parse the robot task plan into a high-level semantic action unit sequence, generate structured task logic data, construct a draggable, editable graphical task flow chart, and generate initial task flow chart data; a spatial alignment module, configured to carry out three-dimensional spatial alignment mapping between the initial task flow chart data and the real physical environment through AR/3D spatial visualization technology to generate spatially visualized task flow chart data; a data update module, configured to generate task flow chart modification instruction data according to the user's drag-editing operations on the visualized task flow chart data with spatial labels, parse the task flow chart modification instruction data, convert it into corresponding structured task logic adjustment data, update the initial task flow chart data, and generate optimized task flow chart data; a detection analysis module, configured to extract the action parameter data required for task execution from the optimized task flow chart data, carry out collision detection analysis between the action parameter data and the real physical environment to generate collision detection result data, and further correct the optimized task flow chart data to generate finally confirmed task flow chart data; and a feedback execution module, configured to generate robot execution instruction data according to the finally confirmed task flow chart data, send the robot execution instruction data to a robot control system, drive the robot to execute safely according to the optimized and confirmed task logic, simultaneously feed back the execution state data in real time, dynamically update the execution progress of the task flow chart in an AR/3D spatial visualization interface, and generate real-time execution feedback visualization data.
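Claims 7 and 8 describe simulating collision risks along the robot's motion and recording the time node of each collision. The sketch below illustrates that idea with axis-aligned bounding boxes swept along trajectory waypoints; the representation (per-axis AABBs, a cubic robot envelope, named obstacles) is an illustrative assumption, not the patent's actual detection engine:

```python
def aabb_overlap(a, b):
    """True if two axis-aligned boxes, each given as per-axis
    (min, max) pairs, intersect on every axis."""
    return all(amin <= bmax and bmin <= amax
               for (amin, amax), (bmin, bmax) in zip(a, b))

def detect_collisions(waypoints, half_extent, obstacles):
    """Sweep a cubic robot envelope of the given half-extent along
    trajectory waypoints and report (waypoint_index, obstacle_name)
    for every intersection -- the 'time node' of a collision."""
    hits = []
    for t, (x, y, z) in enumerate(waypoints):
        box = [(x - half_extent, x + half_extent),
               (y - half_extent, y + half_extent),
               (z - half_extent, z + half_extent)]
        for name, obstacle in obstacles.items():
            if aabb_overlap(box, obstacle):
                hits.append((t, name))
    return hits

# A three-waypoint path past one obstacle: only the final pose collides.
hits = detect_collisions(
    waypoints=[(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)],
    half_extent=0.2,
    obstacles={"pillar": [(0.9, 1.1), (-0.1, 0.1), (-0.1, 0.1)]})
```

The reported waypoint indices correspond to the collision time nodes that claim 7's S43 traces back to adjustable action parameters.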

Description

Human-machine interaction method and system for real-time task adjustment input and visual feedback based on VLA

Technical Field

The invention provides a human-machine interaction method and system for real-time task adjustment input and visual feedback based on a VLA (vision-language-action) model, and belongs to the technical field at the intersection of human-machine interaction, intelligent agents, and industrial robot control.

Background

In the field of human-machine interaction, with the continuous development of robot technology, realizing efficient, transparent, and controllable human-machine cooperation has become a key problem. Current large-model-based human-machine interaction systems expose numerous limitations in practical application and struggle to meet the interaction requirements of modern complex scenarios. Traditional systems execute as black boxes: after a user gives a natural language instruction, the system directly generates and executes an action sequence, and the user cannot preview, modify, or understand the intermediate decision logic, and thus lacks control over the task execution process. Meanwhile, the lack of spatial context feedback means the alignment between the task plan and the real physical environment is not visualized in three-dimensional space, making it difficult for a user to judge whether the robot's operation is accurate, for example whether the robot grasps the correct object or whether the planned path collides with surrounding objects. In addition, when a task needs correction, the editing threshold is high: the instruction must usually be re-described or code must be written, and graphical, drag-based low-code adjustment is not supported, so operation is complex and inefficient. Moreover, VLA outputs are mostly raw joint trajectories or API call sequences that are not deconstructed into high-level semantic action units, making editing difficult. Prior public demonstrations are limited to one-way interaction videos or fixed UI button selection of preset tasks; VLA generation results are not converted into editable graphical task flow charts with overlaid AR spatial labels, so such systems remain essentially intelligent voice remote controls and cannot realize the deep collaboration of "human-machine co-programming and collaborative decision-making". A novel human-machine interaction method is therefore needed.

Disclosure of Invention

The invention provides a human-machine interaction method and a human-machine interaction system for real-time task adjustment input and visual feedback based on VLA, which are used to solve the problems mentioned in the background. The method comprises the following steps: S1, receiving a natural language instruction input by a user, generating a robot task plan by using a VLA model, parsing the robot task plan into a high-level semantic action unit sequence, generating structured task logic data, constructing a draggable, editable graphical task flow chart, and generating initial task flow chart data; S2, performing three-dimensional spatial alignment mapping between the initial task flow chart data and the real physical environment through AR/3D spatial visualization technology to generate spatially visualized task flow chart data, overlaying AR spatial labels, and generating visualized task flow chart data with spatial labels; S3, generating task flow chart modification instruction data according to the user's drag-editing operations on the visualized task flow chart data with spatial labels, parsing the task flow chart modification instruction data, converting it into corresponding structured task logic adjustment data, updating the initial task flow chart data, and generating optimized task flow chart data; S4, extracting the action parameter data required for task execution from the optimized task flow chart data, performing collision detection analysis between the action parameter data and the real physical environment to generate collision detection result data, and further correcting the optimized task flow chart data to generate finally confirmed task flow chart data; and S5, generating robot execution instruction data according to the finally confirmed task flow chart data, sending the robot execution instruction data to a robot control system, driving the robot to execute safely according to the optimized and confirmed task logic, simultaneously feeding back execution state data in real time, dynamically updating the execution progress of the task flow chart in an AR/3D spatial visualization interface, and generating real-time execution feedback visualization data.
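The drag-edit update of S33 and the logic conflict detection of S34 can be sketched as a sequence-rearrangement step followed by a dependency check. All names here (`apply_reorder`, the pick/place/inspect units, the `depends_on` map) are illustrative assumptions, since the patent does not specify the update algorithm:

```python
def apply_reorder(sequence, node_id, new_index, depends_on=None):
    """Move one action unit to a new position in the sequence (the
    drag-edit rearrangement), then verify that every unit still runs
    after all units it depends on (the logic conflict check)."""
    if node_id not in sequence:
        raise ValueError(f"unknown node {node_id!r}")
    seq = [n for n in sequence if n != node_id]
    seq.insert(new_index, node_id)
    # Conflict detection: a dependency appearing after its dependent
    # node would invalidate the edited flow chart.
    pos = {n: i for i, n in enumerate(seq)}
    for node, deps in (depends_on or {}).items():
        for dep in deps:
            if pos[dep] > pos[node]:
                raise ValueError(f"{node} would run before its dependency {dep}")
    return seq

# Dragging "inspect" in front of "place" is a legal edit:
new_seq = apply_reorder(["pick", "place", "inspect"], "inspect", 1,
                        depends_on={"place": ["pick"]})
```

An edit that violates a dependency (for example dragging "place" ahead of "pick") raises an error instead of producing temporarily optimized flow chart data, which mirrors the consistency verification of S34.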