
CN-122020510-A - Intelligent motion trajectory planning and interaction device based on multi-modal vision

CN 122020510 A

Abstract

The invention provides an intelligent motion trajectory planning and interaction device based on multi-modal vision, in the technical field of trajectory planning and interaction devices. The device operates through the following stages: start-up and task initialization; multi-modal visual information acquisition; data preprocessing and feature extraction; multi-modal feature fusion and environment modeling; intelligent trajectory planning generation; interactive instruction execution and state feedback; and closed-loop adjustment and task completion. Auxiliary modalities such as touch and language are integrated during data preprocessing, feature extraction, multi-modal feature fusion, and environment modeling, and interactive feedback is output through voice and touch-control modules, breaking through the fixed-instruction interaction of traditional devices. Because the interaction process is deeply fused with trajectory planning, user intent analyzed through multi-modal feature fusion can dynamically adjust the trajectory-planning strategy, thereby addressing the single perception dimension, incomplete environment understanding, insufficient trajectory-planning adaptability, and lagging dynamic response that conventional devices exhibit in complex dynamic environments.

Inventors

  • ZHU JINGYI

Assignees

  • 一启共创(苏州)机器人有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-24

Claims (8)

  1. An intelligent motion trajectory planning and interaction device based on multi-modal vision, characterized in that its operation comprises the following steps: S1, start-up and task initialization; S2, multi-modal visual information acquisition; S3, data preprocessing and feature extraction; S4, multi-modal feature fusion and environment modeling; S5, intelligent trajectory planning generation; S6, interactive instruction execution and state feedback; and S7, closed-loop adjustment and task completion.
  2. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S1 comprises hardware activation and task loading, specifically as follows: the hardware activation starts the multi-modal visual sensor array, the computing unit, the actuators, and the interaction module, and completes hardware self-checking and parameter calibration; the task loading comprises inputting the task target and loading the pre-trained multi-modal model and scene parameters.
  3. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S2 comprises multi-source visual data acquisition and data synchronization processing, specifically as follows: the multi-source visual data acquisition covers visual images, depth/time-series data, and auxiliary sensing fusion; an environment image is acquired through a 2D camera, point-cloud data through a 3D camera, and coordinate information of the target object is obtained; the depth/time-series data comprise scene depth information collected by a depth camera and the historical motion trajectory of a dynamic target recorded by a time-series camera; the auxiliary sensing fusion optionally merges point-cloud distance data from a laser radar and pressure feedback from a touch sensor to enrich the sensing dimensions; the data synchronization processing aligns the multi-source sensor data via timestamps, compensating for device sampling delays and ensuring that images, point clouds, and depth data are consistent in the time dimension (a minimal synchronization sketch follows the claims).
  4. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S3 comprises data cleaning and format unification, and single-modality feature coding, specifically as follows: the data cleaning and format unification comprises: 1. applying Gaussian-filter denoising to the image and downsampling the point-cloud data to reduce redundancy; 2. converting the target position from the pixel coordinate system into a global coordinate system through an affine coordinate transformation, thereby unifying the spatial dimensions of the multi-modal data; 3. smoothing the discrete trajectory of the dynamic target to remove noise interference; the single-modality feature coding covers visual, time-series, and haptic/language features, specifically: semantic features are extracted from the images and spatial features from the point clouds; a time-series coding module captures the motion trend of the dynamic target; for the haptic/language features, spatiotemporal features are extracted from haptic signals by a dedicated network architecture, or, in interactive scenarios, task instructions are converted into language embeddings through a language model.
  5. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S4 comprises cross-modal feature alignment, and environment modeling and intention analysis, specifically as follows: the cross-modal feature alignment uses projection layers to map the visual, haptic, and language features into a unified embedding space, and realizes feature fusion through a cross-attention mechanism (a fusion sketch follows the claims); the environment modeling and intention analysis comprises: 1. generating a 3D environment model from the fused features and partitioning it into static obstacles, dynamic regions, and operable regions; 2. predicting the future motion trajectory of the dynamic target and analyzing motion intention for the device.
  6. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S5 comprises initial trajectory generation, multi-objective optimization, and trajectory feasibility verification, specifically as follows: the initial trajectory generation produces, from the environment model and the task target, an initial trajectory such as a basic path from start point to end point or a joint sequence for a robotic arm; the multi-objective optimization balances safety, efficiency, and smoothness while incorporating visual semantic constraints; the trajectory feasibility verification checks, through collision detection and simulated rollout, whether the trajectory carries a collision risk and whether its actions can be executed by the equipment, and returns the trajectory for re-optimization if the check fails.
  7. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S6 comprises instruction conversion and execution, and real-time state acquisition, specifically as follows: the planned trajectory is converted into control instructions for the actuators, and interactive feedback is output through the voice and touch modules; the real-time state acquisition monitors the motion state of the device and environmental changes through sensors and transmits the feedback data back to the fusion module.
  8. The intelligent motion trajectory planning and interaction device based on multi-modal vision according to claim 1, wherein step S7 comprises dynamic trajectory updating and task determination, specifically as follows: the dynamic trajectory updating re-optimizes the multi-modal feature-fusion result based on the feedback data and adjusts the trajectory-planning strategy; the task determination puts the device into a standby/stop state if trajectory execution is complete and the task target is met, and otherwise repeats the process.
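The following sketches are illustrative reading aids, not the patented implementations. First, the timestamp-based synchronization of claim 3: a minimal nearest-neighbour alignment in Python, where the stream names, the 20 ms tolerance, and the policy of letting the sparsest stream drive the alignment are all assumptions.

```python
from bisect import bisect_left

def sync_by_timestamp(streams, tolerance_s=0.02):
    """Align multi-sensor samples by nearest timestamp.

    `streams` maps a sensor name to a time-sorted list of
    (timestamp, data) tuples. The sparsest stream drives the
    alignment; for each of its timestamps the nearest sample of
    every other stream is selected, and the frame is dropped when
    any best match lies further away than `tolerance_s`.
    """
    ref_name = min(streams, key=lambda name: len(streams[name]))
    others = {name: samples for name, samples in streams.items()
              if name != ref_name}
    timestamps = {name: [t for t, _ in samples]
                  for name, samples in others.items()}
    aligned = []
    for t_ref, data_ref in streams[ref_name]:
        frame = {"t": t_ref, ref_name: data_ref}
        for name, samples in others.items():
            ts = timestamps[name]
            i = bisect_left(ts, t_ref)
            candidates = [j for j in (i - 1, i) if 0 <= j < len(ts)]
            j = min(candidates, key=lambda k: abs(ts[k] - t_ref))
            if abs(ts[j] - t_ref) > tolerance_s:
                break  # sampling delay too large: drop this frame
            frame[name] = samples[j][1]
        else:
            aligned.append(frame)
    return aligned

frames = sync_by_timestamp({
    "rgb":   [(0.00, "img0"), (0.033, "img1"), (0.066, "img2")],
    "cloud": [(0.01, "pc0"), (0.04, "pc1")],
    "depth": [(0.005, "d0"), (0.035, "d1"), (0.064, "d2")],
})
print([f["t"] for f in frames])  # -> [0.01, 0.04]
```

Here a frame is simply dropped when any sensor's best match exceeds the tolerance; a production system might interpolate instead, which is one way the sampling-delay problem of claim 3 could be handled.

Second, the cross-modal alignment and fusion of claim 5: each modality is mapped by a linear projector into a shared embedding space, and the visual tokens attend to the touch/language tokens via scaled dot-product cross-attention. The dimensions and the untrained random projection weights are hypothetical; a real device would learn these projections.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CrossModalFusion:
    """Project each modality into a shared d-dim space, then let the
    visual tokens attend to the auxiliary (touch/language) tokens."""

    def __init__(self, dims, d=64):
        # one linear projector per modality (untrained, illustrative)
        self.proj = {m: rng.normal(0, dims[m] ** -0.5, (dims[m], d))
                     for m in dims}
        self.d = d

    def __call__(self, feats):
        tokens = {m: f @ self.proj[m] for m, f in feats.items()}
        q = tokens["vision"]                        # (Nv, d) queries
        kv = np.concatenate([tokens[m] for m in tokens
                             if m != "vision"])     # (Na, d) keys/values
        attn = softmax(q @ kv.T / np.sqrt(self.d))  # (Nv, Na) weights
        return q + attn @ kv                        # residual fusion

fusion = CrossModalFusion({"vision": 128, "touch": 16, "language": 300})
fused = fusion({"vision": rng.normal(size=(10, 128)),
                "touch": rng.normal(size=(4, 16)),
                "language": rng.normal(size=(6, 300))})
print(fused.shape)  # -> (10, 64): one fused token per visual token
```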

Description

Intelligent motion trajectory planning and interaction device based on multi-modal vision

Technical Field

The invention relates to the technical field of trajectory planning and interaction devices, and in particular to an intelligent motion trajectory planning and interaction device based on multi-modal vision.

Background

With the rapid iteration of intelligent transportation, service robots, and related fields, higher requirements are placed on the environment-perception precision, trajectory-planning flexibility, and human-machine/machine-machine interaction intelligence of motion equipment. In real scenarios, a motion device must break through the traditional "single perception, fixed planning" mode and possess multi-dimensional environment understanding, dynamic trajectory optimization, and flexible interaction capabilities. Multi-modal vision, as the technical path closest to the human mode of perception, can meet these industry demands, hence the need for an intelligent motion trajectory planning and interaction device based on multi-modal vision. Current motion trajectory planning and interaction devices, in complex dynamic environments, still suffer from a single perception dimension, incomplete environment understanding, insufficient trajectory-planning adaptability, and lagging dynamic response, which makes them difficult to apply in high-precision, high-reliability scenarios.

Disclosure of the Invention

The invention provides an intelligent motion trajectory planning and interaction device based on multi-modal vision. It integrates auxiliary features such as touch and language into data preprocessing and feature extraction as well as multi-modal feature fusion and environment modeling, and outputs interactive feedback through voice and touch-control modules, breaking through the fixed-instruction interaction of traditional devices. Because the interaction process is deeply fused with trajectory planning, user intent analyzed through multi-modal feature fusion can dynamically adjust the trajectory-planning strategy, and the execution state is confirmed to the user through real-time feedback. This resolves the disconnect and inflexibility between traditional interaction and planning and improves the convenience and anthropomorphic quality of interaction. The device operates through the following steps: S1, start-up and task initialization; S2, multi-modal visual information acquisition; S3, data preprocessing and feature extraction; S4, multi-modal feature fusion and environment modeling; S5, intelligent trajectory planning generation; S6, interactive instruction execution and state feedback; and S7, closed-loop adjustment and task completion (a skeleton of this closed loop is sketched below).

Preferably, step S1 includes hardware activation and task loading, specifically as follows: the hardware activation starts the multi-modal visual sensor array, the computing unit, the actuators, and the interaction module, and completes hardware self-checking and parameter calibration; the task loading comprises inputting the task target and loading the pre-trained multi-modal model and scene parameters.
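The S1-S7 flow above can be summarized as a closed loop. The sketch below is a minimal skeleton with hypothetical no-op stage functions; it shows only the control flow (a failed feasibility check loops back to re-planning, task completion exits to standby), not the patented algorithms.

```python
# Hypothetical no-op stages standing in for S1-S7; each would wrap the
# real sensing, fusion, and planning components in an actual device.
def s1_init(target):          return {"model": None, "target": target}
def s2_acquire():             return {"rgb": [], "cloud": [], "depth": []}
def s3_encode(raw):           return raw
def s4_fuse(feats, state):    return {"env": feats, "intent": "move"}
def s5_plan(world, target):   return ["waypoint"]
def s5_feasible(traj, world): return True
def s6_execute(traj):         return {"done": True}
def s7_done(fb, target):      return fb["done"]

def run_device(target, max_cycles=100):
    state = s1_init(target)                        # S1 start-up and init
    for _ in range(max_cycles):
        feats = s3_encode(s2_acquire())            # S2 acquisition + S3 coding
        world = s4_fuse(feats, state)              # S4 fusion and modeling
        traj = s5_plan(world, state["target"])     # S5 planning
        if not s5_feasible(traj, world):
            continue                               # re-optimize (claim 6)
        feedback = s6_execute(traj)                # S6 execute and feed back
        if s7_done(feedback, state["target"]):     # S7 closed-loop check
            return "standby"
    return "timeout"

print(run_device("pick up the red block"))  # -> standby
```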
Preferably, step S2 includes multi-source visual data acquisition and data synchronization processing, specifically as follows: the multi-source visual data acquisition covers visual images, depth/time-series data, and auxiliary sensing fusion; an environment image is acquired through a 2D camera, point-cloud data through a 3D camera, and coordinate information of the target object is obtained; the depth/time-series data comprise scene depth information collected by a depth camera and the historical motion trajectory of a dynamic target recorded by a time-series camera; the auxiliary sensing fusion optionally merges point-cloud distance data from a laser radar and pressure feedback from a touch sensor to enrich the sensing dimensions; the data synchronization processing aligns the multi-source sensor data via timestamps, compensating for device sampling delays and ensuring that images, point clouds, and depth data are consistent in the time dimension.

Preferably, step S3 includes data cleaning and format unification, and single-modality feature coding, specifically as follows: the data cleaning and format unification comprises: 1. applying Gaussian-filter denoising to the image and downsampling the point-cloud data to reduce redundancy; 2. converting the target position from the pixel coordinate system into a global coordinate system through an affine coordinate transformation, thereby unifying the spatial dimensions of the multi-modal data.
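To illustrate the coordinate unification just described, the sketch below back-projects a pixel into the camera frame using a calibrated pinhole intrinsic matrix and a measured depth, then applies a rigid camera-to-world transform (a special case of the affine transformation named above). The intrinsics, the transform, and the example values are hypothetical.

```python
import numpy as np

def pixel_to_global(u, v, depth, K, T_world_cam):
    """Back-project a pixel with known depth into the global frame.

    K           : 3x3 camera intrinsic matrix (calibrated in S1)
    T_world_cam : 4x4 homogeneous camera-to-world transform
    """
    # pixel -> camera-frame ray, scaled by the measured depth
    xyz_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # camera frame -> global frame (rigid/affine transform)
    xyz_h = T_world_cam @ np.append(xyz_cam, 1.0)
    return xyz_h[:3]

K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
T = np.eye(4)
T[:3, 3] = [1.0, 0.0, 0.5]  # camera mounted 1 m forward, 0.5 m up
print(pixel_to_global(320, 240, depth=2.0, K=K, T_world_cam=T))
# principal-point pixel at 2 m depth -> [1. 0. 2.5] in the global frame
```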