CN-121230733-B - Visual-language-based agent trajectory navigation system and method thereof


Abstract

The invention discloses a visual-language-based agent trajectory navigation system and a method thereof. The navigation system comprises a historical trajectory processing module, a trajectory temporal interaction graph model, a dynamic trajectory interaction model, a first trajectory prediction module, a second trajectory prediction module and an optimized trajectory prediction module. The dynamic trajectory interaction model introduces edge weights that vary with the distance relations between agents and uses them to update adjacent nodes in the graph, yielding the agent dynamic interaction relation. The first trajectory prediction module trains a graph convolutional network based on the target dynamic interaction relation to obtain an agent dynamic interaction feature sequence. The second trajectory prediction module trains on the agent dynamic interaction feature sequence using an attention mechanism to obtain an agent motion-trend trajectory sequence. The optimized trajectory prediction module applies a diversity-aware loss function to the agent motion-trend trajectory sequence to calculate and output the optimal agent motion trajectory.

Inventors

LIU RUONAN
JIANG ZIHAN

Assignees

Shanghai Jiao Tong University (上海交通大学)

Dates

Publication Date: 2026-05-08
Application Date: 2025-09-30

Claims (4)

1. A visual-language-based agent trajectory navigation system, characterized by comprising a historical trajectory processing module, a trajectory temporal interaction graph model, a dynamic trajectory interaction model, a first trajectory prediction module, a second trajectory prediction module and an optimized trajectory prediction module, wherein:
the historical trajectory processing module acquires the positions and velocities of the agents in the scene to obtain an agent historical trajectory sequence;
the trajectory temporal interaction graph model constructs, at each time step, a temporal interaction graph between the agents based on the historical trajectory sequence, wherein the nodes of the graph are the agents and the edge set of the graph encodes the distance relations between the agents; the initial feature of a node is the concatenated vector of position and velocity, the velocity being estimated from the position difference between adjacent time steps so as to better characterize the local motion trend; each node represents the state features of one agent at one time step, comprising its position and velocity, namely: [formula missing in source]; the edge set is constructed according to a distance threshold, two nodes being connected when the threshold condition [condition missing in source] holds;
the dynamic trajectory interaction model introduces edge weights that vary with the distance relations between the agents and uses them to update adjacent nodes in the graph, yielding the agent dynamic interaction relation, wherein the edge weight is given by: [formula missing in source], in which the distance term is the Euclidean distance between the agents and the learnable parameters control the interaction strength, the decay rate and the range of action;
the first trajectory prediction module trains a graph convolutional network based on the target dynamic interaction relation to obtain an agent dynamic interaction feature sequence;
the second trajectory prediction module trains on the agent dynamic interaction feature sequence using an attention mechanism to obtain an agent motion-trend trajectory sequence;
the optimized trajectory prediction module applies a diversity-aware loss function to the agent motion-trend trajectory sequence to calculate and output the optimal agent motion trajectory, wherein the diversity-aware loss function is: [formula missing in source], in which the mean squared error loss guarantees prediction accuracy, the diversity loss encourages variability among the candidate trajectories, and a constant avoids numerical instability; the diversity constraint is realized in either of the following ways: (i) a diversity mechanism based on "best-match selection", in which only the candidate with the smallest error relative to the true trajectory is supervised; (ii) a diversity regularization term based on similarity suppression among candidate trajectories, in which excessive similarity among candidates generated from the same input is penalized using cosine similarity or kernel similarity, thereby avoiding mode collapse.
2. The visual-language-based agent trajectory navigation system of claim 1, wherein the first trajectory prediction module is given by: [formula missing in source], in which the sum runs over the neighbor set of the node being updated, the two layer parameters belong to the current layer of the network, and the outer function is the activation function.
3. The visual-language-based agent trajectory navigation system of claim 1, wherein the attention mechanism is a Transformer structure comprising a multi-head attention layer, a feed-forward sub-layer, residual connections and layer normalization, and applies positional/time-step encoding to the time-series features to capture long-term dependencies.
4. A visual-language-based agent trajectory navigation method, characterized in that the navigation method is implemented on the basis of the system of any one of claims 1-3 and comprises the following steps:
S1, acquiring a historical trajectory sequence of at least one agent in a target scene, in which each position is represented by two-dimensional coordinates;
S2, at each time step, building a temporal interaction graph based on the historical trajectories, wherein each agent is represented as a node of the temporal interaction graph whose state features comprise position and velocity, and the edge set of the temporal interaction graph is constructed according to the distances between agents;
S3, for any pair of adjacent nodes in the graph, calculating an edge weight and updating the node representations in a graph convolutional network using the edge-weight-modulated message-passing rule to obtain the agent dynamic interaction relation, wherein the edge weight is given by: [formula missing in source], in which the distance term is the Euclidean distance between the agents and the learnable parameters control the interaction strength, the decay rate and the range of action; the prediction interval length and the observation interval length are set to [values missing in source], and at least [value missing in source] candidate future trajectories are generated for each input sample to enhance multimodal coverage;
S4, training the graph convolutional network based on the target dynamic interaction relation to obtain an agent dynamic interaction feature sequence;
S5, training on the agent dynamic interaction feature sequence using the attention mechanism to obtain the agent motion-trend trajectory sequence;
S6, the optimized trajectory prediction module applying a diversity-aware loss function to the agent motion-trend trajectory sequence to calculate and output the optimal agent motion trajectory, the parameters of the graph convolutional network and of the temporal prediction module being jointly trained by a stochastic-gradient-descent-type optimizer, iterating in batches until convergence and optionally adopting an early-stopping strategy to improve generalization, wherein the diversity-aware loss is: [formula missing in source].
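The graph-construction and edge-weighting steps recited in claims 1 and 4 (S2, S3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the edge-weight formula is not reproduced in this text, so `edge_weight` uses an assumed stand-in (exponential decay with distance, cut off at a range). The parameter names `alpha`, `beta` and `radius`, all helper names, and the choice to reuse the first difference as the initial velocity are hypothetical. The aggregation step also leaves out the learned layer parameters and activation function of claim 2 to stay short.

```python
import math


def estimate_velocities(positions):
    """Estimate per-step velocity as the difference of adjacent positions.

    positions: list of (x, y) tuples for one agent.
    Assumption: the first step reuses the first available difference.
    """
    if len(positions) < 2:
        return [(0.0, 0.0) for _ in positions]
    diffs = [
        (positions[t][0] - positions[t - 1][0], positions[t][1] - positions[t - 1][1])
        for t in range(1, len(positions))
    ]
    return [diffs[0]] + diffs


def node_features(tracks, t):
    """Return {agent_id: [x, y, vx, vy]} at time step t (position and velocity concatenated)."""
    feats = {}
    for agent_id, positions in tracks.items():
        vx, vy = estimate_velocities(positions)[t]
        x, y = positions[t]
        feats[agent_id] = [x, y, vx, vy]
    return feats


def edge_weight(d, alpha=1.0, beta=1.0, radius=2.0):
    """HYPOTHETICAL edge weight, not the patent's formula.

    alpha: interaction strength, beta: decay rate, radius: range of action.
    """
    return alpha * math.exp(-beta * d) if d <= radius else 0.0


def build_edges(feats, threshold, **weight_params):
    """Connect agents whose Euclidean distance is within the threshold and weight each edge."""
    edges = {}
    for a in feats:
        for b in feats:
            if a == b:
                continue
            d = math.dist(feats[a][:2], feats[b][:2])
            if d <= threshold:
                edges[(a, b)] = edge_weight(d, **weight_params)
    return edges


def aggregate(feats, edges):
    """One edge-weighted message-passing step: own feature plus weighted neighbor features."""
    updated = {a: list(f) for a, f in feats.items()}
    for (a, b), w in edges.items():
        for k, value in enumerate(feats[b]):
            updated[a][k] += w * value
    return updated


# Three agents observed for two time steps; agent "c" is far from the others.
tracks = {
    "a": [(0.0, 0.0), (1.0, 0.0)],
    "b": [(0.0, 1.0), (1.0, 1.0)],
    "c": [(10.0, 10.0), (10.0, 11.0)],
}
feats = node_features(tracks, t=1)
edges = build_edges(feats, threshold=2.0)
updated = aggregate(feats, edges)
print(sorted(edges))  # [('a', 'b'), ('b', 'a')]
```

Keeping the distance threshold (which decides connectivity) separate from the weight parameters mirrors the claim wording, where the edge set and the edge weight are defined by different quantities.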

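The diversity-aware loss of claims 1 and 4 can be sketched in the same hedged way. The combined formula is not reproduced in this text, so the way the terms are added, the weight `lam`, and every function name below are assumptions. The claim allows either mode (i) or mode (ii); this sketch implements both and simply sums them for illustration. Trajectories are taken to be displacements from the last observed position, another assumption, made so that cosine similarity compares directions of motion.

```python
import math


def mse(pred, truth):
    """Mean squared error between two trajectories, each a list of (x, y) points."""
    total = sum(
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty) in zip(pred, truth)
    )
    return total / len(truth)


def best_match_loss(candidates, truth):
    """Mode (i): supervise only the candidate closest to the true trajectory."""
    return min(mse(c, truth) for c in candidates)


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two flattened trajectories; eps avoids division by zero."""
    fa = [v for point in a for v in point]
    fb = [v for point in b for v in point]
    dot = sum(x * y for x, y in zip(fa, fb))
    norm_a = math.sqrt(sum(x * x for x in fa))
    norm_b = math.sqrt(sum(y * y for y in fb))
    return dot / (norm_a * norm_b + eps)


def similarity_penalty(candidates, eps=1e-8):
    """Mode (ii): mean pairwise cosine similarity among candidates (higher = less diverse)."""
    pairs = [
        (i, j)
        for i in range(len(candidates))
        for j in range(i + 1, len(candidates))
    ]
    if not pairs:
        return 0.0
    total = sum(cosine_similarity(candidates[i], candidates[j], eps) for i, j in pairs)
    return total / len(pairs)


def diversity_aware_loss(candidates, truth, lam=0.1, eps=1e-8):
    """ASSUMED combination: best-match accuracy term plus lam times the similarity penalty."""
    return best_match_loss(candidates, truth) + lam * similarity_penalty(candidates, eps)


# Two candidate sets for the same ground truth: identical copies versus distinct directions.
truth = [(1.0, 0.0), (2.0, 0.0)]
straight = [(1.0, 0.0), (2.0, 0.0)]
sideways = [(0.0, 1.0), (0.0, 2.0)]
collapsed = diversity_aware_loss([straight, straight], truth)
diverse = diversity_aware_loss([straight, sideways], truth)
print(collapsed > diverse)  # True
```

Both candidate sets contain an exact match, so the accuracy term is zero in each case; only the similarity penalty separates them, which is the behavior the claim describes as discouraging mode collapse.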
Description

Visual-language-based agent trajectory navigation system and method thereof

Technical Field

The invention relates to the technical field of artificial intelligence and intelligent transportation, and in particular to a visual-language-based agent trajectory navigation system and a method thereof. It is suited to application scenarios such as autonomous driving, intelligent robots and intelligent monitoring, which need to model and predict the future motion trend of agents in complex dynamic scenes.

Background

Agent trajectory prediction is a key problem for achieving safe and efficient navigation in autonomous driving systems and intelligent robots. In dynamic, complex traffic or public environments, an agent's movement is driven not only by its own intention but is also strongly influenced by interactions with surrounding people and by environmental constraints. Accurately predicting an agent's future trajectory in a highly uncertain environment is therefore a core challenge for guaranteeing the obstacle-avoidance capability and decision safety of an intelligent system.

Existing trajectory prediction methods fall into three categories: physical-model methods, machine-learning methods and deep-learning methods.

(1) Physical-model methods are based on the principles of classical mechanics and derive trajectories by establishing a mechanical model of the interaction between an agent and its environment, for example an environmental force model. They reflect the laws of motion intuitively, but they depend on a large number of prior assumptions, adapt poorly, and are difficult to generalize to complex scenes.

(2) Machine-learning methods make predictions by applying feature engineering to historical trajectories and using traditional classification and regression models (e.g., Gaussian processes, decision trees, Bayesian networks). They have a low computational cost and are flexible, but they depend heavily on feature engineering, are prone to overfitting, and offer poor interpretability when handling high-dimensional complex interactions.

(3) Deep-learning methods have performed remarkably in recent years; models based on structures such as recurrent neural networks, graph convolutional networks and the Transformer have clear advantages in modeling high-dimensional spatio-temporal data. They automatically extract spatio-temporal features from trajectory data and improve prediction accuracy and generalization to a certain extent.

However, existing deep-learning methods still have two limitations. First, they lack physical plausibility and interpretability: environment interaction is usually modeled as a black-box process, and kinematic differences are ignored. Second, prediction accuracy and diversity are hard to reconcile: overemphasizing accuracy can cause mode collapse in the predictions, while overemphasizing diversity can reduce the reliability of the predicted trajectories.

In summary, existing methods struggle to keep predictions both accurate and diverse while maintaining their physical plausibility. A new technical scheme is therefore needed that introduces physical constraints into the graph-structure modeling and achieves multimodal trajectory prediction through an improved loss function, thereby enhancing the interpretability, robustness and practical value of the method.
Disclosure of Invention

The invention addresses the shortcomings of existing agent trajectory prediction methods. On the one hand, existing methods mostly model environment interaction as a black-box process that lacks physical plausibility and interpretability; on the other hand, their predictions are difficult to balance between accuracy and diversity and are prone to mode collapse or prediction bias, which reduces the robustness and reliability of the system in complex dynamic environments.

To solve these problems in the prior art, the invention adopts the following technical scheme. The navigation system comprises a historical trajectory processing module, a trajectory temporal interaction graph model, a dynamic trajectory interaction model, a first trajectory prediction module, a second trajectory prediction module and an optimized trajectory prediction module, wherein: the historical trajectory processing module acquires the positions and velocities of the agents in the scene to obtain an agent historical trajectory sequence; the trajectory temporal interaction graph model builds, at each time step, a temporal interaction graph between the agents based on the historical trajectory sequence; the dynamic trajectory interaction model introduces edge weights that vary with the distance relations between the agents to update adjacent nodes in the graph to obtain the dynamic in