CN-122022033-A - Interactive semantic pedestrian track prediction method based on language model and reinforcement learning
Abstract
The invention relates to the technical field of pedestrian trajectory prediction, and in particular to an interactive semantic pedestrian trajectory prediction method based on a language model and reinforcement learning. The method comprises: computing, from the trajectory coordinate sequences of multiple pedestrians, physical metrics that quantify the spatio-temporal relations among the pedestrians; determining the interaction relation between each target pedestrian and its neighbouring pedestrians according to the physical metrics; converting the historical observed trajectory coordinates of the target pedestrian, together with a description of the interaction relations, into a natural language prompt; taking a language model as the base model, with the natural language prompt as input and the corresponding future trajectory coordinate text as the training target, and performing full-parameter supervised fine-tuning to obtain a supervised fine-tuned model; freezing the backbone network parameters of the supervised fine-tuned model, and introducing and training a low-rank adaptation (LoRA) parameter module; and optimizing against a programmable task reward function using the proximal policy optimization (PPO) algorithm to obtain a prediction model. The pedestrian trajectory prediction method and the corresponding device achieve high-precision pedestrian trajectory prediction that is interpretable and conforms to the physical constraints of the scene.
Inventors
- YANG BIAO
- XU ZIRUI
- NI RONGRONG
- YU ZHITAO
- CHEN YANG
- JIANG JIAMING
Assignees
- Changzhou University (常州大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260123
Claims (10)
- 1. An interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning, characterized by comprising the following steps: determining the interaction relation between each target pedestrian and its neighbouring pedestrians according to the physical metrics, and jointly converting the historical observed trajectory coordinates of the target pedestrian and the interaction relation description into a structured natural language prompt; taking a language model with an encoder-decoder architecture as the base model, with the natural language prompt as input and the corresponding future trajectory coordinate text as the training target, and performing full-parameter supervised fine-tuning to obtain a supervised fine-tuned model; freezing the backbone network parameters of the supervised fine-tuned model, and introducing and training a low-rank adaptation parameter module; and inputting the natural language prompt of the target pedestrian to be predicted into the prediction model, generating a text sequence describing the future trajectory, and extracting a coordinate sequence from that text sequence with a parser to obtain the final pedestrian trajectory prediction.
- 2. The interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning according to claim 1, wherein the physical metrics include relative distance, heading angle, and relative azimuth angle.
- 3. The interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning according to claim 1, wherein the interaction types include accompanying, following, and obstructing.
- 4. The interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning according to claim 1, wherein the language model is a T5-small language model.
- 5. The interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning according to claim 1, wherein cross-entropy loss is used as the optimization objective during the full-parameter supervised fine-tuning of the base model.
- 6. The interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning according to claim 1, wherein the task reward function combines the negative of the average displacement error between the predicted trajectory and the ground truth with a penalty for out-of-bounds coordinate points based on a binary scene semantic mask.
- 7. The interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning according to claim 6, wherein the task reward function is defined by three formulas [formulas not reproduced in the source text], in which the symbols denote, respectively: the task reward function; the two weights; a function that checks the binary mask value at the position of a trajectory point; the time step T; the coordinate sequence obtained by parsing the generated text; and the real trajectory.
- 8. An apparatus, characterized by comprising: a memory for storing a computer program; and a processor for executing the computer program, wherein the computer program, when executed by the processor, implements the steps of the interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning as claimed in any one of claims 1 to 7.
- 9. A readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning as claimed in any one of claims 1 to 7.
- 10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the interactive semantic pedestrian trajectory prediction method based on language model and reinforcement learning as claimed in any one of claims 1 to 7.
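As an illustration of the prompt construction and parsing steps recited in claims 1 to 3, the following Python sketch computes the three physical metrics for a target pedestrian and one neighbour, assigns an interaction type, builds a prompt, and parses coordinates back out of generated text. The distance and angle thresholds, the classification rules, the prompt wording, and all function names are assumptions made for illustration; the claims do not specify them.

```python
import math
import re

# Illustrative thresholds only; none of these values come from the patent.
NEAR_DIST = 2.0      # neighbours farther away than this are ignored
PARALLEL_DEG = 30.0  # max heading gap to count as "moving the same way"
AHEAD_DEG = 45.0     # max azimuth to count as "in front of the target"


def heading(track):
    """Heading angle in degrees, taken from the last two observed points."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, 180]."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def physical_metrics(target, neighbour):
    """Return (relative distance, heading gap, relative azimuth)."""
    tx, ty = target[-1]
    nx, ny = neighbour[-1]
    dist = math.hypot(nx - tx, ny - ty)
    h_t, h_n = heading(target), heading(neighbour)
    bearing = math.degrees(math.atan2(ny - ty, nx - tx))
    azimuth = angle_diff(bearing, h_t)  # 0 means straight ahead of the target
    return dist, angle_diff(h_t, h_n), azimuth


def classify(dist, heading_gap, azimuth):
    """Hypothetical rule set mapping the metrics to an interaction type."""
    if dist > NEAR_DIST:
        return None
    if heading_gap <= PARALLEL_DEG:
        # Same direction: a neighbour ahead is followed, otherwise accompanied.
        return "following" if azimuth <= AHEAD_DEG else "accompanying"
    # Different direction: only a neighbour in front is treated as obstructing.
    return "obstructing" if azimuth <= AHEAD_DEG else None


def build_prompt(ped_id, target, neighbours):
    """Serialize the observed track and interaction labels into one prompt."""
    coords = ", ".join(f"({x:.2f}, {y:.2f})" for x, y in target)
    parts = [f"Pedestrian {ped_id} observed at: {coords}."]
    for nid, track in neighbours.items():
        label = classify(*physical_metrics(target, track))
        if label:
            parts.append(f"Interaction with pedestrian {nid}: {label}.")
    parts.append("Predict the future trajectory.")
    return " ".join(parts)


# Matches "(x, y)" pairs with optional sign and decimal part.
COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_trajectory(text):
    """Extract every (x, y) pair from generated text as floats."""
    return [(float(x), float(y)) for x, y in COORD_RE.findall(text)]
```

Writing coordinates with a fixed "(x, y)" pattern keeps the parser to a single regular expression; a different serialization would need a matching parser.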
Description
Interactive semantic pedestrian track prediction method based on language model and reinforcement learning
Technical Field
The invention relates to the technical field of pedestrian trajectory prediction, and in particular to an interactive semantic pedestrian trajectory prediction method based on a language model and reinforcement learning.
Background
Trajectory prediction aims to predict the future motion trajectories of individuals by analyzing their historical motion patterns, and it plays a vital role in autonomous systems such as autonomous driving, behavior analysis, and robot path planning. Existing prediction methods fall into two main categories: models based on physical rules and data-driven models. Methods based on physical rules offer some interpretability, but they struggle with the complex and changeable interaction scenes of the real world. With the development of deep learning, data-driven methods have become the mainstream in the field, and researchers widely adopt structures such as recurrent neural networks, graph neural networks, and attention mechanisms to model the temporal characteristics of pedestrian movement and social interaction relations. In addition, to capture the inherent randomness of pedestrian motion, multi-modal generative models such as generative adversarial networks and conditional variational autoencoders have been introduced to predict diverse future trajectories. Although deep learning approaches have made significant progress in prediction accuracy, their inherent "black box" nature yields predictions that lack transparency and interpretability, which limits their reliable application in safety-critical scenarios such as autonomous driving.
In recent years, language models have shown strong capability in a variety of sequence generation and reasoning tasks, and their natural-language output also offers a new route to interpretability. Researchers have begun applying language models to trajectory prediction by converting trajectory coordinate sequences into text and using the models' contextual understanding and generation capabilities to make predictions. However, existing methods merely serialize the trajectory coordinates and cannot explicitly model the interaction semantics among pedestrians. In addition, the training objective (such as cross-entropy loss) deviates from the final evaluation metrics of trajectory prediction (such as ADE/FDE), so prediction accuracy and scene compliance still have room for improvement. A new trajectory prediction method is therefore needed that deeply fuses interaction semantics, aligns the training objective with the prediction target, and offers both high accuracy and interpretability.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an interactive semantic pedestrian trajectory prediction method based on a language model and reinforcement learning, which achieves high-precision pedestrian trajectory prediction that is interpretable and conforms to the physical constraints of the scene.
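The gap noted in the background between a token-level cross-entropy objective and the ADE/FDE evaluation metrics can be made concrete with a short Python sketch. It computes ADE and FDE and then combines the negative ADE with a mask-based out-of-bounds penalty in the spirit of claim 6. The grid-cell mapping, the normalisation of the penalty by trajectory length, the default weights, and the function names are assumptions for illustration; the patent's own reward formulas are not recoverable from the text here.

```python
import math


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over time steps.

    Assumes pred and truth hold the same number of (x, y) points.
    """
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Final displacement error: distance between the last points."""
    return math.dist(pred[-1], truth[-1])


def out_of_bounds(pred, mask, cell=1.0):
    """Count predicted points that land outside the walkable area.

    Assumption: mask[row][col] == 1 marks a walkable cell, with
    row = y // cell and col = x // cell. Points off the grid also count.
    """
    count = 0
    for x, y in pred:
        row, col = int(y // cell), int(x // cell)
        on_grid = 0 <= row < len(mask) and 0 <= col < len(mask[0])
        if not on_grid or mask[row][col] == 0:
            count += 1
    return count


def task_reward(pred, truth, mask, w_ade=1.0, w_mask=0.5):
    """Negative ADE minus a weighted fraction of out-of-bounds points.

    The weights w_ade and w_mask are hypothetical placeholders.
    """
    penalty = out_of_bounds(pred, mask) / len(pred)
    return -w_ade * ade(pred, truth) - w_mask * penalty
```

A reward of this shape scores the parsed coordinates directly, so the policy is optimized on the same geometric quantity that evaluation reports, whereas cross-entropy only scores agreement between output tokens.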
To solve the above technical problems, the technical solution of the invention is an interactive semantic pedestrian trajectory prediction method based on a language model and reinforcement learning, comprising the following steps: determining the interaction relation between each target pedestrian and its neighbouring pedestrians according to the physical metrics, and jointly converting the historical observed trajectory coordinates of the target pedestrian and the interaction relation description into a structured natural language prompt; taking a language model with an encoder-decoder architecture as the base model, with the natural language prompt as input and the corresponding future trajectory coordinate text as the training target, and performing full-parameter supervised fine-tuning to obtain a supervised fine-tuned model; freezing the backbone network parameters of the supervised fine-tuned model, and introducing and training a low-rank adaptation parameter module; and inputting the natural language prompt of the target pedestrian to be predicted into the prediction model, generating a text sequence describing the future trajectory, and extracting a coordinate sequence from that text sequence with a parser to obtain the final pedestrian trajectory prediction. Further, the physical metrics include relative distance, heading angle, and relative azimuth angle. Further, the interaction types include accompanying, following, and obstructing. Further, the language model is a T5-small language model. Further, in the process of performing full-parameter supervised fine-tuning on the base model, cross-entropy loss