
CN-122009216-A - Method and device for generating control strategy of automatic driving vehicle

CN122009216A

Abstract

The invention discloses a method and a device for generating a control strategy for an autonomous vehicle, and relates to the technical field of autonomous driving. The method comprises: acquiring environment perception data of the autonomous vehicle at the current moment, and extracting features from the environment perception data to obtain state features; calculating the complexity of the current driving task according to the state features, and determining the number of progressive layers to be activated based on the complexity; inputting the state features respectively into a base network layer of a progressive neural network and into the progressive layers to be activated, outputting a general policy through the base network layer and a specific policy for a designated complex scene through the progressive layers; adjusting a knowledge retention weight and a policy learning weight according to the number of progressive layers to be activated; and performing continual proximal policy optimization to obtain an optimal control strategy for the autonomous vehicle at the current moment. The embodiment can cope with task scenes of varying complexity and scale in autonomous driving decision-making, and improves the accuracy and adaptability of the decisions.

Inventors

  • LI WEI

Assignees

  • Jingdong Kunpeng (Jiangsu) Technology Co., Ltd. (京东鲲鹏(江苏)科技有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-03-26

Claims (13)

  1. A method of generating a control strategy for an autonomous vehicle, comprising: acquiring environment perception data of the autonomous vehicle at the current moment, and extracting features from the environment perception data to obtain state features; calculating the complexity of the current driving task according to the state features, and determining the number of progressive layers to be activated based on the complexity; inputting the state features respectively into a base network layer of a progressive neural network and into the activated progressive layers, outputting a general policy through the base network layer, and outputting a specific policy for a designated complex scene through the progressive layers; adjusting a knowledge retention weight and a policy learning weight according to the number of progressive layers to be activated; and performing continual proximal policy optimization based on the general policy, the specific policy, the knowledge retention weight and the policy learning weight to obtain an optimal control strategy for the autonomous vehicle at the current moment.
  2. The method of claim 1, wherein before calculating the complexity of the current driving task from the state features, the method further comprises: calculating the real-time degree of the current driving task according to the state features, and determining that the real-time degree of the current driving task does not meet a preset condition; wherein, in the case that the real-time degree meets the preset condition, a safety control instruction is directly output as the optimal control strategy.
  3. The method of claim 1, wherein calculating the complexity of the current driving task from the state features comprises: acquiring a complexity scoring rule corresponding to each factor included in the state features; determining the complexity score corresponding to each factor based on the first state feature corresponding to that factor in the state features and the complexity scoring rule corresponding to that factor; and calculating the complexity of the current driving task based on the weight value corresponding to each factor and the complexity score corresponding to each factor.
  4. The method according to claim 1 or 3, wherein determining the number of progressive layers to be activated based on the complexity comprises: presetting a group of complexity thresholds; sequentially comparing the complexity with each complexity threshold to determine a target complexity threshold, and taking the number of progressive layers corresponding to the target complexity threshold as the number of progressive layers to be activated; or adopting a neural network classifier to directly output the identifiers of the progressive layers to be activated.
  5. The method of claim 1, wherein the number of progressive layers to be activated is positively correlated with the complexity; the knowledge retention weight is inversely correlated with the number of progressive layers to be activated, and the policy learning weight is positively correlated with the number of progressive layers to be activated.
  6. The method of claim 1, wherein performing continual proximal policy optimization based on the general policy, the specific policy, the knowledge retention weight and the policy learning weight to obtain the optimal control strategy for the autonomous vehicle at the current moment comprises: jointly optimizing network parameters of the base network layer and the progressive layers based on a continual proximal policy optimization objective function, wherein the continual proximal policy optimization objective function is related to the general policy, the specific policy, the knowledge retention weight and the policy learning weight; generating an optimized general policy and an optimized specific policy based on the optimized base network layer and progressive layers; and obtaining the optimal control strategy for the autonomous vehicle at the current moment based on the optimized general policy, the optimized specific policy, the knowledge retention weight and the policy learning weight.
  7. The method of claim 6, further comprising: acquiring the behavior result state of the autonomous vehicle at the previous moment; wherein the continual proximal policy optimization objective function is further related to a reward signal corresponding to the behavior result state of the autonomous vehicle at the previous moment; and obtaining the optimal control strategy for the autonomous vehicle at the current moment based on the optimized general policy, the optimized specific policy, the knowledge retention weight and the policy learning weight comprises: obtaining the optimal control strategy for the autonomous vehicle at the current moment based on the optimized general policy, the optimized specific policy, the knowledge retention weight, the policy learning weight and the reward signal.
  8. The method of claim 7, wherein the continual proximal policy optimization objective function is: J(θ) = E_{x∼π_θ}[α(x)·L_clip(θ) − β(x)·L_retain(θ) − c·L_value(θ)]; wherein α(x) is the policy learning weight; β(x) is the knowledge retention weight; L_clip(θ) is the truncated (clipped) policy loss used to constrain the update amplitude between the old policy and the new policy; L_retain(θ) is the knowledge retention loss used to constrain the degree of knowledge retention between the old and new policies; L_value(θ) is the state value function loss corresponding to the reward signal; c is a balance coefficient; the expectation E_{x∼π_θ} denotes averaging over the samples x generated based on the policy π_θ; and θ is the policy parameter.
  9. The method of claim 7, wherein the continual proximal policy optimization objective function is further related to a parameter constraint loss corresponding to each activated progressive layer; the continual proximal policy optimization objective function is: J(θ) = E_{x∼π_θ}[α(x)·L_clip(θ) − β(x)·L_retain(θ) − c·L_value(θ)] − Σ_{i∈A} λ_i·L_param,i(θ); wherein α(x) is the policy learning weight; β(x) is the knowledge retention weight; L_clip(θ) is the truncated (clipped) policy loss used to constrain the update amplitude between the old policy and the new policy; L_retain(θ) is the knowledge retention loss used to constrain the degree of knowledge retention between the old and new policies; L_value(θ) is the state value function loss corresponding to the reward signal; c is a balance coefficient; the expectation E_{x∼π_θ} denotes averaging over the samples x generated based on the policy π_θ; θ is the policy parameter; λ_i is the regularization coefficient of the i-th progressive layer; L_param,i(θ) is the parameter constraint loss of the i-th progressive layer; and A is the set of activated progressive layers.
  10. An apparatus for generating a control strategy for an autonomous vehicle, comprising: a feature processing module, configured to acquire environment perception data of the autonomous vehicle at the current moment and extract features from the environment perception data to obtain state features; a task evaluation module, configured to calculate the complexity of the current driving task according to the state features and determine the number of progressive layers to be activated based on the complexity; a policy generation module, configured to input the state features respectively into a base network layer of a progressive neural network and into the activated progressive layers, output a general policy through the base network layer, and output a specific policy for a designated complex scene through the progressive layers; a weight adjustment module, configured to adjust a knowledge retention weight and a policy learning weight according to the number of progressive layers to be activated; and a policy optimization module, configured to perform continual proximal policy optimization based on the general policy, the specific policy, the knowledge retention weight and the policy learning weight to obtain an optimal control strategy for the autonomous vehicle at the current moment.
  11. An electronic device, comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
  12. A computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-9.
  13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
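As a rough illustration of the complexity calculation and threshold-based layer activation described in claims 1, 3 and 4, the two steps could be sketched as follows. All function names, factor names and the identity scoring rules below are hypothetical placeholders, not taken from the patent:

```python
def task_complexity(state_features, factor_weights, scoring_rules):
    """Weighted sum of per-factor complexity scores (cf. claim 3).

    state_features: mapping from factor name to its first state feature value.
    scoring_rules:  mapping from factor name to a scoring function.
    factor_weights: mapping from factor name to its weight value.
    """
    return sum(factor_weights[f] * scoring_rules[f](v)
               for f, v in state_features.items())


def layers_to_activate(complexity, thresholds):
    """Compare the complexity against an ascending group of preset
    thresholds (cf. claim 4); the number of thresholds reached is taken
    as the number of progressive layers to activate."""
    return sum(complexity >= t for t in thresholds)


# Hypothetical example: two factors with identity scoring rules.
rules = {"traffic_density": lambda v: v, "weather": lambda v: v}
weights = {"traffic_density": 0.7, "weather": 0.3}
features = {"traffic_density": 0.8, "weather": 0.2}

c = task_complexity(features, weights, rules)   # 0.7*0.8 + 0.3*0.2 = 0.62
n = layers_to_activate(c, [0.3, 0.6, 0.9])      # two thresholds reached -> 2
```

The claim also allows a neural network classifier to output the layer identifiers directly; the threshold comparison above is only the simpler of the two alternatives.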

Description

Method and device for generating a control strategy for an autonomous vehicle

Technical Field

The invention relates to the technical field of autonomous driving, and in particular to a method and a device for generating a control strategy for an autonomous vehicle.

Background

The autonomous driving decision-making module plays the role of a human driver's brain: it is responsible for generating high-level behavior decisions (such as car following, lane changing and merging) from perception and prediction information, and provides constraints for the downstream planning and control module. Current mainstream decision methods fall into two categories: (1) rule-based decision methods, which rely on human experts to predefine a large number of logic rules (such as time-to-collision TTC and safety distance thresholds) and output decisions through table lookup or a finite state machine; (2) deep-learning-based decision methods, which map directly from sensor data to decisions using end-to-end or modular deep neural networks. However, deep-learning-based decision methods usually require large-scale labeled data sets for offline training of the deep neural network, cannot adapt to changes in task distribution in an open dynamic environment, and easily forget the knowledge of old tasks while learning new tasks, leading to unstable decision performance. Therefore, how to generate an accurate, stable and reliable control strategy for an autonomous vehicle in different driving scenes is a problem to be solved.
Disclosure of the Invention

In view of this, embodiments of the invention provide a method and a device for generating a control strategy for an autonomous vehicle, which calculate the complexity of the current driving task, use that complexity to determine the number of progressive layers to be activated, and combine the network structure of a progressive neural network with a continual proximal policy optimization algorithm, so that the model is more adaptable when facing tasks of varying complexity and scale. For a simple task the base network layer may suffice, and by setting the weights appropriately knowledge retention can be emphasized; for a complex task the progressive layers are activated and the weights are adjusted so that policy learning is strengthened, while over-parameterization and forgetting of old knowledge are avoided by exploiting the structure of the progressive neural network. The method can cope with task scenes of varying complexity and scale in autonomous driving decision-making, solves the problem that existing decision methods are inaccurate or untimely when facing complex intersections, changing traffic flows and special road conditions, improves the accuracy and adaptability of decisions, reduces the risk of traffic accidents caused by decision errors, and improves the safety, operating efficiency and energy utilization of autonomous driving.
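The weight adjustment described above (and in claim 5) keeps the policy learning weight positively correlated, and the knowledge retention weight inversely correlated, with the number of activated progressive layers. A minimal sketch of one such schedule follows; the linear interpolation and the `floor` parameter are illustrative assumptions, not the patent's actual rule:

```python
def adjust_weights(n_active, n_total, floor=0.1):
    """Illustrative weight schedule for claim 5.

    n_active: number of progressive layers activated for this task.
    n_total:  total number of progressive layers in the network.
    floor:    minimum value either weight may take (assumed).

    Returns (policy_learning_weight, knowledge_retention_weight):
    the former grows and the latter shrinks as more layers activate.
    """
    frac = n_active / max(n_total, 1)
    alpha = floor + (1.0 - floor) * frac          # positively correlated
    beta = floor + (1.0 - floor) * (1.0 - frac)   # inversely correlated
    return alpha, beta


# Simple task, no progressive layers: retention dominates.
alpha, beta = adjust_weights(0, 3)   # alpha = 0.1, beta = 1.0
# Complex task, all layers active: learning dominates.
alpha, beta = adjust_weights(3, 3)   # alpha = 1.0, beta = 0.1
```

Any monotone schedule with the same two correlations would satisfy the claim equally well; the linear form is chosen only for readability.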
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for generating a control strategy for an autonomous vehicle is provided, including: acquiring environment perception data of the autonomous vehicle at the current moment, and extracting features from the environment perception data to obtain state features; calculating the complexity of the current driving task according to the state features, and determining the number of progressive layers to be activated based on the complexity; inputting the state features respectively into a base network layer of a progressive neural network and into the activated progressive layers, outputting a general policy through the base network layer, and outputting a specific policy for a designated complex scene through the progressive layers; adjusting a knowledge retention weight and a policy learning weight according to the number of progressive layers to be activated; and performing continual proximal policy optimization based on the general policy, the specific policy, the knowledge retention weight and the policy learning weight to obtain an optimal control strategy for the autonomous vehicle at the current moment. Optionally, before calculating the complexity of the current driving task according to the state features, the method further comprises calculating the real-time degree of the current driving task according to the state features and determining that the real-time degree of the current driving task does not meet a preset condition; in the case that the real-time degree meets the preset condition, a safety control instruction is directly output as the optimal control strategy. Optionally, the method comprises the steps of obtaining a complexity scoring rule corresponding to each factor included in the state features
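The continual proximal policy optimization objective referred to throughout (and spelled out in claims 8 and 9) combines a clipped policy term weighted by α(x), a knowledge retention term weighted by β(x), a value loss with balance coefficient c, and per-layer parameter constraint penalties λ_i. The single-sample sketch below assumes a KL divergence for the retention loss, a squared error for the value loss, and a maximization sign convention; none of these specific forms is confirmed by the patent text:

```python
def continual_ppo_objective(ratio, advantage, kl_old_new, value_err,
                            alpha, beta, c=0.5, clip_eps=0.2,
                            layer_penalties=(), lambdas=()):
    """One-sample sketch of the objective in claims 8-9 (to be maximized):

        alpha(x)*L_clip - beta(x)*L_retain - c*L_value
            - sum_{i in A} lambda_i * L_param_i

    ratio:       pi_new(a|s) / pi_old(a|s) probability ratio.
    advantage:   advantage estimate for the sampled action.
    kl_old_new:  KL(old || new) used here as the retention loss (assumed).
    value_err:   value prediction error; squared here (assumed).
    """
    # Standard PPO clipped surrogate: min of unclipped and clipped term.
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    l_clip = min(ratio * advantage, clipped_ratio * advantage)
    l_retain = kl_old_new
    l_value = value_err ** 2
    # Parameter constraint penalties over the activated progressive layers.
    penalty = sum(lam * p for lam, p in zip(lambdas, layer_penalties))
    return alpha * l_clip - beta * l_retain - c * l_value - penalty


# Hypothetical numbers: ratio 1.5 is clipped to 1.2 before weighting.
score = continual_ppo_objective(ratio=1.5, advantage=1.0, kl_old_new=0.1,
                                value_err=0.5, alpha=0.8, beta=0.2)
```

In practice the expectation in the claims would be estimated by averaging this quantity over a batch of samples drawn from the current policy, and the gradient would update the base layer and the activated progressive layers jointly, as claim 6 describes.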