CN-115292154-B - Accelerated safety-scenario testing method and system based on adversarial reinforcement learning

Abstract

The invention relates to an accelerated safety-scenario testing method and system based on adversarial reinforcement learning. The method comprises: acquiring an autonomous driving object under test and a preset test case library; designing, based on a machine-learning surrogate simulation model, an acquisition function that balances exploitation and exploration; searching the test case library according to the acquisition function, based on the object under test, to obtain a test set suited to the object under test, wherein the test set comprises a plurality of driving scenarios; testing, based on an adversarial learning method, according to the driving scenarios and the object under test to obtain a trajectory distribution; and adjusting the test set in a timely manner according to the trajectory distribution to achieve accelerated testing of driving scenarios with different safety levels. The invention can improve both the safety level and the efficiency of autonomous vehicle testing.

Inventors

DING YANCHAO
LI RU
MA YULIN
WANG GUANGWEI

Assignees

苏州观瑞汽车技术有限公司

Dates

Publication Date: 20260508
Application Date: 20220524

Claims (6)

1. An accelerated safety-scenario testing method based on adversarial reinforcement learning, characterized by comprising the following steps:
acquiring an autonomous driving object under test and a preset test case library;
acquiring simulation test scenario data; training a preset multi-layer perceptron neural network with the simulation test scenario data as training data to obtain a trained test-case-matching surrogate model; and iteratively updating the network parameters of the trained test-case-matching surrogate model with a hyperparameter configuration function to obtain the test-case-matching surrogate model with the best generalization ability, wherein the hyperparameter configuration function corresponding to the test-case-matching surrogate model with the best generalization ability is used as the acquisition function, and the network parameters comprise the number of hidden layers, the number of hidden-layer neurons, the learning rate, the exponential decay rate, and the batch size of the training data;
searching the test case library according to the acquisition function, based on the object under test, to obtain a test set suited to the object under test, wherein the test set comprises a plurality of driving scenarios;
testing, based on an adversarial learning method, according to the driving scenarios and the object under test to obtain a trajectory distribution; and
adjusting the test set in a timely manner according to the trajectory distribution to achieve accelerated testing of driving scenarios with different safety levels, which comprises: establishing, based on importance sampling theory, an unbiased estimate of the performance of the autonomous driving system under test in a naturalistic driving environment; obtaining importance sampling weights from the probability distribution of the trajectory distribution; and, during testing, increasing the occurrence probability of risk events in the test set according to the importance sampling weights and the unbiased estimate, so as to achieve high-confidence accelerated testing.
2. The accelerated safety-scenario testing method based on adversarial reinforcement learning according to claim 1, wherein the test case library is determined as follows:
dividing scene parameters into environment parameters, road condition parameters, and object parameters based on a preset autonomous driving test target and a universal test scenario data format, wherein the test target comprises an operational design domain, a dynamic driving task, and traffic rules;
determining a parameter constraint set according to the requirements of the test scenario, and obtaining the set of uncovered combinations in the test scenario;
assigning values to all scene parameters to obtain a scene parameter combination set;
assigning parameter values to the scene parameter combination set according to the parameter constraint set to obtain a parameter value combination set;
iterating the value assignment process to analyze the parameter space, wherein the parameter space comprises the set of uncovered combinations; and
generating a plurality of test cases from the parameter space using a combinatorial testing algorithm, wherein the test case library consists of the plurality of test cases.
3. The accelerated safety-scenario testing method based on adversarial reinforcement learning according to claim 2, wherein determining the parameter constraint set according to the requirements of the test scenario comprises:
converting the requirements of the test scenario into a plurality of constraints;
obtaining an initial constraint set from the constraints;
simplifying the initial constraint set to obtain a simplest constraint set; and
searching for implicit constraints in the simplest constraint set, and adding the implicit constraints found to the initial constraint set to obtain the final parameter constraint set.
4. The accelerated safety-scenario testing method based on adversarial reinforcement learning according to claim 1, wherein searching the test case library according to the acquisition function, based on the object under test, to obtain the test set suited to the object under test comprises:
obtaining an optimal hyperparameter configuration according to the acquisition function, based on a hyperparameter optimization algorithm;
determining the search direction for input sample points according to the optimal hyperparameter configuration; and
searching the test case library along the search direction for input sample points to obtain the test set for the object under test.
5. The accelerated safety-scenario testing method based on adversarial reinforcement learning according to claim 1, wherein testing, based on the adversarial learning method, according to the driving scenarios and the object under test to obtain the trajectory distribution comprises:
determining Nash equilibrium points through interactive adversarial training with reinforcement learning, wherein each Nash equilibrium point corresponds to a different type of driving scenario;
determining new Nash equilibrium points through adversarial reinforcement learning; and
testing the test cases of the object under test, after modification and evolution, according to the new Nash equilibrium points, to generate the trajectory distribution produced by adversarial learning.
6. An accelerated safety-scenario testing system based on adversarial reinforcement learning, characterized by comprising:
an acquisition module, configured to acquire an autonomous driving object under test and a preset test case library;
a design module, configured to design, based on a machine-learning surrogate simulation model, an acquisition function that balances exploitation and exploration, wherein designing the acquisition function comprises: acquiring simulation test scenario data; training a preset multi-layer perceptron neural network with the simulation test scenario data as training data to obtain a trained test-case-matching surrogate model; and iteratively updating the network parameters of the trained test-case-matching surrogate model with a hyperparameter configuration function to obtain the test-case-matching surrogate model with the best generalization ability, wherein the hyperparameter configuration function corresponding to the test-case-matching surrogate model with the best generalization ability is the acquisition function, and the network parameters comprise the number of hidden layers, the number of hidden-layer neurons, the learning rate, the exponential decay rate, and the batch size of the training data;
a search module, configured to search the test case library according to the acquisition function, based on the object under test, to obtain a test set suited to the object under test, wherein the test set comprises a plurality of driving scenarios;
a test module, configured to test, based on an adversarial learning method, according to the driving scenarios and the object under test to obtain a trajectory distribution; and
an adjustment module, configured to adjust the test set in a timely manner according to the trajectory distribution to achieve accelerated testing of driving scenarios with different safety levels, wherein the adjustment comprises: establishing, based on importance sampling theory, an unbiased estimate of the performance of the autonomous driving system under test in a naturalistic driving environment; obtaining importance sampling weights from the probability distribution of the trajectory distribution; and, during testing, increasing the occurrence probability of risk events in the test set according to the importance sampling weights and the unbiased estimate, so as to achieve high-confidence accelerated testing.
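
The importance-sampling adjustment recited in claims 1 and 6 can be illustrated with a minimal Python sketch. It is not taken from the patent: the three scenario classes, their probabilities, the crash rates, and the function names are all hypothetical values chosen for illustration. Tests are drawn from a risk-heavy test distribution, and each observed failure is reweighted by the ratio of natural to test probability so that the estimate of the natural failure rate stays unbiased.

```python
import random

# Hypothetical scenario classes with assumed probabilities (illustrative values only).
P_NATURAL = {"nominal": 0.98, "cut_in": 0.015, "hard_brake": 0.005}  # naturalistic driving
Q_TEST = {"nominal": 0.2, "cut_in": 0.4, "hard_brake": 0.4}          # risk-heavy test set
CRASH_PROB = {"nominal": 0.0, "cut_in": 0.1, "hard_brake": 0.3}      # assumed failure rates


def importance_weight(scenario):
    """Likelihood ratio p(x) / q(x) that corrects for oversampling risky scenarios."""
    return P_NATURAL[scenario] / Q_TEST[scenario]


def estimate_crash_rate(n_tests, seed=0):
    """Estimate the natural crash rate from tests drawn under Q_TEST."""
    rng = random.Random(seed)
    scenarios = list(Q_TEST)
    weights = [Q_TEST[s] for s in scenarios]
    total = 0.0
    for _ in range(n_tests):
        scenario = rng.choices(scenarios, weights=weights, k=1)[0]
        crashed = rng.random() < CRASH_PROB[scenario]  # stand-in for one simulated test run
        if crashed:
            total += importance_weight(scenario)
    return total / n_tests


def true_crash_rate():
    """Exact crash rate under the natural distribution, for comparison."""
    return sum(P_NATURAL[s] * CRASH_PROB[s] for s in P_NATURAL)


if __name__ == "__main__":
    print("true rate:  ", true_crash_rate())
    print("IS estimate:", estimate_crash_rate(50_000))
```

With these assumed numbers, about 16% of test runs hit a risk event, compared with 0.3% under natural sampling, while the weighted average still targets the natural crash rate.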

Description

Accelerated safety-scenario testing method and system based on adversarial reinforcement learning
Technical Field
The invention relates to the technical field of autonomous vehicle testing, and in particular to an accelerated safety-scenario testing method and system based on adversarial reinforcement learning.
Background
In recent years, many kinds of autonomous driving simulation test software have emerged. In particular, many internet companies at the forefront of the autonomous driving industry have each developed simulation software to support the development of autonomous driving technology, such as Baidu Apollo, Tencent TAD Sim, Microsoft AirSim, Waymo CarCraft, Toyota/Intel CARLA, and Siemens PreScan. These simulation tools differ in functionality and in how they are used.
Existing autonomous driving simulation test platforms suffer from low test efficiency and low confidence. Vehicle operating environments are highly complex and highly random, and existing test scenarios struggle to reflect such real traffic environments. No multi-level, multi-difficulty, high-confidence scenario construction method for autonomous driving systems has been established, and a fast scenario search method matched to the test target is lacking. An efficient, high-confidence scenario construction theory and an accelerated testing method that meet the requirements of different levels therefore need to be developed. Efficiently matching test objects with test cases, and forming a personalized test method that suits the characteristics of heterogeneous objects under test, is the core of accelerated testing. Reinforcement learning has been used in accelerated scenario testing, but ordinary reinforcement learning as currently applied does not address the high safety requirements of autonomous vehicles.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention aims to provide an accelerated safety-scenario testing method and system based on adversarial reinforcement learning.
To achieve this object, the invention provides the following solution. An accelerated safety-scenario testing method based on adversarial reinforcement learning comprises the following steps:
acquiring an autonomous driving object under test and a preset test case library;
designing, based on a machine-learning surrogate simulation model, an acquisition function that balances exploitation and exploration;
searching the test case library according to the acquisition function, based on the object under test, to obtain a test set suited to the object under test, wherein the test set comprises a plurality of driving scenarios;
testing, based on an adversarial learning method, according to the driving scenarios and the object under test to obtain a trajectory distribution; and
adjusting the test set in a timely manner according to the trajectory distribution to achieve accelerated testing of driving scenarios with different safety levels.
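The search step above can be sketched in Python to show what balancing exploitation and exploration means when ranking a test case library. This is an illustrative sketch under stated assumptions, not the patent's acquisition function: it swaps in a standard upper-confidence-bound (UCB) score, and the `Prediction` type, the `kappa` parameter, the scenario names, and the predicted values are all hypothetical. The surrogate model is assumed to supply a predicted risk and an uncertainty for each test case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    mean: float  # surrogate's predicted risk for a test case (exploitation signal)
    std: float   # surrogate's uncertainty about that prediction (exploration signal)


def ucb(pred, kappa=1.0):
    """Upper-confidence-bound score: higher kappa favors uncertain test cases."""
    return pred.mean + kappa * pred.std


def select_test_set(predictions, k, kappa=1.0):
    """Return the names of the k test cases with the highest acquisition score."""
    ranked = sorted(
        predictions,
        key=lambda name: ucb(predictions[name], kappa),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical surrogate outputs for four library entries (illustrative values only).
LIBRARY_PREDICTIONS = {
    "highway_cut_in": Prediction(mean=0.70, std=0.05),
    "urban_pedestrian": Prediction(mean=0.40, std=0.40),
    "empty_straight_road": Prediction(mean=0.05, std=0.02),
    "night_rain_merge": Prediction(mean=0.30, std=0.10),
}

if __name__ == "__main__":
    # Pure exploitation ranks by predicted risk alone.
    print(select_test_set(LIBRARY_PREDICTIONS, k=2, kappa=0.0))
    # Adding exploration promotes the poorly understood pedestrian scenario.
    print(select_test_set(LIBRARY_PREDICTIONS, k=2, kappa=1.0))
```

With `kappa=0.0` the ranking follows predicted risk only; with `kappa=1.0` the high-uncertainty scenario moves to the top, which is the trade-off an acquisition function is meant to control.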
Preferably, the test case library is determined as follows:
dividing scene parameters into environment parameters, road condition parameters, and object parameters based on a preset autonomous driving test target and a universal test scenario data format, wherein the test target comprises an operational design domain, a dynamic driving task, and traffic rules;
determining a parameter constraint set according to the requirements of the test scenario, and obtaining the set of uncovered combinations in the test scenario;
assigning values to all scene parameters to obtain a scene parameter combination set;
assigning parameter values to the scene parameter combination set according to the parameter constraint set to obtain a parameter value combination set;
iterating the value assignment process to analyze the parameter space, wherein the parameter space comprises the set of uncovered combinations; and
generating a plurality of test cases from the parameter space using a combinatorial testing algorithm, wherein the test case library consists of the plurality of test cases.
Preferably, determining the parameter constraint set according to the requirements of the test scenario comprises:
converting the requirements of the test scenario into a plurality of constraints;
obtaining an initial constraint set from the constraints;
simplifying the initial constraint set to obtain a simplest constraint set; and
searching for implicit constraints in the simplest constraint set, and adding the implicit constraints found to the initial constraint set to obtain the final parameter constraint set.
Preferably, designing, based on the machine-learning surrogate simulation model, an acquisition function that balances exploitation and exploration comprises:
acquiring simulation test scenario data;
training a preset multi-layer perceptron neural network by using the simulation test