CN-122024480-A - Traffic signal control optimization method and device and electronic equipment
Abstract
The invention provides a traffic signal control optimization method, a device, and electronic equipment. The method comprises: obtaining multi-source traffic data, an exploration rate, and a current parameter vector corresponding to an optimization level in a current time step; determining a large language model or the corresponding reinforcement learning model as the action source according to the exploration rate; determining a current action according to a current state and the action source; determining an adjusted parameter vector based on the current action and the current parameter vector; determining the green time of a target phase based on the adjusted parameter vector so as to control the traffic signals of the optimization level and obtain new multi-source traffic data; determining the adjusted parameter vector as the new current parameter vector and continuing to control the traffic signals of the optimization level; and storing control experience data of the optimization level within a preset first period and optimizing the reinforcement learning model corresponding to the optimization level. Because the method adjusts parameters that have physical meanings, it addresses the black-box decision problem.
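As an illustration of the loop summarized in the abstract, the following Python sketch shows one way the action source could be selected and the parameter vector updated. The abstract does not state the selection rule, so the sketch assumes an epsilon-style rule in which the large language model is chosen with probability equal to the exploration rate. The names `choose_action_source`, `apply_action`, and `Experience`, as well as the per-parameter bounds, are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A policy maps a state vector to an adjustment of the parameter vector.
Policy = Callable[[Sequence[float]], List[float]]


@dataclass
class Experience:
    """One control experience record: state, action, and resulting next state."""
    state: List[float]
    action: List[float]
    next_state: List[float]


def choose_action_source(
    exploration_rate: float,
    llm_policy: Policy,
    rl_policy: Policy,
    rng: random.Random,
) -> Tuple[str, Policy]:
    """Assumed rule: use the LLM with probability `exploration_rate`, else the RL model."""
    if rng.random() < exploration_rate:
        return "llm", llm_policy
    return "rl", rl_policy


def apply_action(
    params: Sequence[float],
    action: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> List[float]:
    """Add the adjustment to each parameter and clamp to assumed physical bounds."""
    return [
        min(max(p + a, lo), hi)
        for p, a, (lo, hi) in zip(params, action, bounds)
    ]
```

Under this assumption, an exploration rate that decays over time would hand control from the language model to the reinforcement learning model gradually; whether the patent intends such a schedule is not stated in this excerpt.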
Inventors
- PI JIATIAN
- YANG XINMIN
- WANG YUXUAN
Assignees
- 重庆智路云行科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260212
Claims (10)
- 1. A traffic signal control optimization method, the method comprising: obtaining multi-source traffic data, an exploration rate, and a current parameter vector corresponding to an optimization level in a current time step, wherein the optimization level comprises single-intersection optimization or regional optimization consisting of a plurality of single intersections, and the parameter vector comprises a plurality of parameters that are used for indirectly controlling traffic signals and have physical meanings; determining a large language model or a corresponding reinforcement learning model as an action source according to the exploration rate, and determining a current action according to a current state and the action source, wherein the current state is constructed based on the multi-source traffic data, and the current action represents the adjustment amount of the current parameter vector; determining an adjusted parameter vector based on the current action and the current parameter vector, determining the green time of a target phase based on the adjusted parameter vector so as to control the traffic signal of the optimization level, obtaining new multi-source traffic data, determining the adjusted parameter vector as a new current parameter vector, and continuing to control the traffic signal of the optimization level; and storing control experience data of the optimization level within a preset first period, and optimizing the reinforcement learning model corresponding to the optimization level, wherein the control experience data comprises the current state, the current action, and a next state vector obtained after the adjusted parameter vector is executed.
- 2. The traffic signal control optimization method according to claim 1, wherein determining the optimization level comprises: if the parameter performance index of a single intersection improves within a preset second period, determining the single intersection as an active intersection and determining the optimization level as single-intersection optimization, wherein the parameter performance index is obtained based on the instant reward of the single intersection; if the parameter performance index of an active intersection does not improve within the preset second period, determining the active intersection as a stable intersection; and if adjacent stable intersections exist and the number of stable intersections is greater than a preset number threshold, determining the optimization level as regional optimization and determining the adjacent stable intersections as an optimization region.
- 3. The traffic signal control optimization method according to claim 1, further comprising, before determining the green time of the target phase based on the adjusted parameter vector: constructing a regional joint state from the multi-source traffic data of each single intersection, and inputting the regional joint state into an upper-layer strategy to obtain a coordination mode for each single intersection, wherein the upper-layer strategy takes maximizing the regional joint reward of the regional optimization as its optimization target; if the coordination mode of a single intersection takes a continuous value, scaling the current action; if the coordination mode of a single intersection takes a discrete value, freezing or allowing an adjustment strategy, wherein the adjustment strategy comprises optimizing the bottom-layer strategy corresponding to the single intersection and/or adjusting the parameter vector corresponding to the single intersection, so as to obtain a single-intersection optimization range; wherein the upper-layer strategy and the bottom-layer strategy of each single intersection are obtained based on the corresponding reinforcement learning model.
- 4. The traffic signal control optimization method of claim 1, wherein the parameter vector comprises a historical gain coefficient that controls the fusion weight of historical data, a base bias time that characterizes the vehicle start-up lost time, a base transit time that characterizes the time required for one vehicle to pass, and a neighbor gain coefficient that controls the fusion weight of neighbor prediction information.
- 5. The traffic signal control optimization method according to claim 4, wherein problem patterns and their corresponding adjustment logic are associated in the large language model so as to obtain the current action at the initial stage of exploration; if the problem pattern is wasted (empty) green time, reducing the value of a first type of parameter and/or checking the quality of the multi-source traffic data, wherein the first type of parameter comprises at least one of the base transit time, the base bias time, and the historical gain coefficient; if the problem pattern is that vehicles are not fully discharged, increasing the value of a second type of parameter and/or detecting an abrupt change in traffic demand, wherein the second type of parameter comprises at least one of the base transit time, the base bias time, and the historical gain coefficient; if the problem pattern is unbalanced data across directions, detecting a fault state of the corresponding data acquisition equipment and/or checking the correctness of the road configuration; if the problem pattern is control lag, reducing the value of a third type of parameter and/or adjusting a time prediction parameter of the corresponding model, wherein the third type of parameter comprises the historical gain coefficient and/or the queue length of the historical data, and the time prediction parameter comprises the decision time step corresponding to the current action and/or the model update frequency of the reinforcement learning model.
- 6. The traffic signal control optimization method according to claim 4, wherein determining the green time of a target phase based on the adjusted parameter vector comprises: determining the phase with the maximum vehicle queuing pressure as the target phase, or determining each phase of the optimization level as a target phase, wherein the vehicle queuing pressure represents the sum of the target queuing lengths of the lanes managed by the phase, and the target queuing length is determined based on the real-time queuing length of vehicles in the lane, the adjusted neighbor gain coefficient, and the adjusted historical gain coefficient; and determining the green time of the target phase according to the adjusted base bias time, the adjusted base transit time, and the maximum of the target queuing lengths of the lanes managed by the target phase; wherein the adjusted parameter vector comprises the adjusted historical gain coefficient, the adjusted base bias time, the adjusted base transit time, and the adjusted neighbor gain coefficient.
- 7. The traffic signal control optimization method according to claim 6, wherein determining the target queuing length based on the real-time queuing length of vehicles in the lane, the adjusted neighbor gain coefficient, and the adjusted historical gain coefficient comprises: determining an effective queuing length based on the real-time queuing length of vehicles in the lane, the historical queuing length, and the adjusted historical gain coefficient; and correcting the effective queuing length according to a predicted queuing length and the adjusted neighbor gain coefficient to obtain the target queuing length; wherein the predicted queuing length is obtained based on a pre-trained multi-intersection queuing length prediction model and the historical queuing lengths corresponding to the single intersection.
- 8. The traffic signal control optimization method according to any one of claims 1-7, wherein the regional joint reward of the regional optimization is determined by: determining the regional joint reward based on the mean and the variance of the instant rewards of all single intersections in the optimization region and a penalty coefficient corresponding to the variance.
- 9. A traffic signal control optimization device, the device comprising: an acquisition module, used for obtaining multi-source traffic data, an exploration rate, and a current parameter vector corresponding to an optimization level in a current time step, wherein the optimization level comprises single-intersection optimization or regional optimization consisting of a plurality of single intersections, and the parameter vector comprises a plurality of parameters that are used for indirectly controlling traffic signals and have physical meanings; an action generation module, used for determining a large language model or a corresponding reinforcement learning model as an action source according to the exploration rate, and determining a current action according to a current state and the action source, wherein the current state is constructed based on the multi-source traffic data, and the current action represents the adjustment amount of the current parameter vector; a traffic signal control module, used for determining an adjusted parameter vector based on the current action and the current parameter vector, determining the green time of a target phase based on the adjusted parameter vector so as to control the traffic signal of the optimization level, obtaining new multi-source traffic data, determining the adjusted parameter vector as a new current parameter vector, and continuing to control the traffic signal of the optimization level; and a model optimization module, used for storing control experience data of the optimization level within a preset first period and optimizing the reinforcement learning model corresponding to the optimization level, wherein the control experience data comprises the current state, the current action, and a next state vector obtained after the adjusted parameter vector is executed.
- 10. An electronic device, the electronic device comprising: one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the traffic signal control optimization method of any one of claims 1-8.
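The computations named in claims 6 to 8 can be sketched in Python as follows. The claims list the inputs, but this excerpt gives no formulas, so the linear blends for the queuing lengths, the linear green-time expression, the clamping bounds `g_min` and `g_max`, and the mean-minus-penalized-variance reward are all assumptions, and every identifier is hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


@dataclass
class Params:
    """Adjusted parameter vector (field names are illustrative)."""
    history_gain: float       # fusion weight of historical data, assumed in [0, 1]
    base_bias_time: float     # start-up lost time in seconds
    base_transit_time: float  # seconds needed per queued vehicle
    neighbor_gain: float      # fusion weight of neighbor prediction, assumed in [0, 1]


def target_queue(realtime: float, historical: float, predicted: float, p: Params) -> float:
    """Assumed form: blend real-time with history, then correct toward the prediction."""
    effective = (1.0 - p.history_gain) * realtime + p.history_gain * historical
    return (1.0 - p.neighbor_gain) * effective + p.neighbor_gain * predicted


def select_target_phase(phase_lane_targets: Dict[str, List[float]]) -> str:
    """Pick the phase whose lanes have the largest summed target queuing length."""
    return max(phase_lane_targets, key=lambda ph: sum(phase_lane_targets[ph]))


def green_time(lane_targets: Sequence[float], p: Params,
               g_min: float = 5.0, g_max: float = 60.0) -> float:
    """Assumed linear form: lost time plus per-vehicle time times the longest queue."""
    raw = p.base_bias_time + p.base_transit_time * max(lane_targets)
    return min(max(raw, g_min), g_max)  # clamping bounds are an added assumption


def region_joint_reward(rewards: Sequence[float], variance_penalty: float) -> float:
    """Assumed form: mean instant reward minus a penalty on its population variance."""
    return fmean(rewards) - variance_penalty * pvariance(rewards)
```

In this sketch the four parameters keep a direct physical reading: raising `base_transit_time` lengthens the green per queued vehicle, while the variance penalty discourages a region from improving its average at the expense of one intersection.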
Description
Traffic signal control optimization method and device and electronic equipment
Technical Field
The present invention relates to the field of traffic control technologies, and in particular to a traffic signal control optimization method, a device, and electronic equipment.
Background
With accelerating urbanization and the continuing growth in motor vehicle ownership, urban traffic systems face severe congestion. Traffic signal control is a key means of improving the efficiency of the road network, and its adaptive optimization capability is a core research topic in intelligent transportation systems (Intelligent Transportation System, ITS). Traditional control modes such as fixed-time control and actuated control have the advantage of a simple structure, but they tend to respond slowly and coordinate poorly across regions when dealing with complex time-varying flows and emergencies.
In the related art, deep reinforcement learning (Deep Reinforcement Learning, DRL) shows strong perception and decision-making potential through trial-and-error interaction between an agent and its environment. However, this end-to-end, data-driven approach faces serious challenges when moving from simulation to reality (Simulation to Reality, Sim2Real) in practical engineering applications: the black-box model lacks interpretability, the exploration stage easily produces extreme actions that violate traffic common sense, and training is inefficient because mature traffic engineering expertise is difficult to incorporate. In addition, large language models (Large Language Model, LLM) can handle unstructured expert rules thanks to their strong semantic understanding and common-sense reasoning, but they suffer from high inference latency and a potential "hallucination" risk in real-time closed-loop control.
Disclosure of Invention
The invention provides a traffic signal control optimization method, a device, and electronic equipment, which aim to solve the technical problems that, in traffic signal control, deep reinforcement learning trains inefficiently and large language models suffer from inference latency and potential hallucination.
The invention provides a traffic signal control optimization method, comprising: obtaining multi-source traffic data, an exploration rate, and a current parameter vector corresponding to an optimization level in a current time step, wherein the optimization level comprises single-intersection optimization or regional optimization consisting of a plurality of single intersections, and the parameter vector comprises a plurality of parameters that are used for indirectly controlling traffic signals and have physical meanings; determining a large language model or a corresponding reinforcement learning model as an action source according to the exploration rate, and determining a current action according to a current state and the action source, wherein the current state is constructed based on the multi-source traffic data, and the current action represents the adjustment amount of the current parameter vector; determining an adjusted parameter vector based on the current action and the current parameter vector, determining the green time of a target phase based on the adjusted parameter vector so as to control the traffic signal of the optimization level, obtaining new multi-source traffic data, determining the adjusted parameter vector as a new current parameter vector, and continuing to control the traffic signal of the optimization level; and storing control experience data of the optimization level within a preset first period and optimizing the reinforcement learning model corresponding to the optimization level, wherein the control experience data comprises the current state, the current action, and a next state vector obtained after the adjusted parameter vector is executed.
In an embodiment of the invention, determining the optimization level comprises: if the parameter performance index of a single intersection improves within a preset second period, determining the single intersection as an active intersection and determining the optimization level as single-intersection optimization; if the parameter performance index of an active intersection does not improve within the preset second period, determining the active intersection as a stable intersection; and if adjacent stable intersections exist and the number of stable intersections is greater than a preset number threshold, determining the optimization level as regional optimization and determining the adjacent stable intersections as an optimization region.
In an embodiment of the invention, before the green time of the target phase is determined based on the adjusted parameter vector, the method further comprises the step