CN-122021330-A - Unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm


Abstract

The invention discloses an unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm, suitable for driving control training of unmanned trolleys in mountain areas. The method first establishes a reinforcement learning environment model for a climbing control task. Gradient, position and speed sensors then acquire the mountain terrain information, the trolley position and speed, and an end-point flag. The trolley state and the executable actions are input into a neural network and converted into feature vectors by a tile encoder, and the Q value of each action is calculated with a linear method. An action is selected with an ε-greedy strategy; after it is executed, the reward and the next state are obtained, and the main and auxiliary parameters are updated synchronously by the single time scale hybrid TD algorithm. Training iterates until a preset number of rounds is completed. The invention avoids the numerical oscillation caused by asynchronous parameter updating, accelerates training convergence, improves the stability and accuracy of the optimal strategy, and addresses the low training efficiency of mountain unmanned trolley control, giving it good engineering application value.
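As an illustration of the feature and action-selection stages summarized above, the following is a minimal sketch, not the patented implementation. It assumes a simple grid-style tile encoder over (position, speed), binary features, and three discrete actions. The constants, the state bounds and the function names (`active_features`, `q_value`, `epsilon_greedy`) are hypothetical and do not come from the source.

```python
import random

# All constants below are illustrative assumptions; the source gives no values.
N_TILINGS = 8                      # number of overlapping tilings
TILES_PER_DIM = 8                  # tiles per state dimension
N_ACTIONS = 3                      # e.g. reverse, coast, forward
POS_MIN, POS_MAX = -1.2, 0.6       # assumed position boundary constraint
VEL_MIN, VEL_MAX = -0.07, 0.07     # assumed minimum / maximum speed
N_FEATURES = N_TILINGS * TILES_PER_DIM * TILES_PER_DIM * N_ACTIONS


def active_features(pos, vel, action):
    """Return the indices of the active binary features for (state, action)."""
    # Scale the state to [0, 1] and clamp it to the assumed bounds.
    p = min(max((pos - POS_MIN) / (POS_MAX - POS_MIN), 0.0), 1.0)
    v = min(max((vel - VEL_MIN) / (VEL_MAX - VEL_MIN), 0.0), 1.0)
    indices = []
    for tiling in range(N_TILINGS):
        # Each tiling is shifted by a fraction of one tile width.
        offset = tiling / (N_TILINGS * TILES_PER_DIM)
        pi = min(int((p + offset) * TILES_PER_DIM), TILES_PER_DIM - 1)
        vi = min(int((v + offset) * TILES_PER_DIM), TILES_PER_DIM - 1)
        flat = (tiling * TILES_PER_DIM + pi) * TILES_PER_DIM + vi
        indices.append(flat * N_ACTIONS + action)
    return indices


def q_value(theta, pos, vel, action):
    """Linear Q value: with binary features, the inner product of the weight
    vector and the feature vector is the sum of the active weights."""
    return sum(theta[i] for i in active_features(pos, vel, action))


def epsilon_greedy(theta, pos, vel, epsilon, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    qs = [q_value(theta, pos, vel, a) for a in range(N_ACTIONS)]
    return max(range(N_ACTIONS), key=lambda a: qs[a])
```

With binary tile features the linear Q value reduces to a sum over a handful of weights, so evaluating every action stays cheap regardless of the total feature count.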

Inventors

CHEN XINGGUO
SHEN YUCHEN

Assignees

Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-05

Claims (9)

1. An unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm, used for accelerating the training process of unmanned trolley driving control in mountain areas, characterized by comprising the following steps:
S1, for the climbing control task of a mountain unmanned trolley, establishing a reinforcement learning environment model and instantiating the neural network model to be trained;
S2, acquiring the current mountain terrain environment information and the position and speed information of the trolley by using a gradient sensor, a position sensor and a speed sensor, the information comprising a mountain gradient function, a position boundary constraint, a maximum speed constraint, a minimum speed constraint, the position coordinate of the trolley at time t, the speed of the trolley at time t, and a flag indicating whether the end point has been reached;
S3, inputting the state information of the trolley at time t and the optional actions into the neural network model, converting them into the corresponding features, and calculating the Q value of each executable action with a linear method, namely as the product of the transposed feature weight parameter vector and the feature of taking that action in the current state, wherein the state information includes the position coordinate and the speed of the trolley, and each optional action in a given state is identified by an action index; then selecting an action with the ε-greedy method, executing it, and recording the current feature;
S4, after the trolley executes the action and enters the next state, obtaining the reward and the end-point flag; calculating the Q values of the actions in the next state by the method of step S3, obtaining the action corresponding to the maximum Q value and recording its feature; then selecting an action again with the ε-greedy method and recording its feature;
S5, for the data obtained by one sampling, updating the parameters of the algorithm with the update formula of the single time scale hybrid TD algorithm, thereby optimizing the current control strategy;
S6, repeating steps S2 to S5 until the trolley reaches the target position or the maximum number of training steps is reached, which completes one round of training;
S7, after each round of training is finished, updating ε, and repeating the training for multiple rounds until the preset number of training rounds is completed, so that the trolley learns the optimal strategy for mountain automatic driving.
2. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 1, wherein in step S3, the neural network model uses a tile encoder to convert the state information of the trolley at time t and the optional actions into feature vectors.
3. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 1, wherein in step S3, the selection logic of the ε-greedy method is as follows: ε is a number between 0 and 1 that is set initially and updated after each round of training; a random number between 0 and 1 is generated; if the random number is smaller than ε, one action is selected at random from all executable actions as the action to execute; otherwise, the action with the largest Q value among all executable actions is selected; and the current feature is recorded.
4. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 1, wherein in step S4, if the end-point flag indicates that the end point has been reached, the reward is 100; otherwise the reward is -1.
5. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 1, wherein in step S5, the parameters include a main parameter and an auxiliary parameter, and the initial formula for the parameter update is as follows: [formula not reproduced in this text], wherein the quantities defined for the formula are a discount coefficient and a learning rate.
6. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 5, wherein the final update formula is obtained by substituting A, b and M into the initial formula: [formula not reproduced in this text].
7. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 1, wherein in step S7, ε is updated after each round of training by multiplying its value by an attenuation coefficient smaller than 1.
8. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 7, wherein the maximum number of training steps in step S6 is preset to 1000, and the attenuation coefficient in step S7 is 0.9992.
9. The unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm according to claim 1, wherein the unmanned trolley comprises a mountain inspection and operation-maintenance unmanned trolley and a mountain transportation unmanned trolley.
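The round structure set out in the claims (reward of 100 at the end point and -1 otherwise, a cap of 1000 steps per round, and ε multiplied by 0.9992 after each round) can be sketched as follows. This is a hedged illustration only. Because the update formula of the single time scale hybrid TD algorithm is not reproduced in this text, the `update` argument is a pluggable placeholder, and the example stands in ordinary tabular SARSA, which is not the claimed update. The one-dimensional toy track, the Q table and all function names are hypothetical.

```python
import random

MAX_STEPS = 1000     # maximum training steps per round (claim 8)
EPS_DECAY = 0.9992   # per-round attenuation coefficient for epsilon (claim 8)


def reward_for(done):
    """Reward scheme of claim 4: 100 at the end point, otherwise -1."""
    return 100.0 if done else -1.0


def train(env_reset, env_step, select_action, update, n_rounds, epsilon):
    """Run n_rounds rounds; return the final epsilon and steps used per round."""
    steps_per_round = []
    for _ in range(n_rounds):
        state = env_reset()
        action = select_action(state, epsilon)
        steps = 0
        for steps in range(1, MAX_STEPS + 1):
            next_state, done = env_step(state, action)
            reward = reward_for(done)
            next_action = select_action(next_state, epsilon)
            # Placeholder hook: the hybrid TD update would be applied here.
            update(state, action, reward, next_state, next_action, done)
            state, action = next_state, next_action
            if done:
                break
        steps_per_round.append(steps)
        epsilon *= EPS_DECAY  # step S7: decay epsilon after each round
    return epsilon, steps_per_round


# Hypothetical toy setup: a 1-D track; action 1 moves forward, action 0 back.
GOAL = 5
q = {}  # tabular Q values standing in for the linear model


def env_reset():
    return 0


def env_step(state, action):
    nxt = max(0, state + (1 if action == 1 else -1))
    return nxt, nxt >= GOAL


def select_action(state, epsilon):
    if random.random() < epsilon:
        return random.randrange(2)
    return max((0, 1), key=lambda a: q.get((state, a), 0.0))


def sarsa_update(s, a, r, s2, a2, done, alpha=0.1, gamma=0.99):
    # Stand-in only: ordinary tabular SARSA, NOT the patent's hybrid TD update.
    target = r if done else r + gamma * q.get((s2, a2), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
```

Calling `train(env_reset, env_step, select_action, sarsa_update, n_rounds=10, epsilon=0.5)` runs ten rounds on the toy track. Keeping the update behind a callback means the loop itself does not depend on which TD variant is plugged in.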

Description

Unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm

Technical Field

The invention relates to the technical field of industrial intelligent equipment, and in particular to an unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm, suitable for autonomous driving control and strategy optimization of unmanned inspection vehicles in complex mountain terrain.

Background

With the wide deployment of power lines, communication base stations, oil and gas pipelines and other infrastructure in mountainous and hilly areas, unmanned vehicles for mountain inspection, operation and maintenance play an increasingly important role in tasks such as infrastructure inspection and routine maintenance. In a complex mountain environment, unmanned inspection vehicles face challenges such as large terrain undulation, frequent gradient changes and complex road conditions. These factors increase the risk of driving control and limit inspection efficiency. To overcome these problems, researchers have been developing automatic driving control algorithms for unmanned vehicles that adapt to mountain terrain. Existing driving control training methods for unmanned trolleys in mountain terrain suffer from low training speed and poor convergence stability, so the trolley has difficulty obtaining an optimal driving strategy quickly, which affects its operating efficiency and reliability in complex mountain environments. There is therefore a need for an unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm that accelerates training and improves convergence stability.
Disclosure of Invention

The invention aims to provide an unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm that solves the problems of low training speed and poor convergence stability in the prior art, so that the unmanned trolley learns an optimal control strategy more quickly and the accuracy and efficiency of driving control are improved.

To achieve the above purpose, the technical scheme of the invention is as follows:

An unmanned trolley control training acceleration method based on a single time scale hybrid TD algorithm, used for accelerating the training process of unmanned trolley driving control in mountain areas, comprises the following steps:
S1, for the climbing control task of a mountain unmanned trolley, establishing a reinforcement learning environment model and instantiating the neural network model to be trained;
S2, acquiring the current mountain terrain environment information and the position and speed information of the trolley by using a gradient sensor, a position sensor and a speed sensor, the information comprising a mountain gradient function, a position boundary constraint, a maximum speed constraint, a minimum speed constraint, the position coordinate of the trolley at time t, the speed of the trolley at time t, and a flag indicating whether the end point has been reached;
S3, inputting the state information of the trolley at time t and the optional actions into the neural network model, converting them into the corresponding features, and calculating the Q value of each executable action with a linear method, namely as the product of the transposed feature weight parameter vector and the feature of taking that action in the current state, wherein the state information includes the position coordinate and the speed of the trolley, and each optional action in a given state is identified by an action index; then selecting an action with the ε-greedy method, executing it, and recording the current feature;
S4, after the trolley executes the action and enters the next state, obtaining the reward and the end-point flag; calculating the Q values of the actions in the next state by the method of step S3, obtaining the action corresponding to the maximum Q value and recording its feature; then selecting an action again with the ε-greedy method and recording its feature;
S5, for the data obtained by one sampling, updating the parameters of the algorithm with the update formula of the single time scale hybrid TD algorithm, thereby optimizing the current control strategy;
S6, repeating steps S2 to S5 until the trolley reaches the target position or the maximum number of training steps is reached, which completes one round of training;
S7, after each round of training is finished, updating ε, and repeating the training for multiple rounds until the preset number of training rounds is completed, so that the trolley learns the optimal strategy for mountain automatic driving.

Further, in step S3, the neural network model uses a tile encoder to convert the state information of the trolley at time t and the optional actions into