CN-121981301-A - Model training method and device, scheduling method and device and storage medium

CN 121981301 A

Abstract

The disclosure relates to a model training method and device, a scheduling method and device, and a storage medium, in the technical field of engineering machinery. The model training method comprises: determining state information according to relevant information of a vehicle in a working scene and available information of a working site in the working scene, wherein the available information indicates whether the working site can be used for vehicle operation; taking the state information as input and outputting action information of the vehicle by using a machine learning model; updating the state information according to the action information; determining a reward value corresponding to the action information according to the updated state information; and training the machine learning model for vehicle dispatching according to the reward value. The technical scheme of the present disclosure can improve the real-time performance and accuracy of vehicle dispatching.

Inventors

  • WANG WANG
  • LIU WEI
  • TANG JIANLIN

Assignees

  • 江苏徐工国重实验室科技有限公司
  • 徐州徐工矿业机械有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (16)

  1. A model training method, comprising: determining state information according to relevant information of a vehicle in a working scene and available information of a working site in the working scene, wherein the available information indicates whether the working site can be used for vehicle operation; taking the state information as input and outputting action information of the vehicle by using a machine learning model; updating the state information according to the action information; determining a reward value corresponding to the action information according to the updated state information; and training the machine learning model for vehicle dispatching according to the reward value.
  2. The model training method according to claim 1, wherein the vehicle includes a first vehicle and a second vehicle, the first vehicle being a vehicle to be scheduled and the second vehicle being a vehicle other than the first vehicle in the working scene; the state information is determined according to related information of the first vehicle and related information of the second vehicle, and the action information includes a target working site of the first vehicle.
  3. The model training method according to claim 2, wherein the related information of the second vehicle includes at least one of current position information of the second vehicle, target position information of the second vehicle, a current working state of the second vehicle, or a remaining working duration of the second vehicle.
  4. The model training method according to claim 2, wherein the related information of the first vehicle includes at least one of current position information of the first vehicle or loading information of the first vehicle.
  5. The model training method according to any one of claims 1 to 4, wherein the reward value is determined according to a job duration taken by the vehicle to complete the action indicated by the action information; the longer the job duration, the smaller the reward value.
  6. The model training method according to claim 5, wherein, in response to the target working site being a loading point, the job duration includes a waiting duration between the vehicle arriving at the loading point and the vehicle beginning to perform a loading job.
  7. The model training method according to claim 5, wherein the job duration includes a travel duration of the vehicle to the target working site.
  8. The model training method according to claim 5, wherein, in response to the target working site being an unloading point, the reward value is further determined according to a traffic volume of the vehicle; the larger the ratio of the job duration to the traffic volume, the smaller the reward value.
  9. The model training method according to any one of claims 1 to 4, wherein training the machine learning model according to the reward value comprises: training the machine learning model until a job duration taken by the vehicle to complete a specified traffic volume is less than a threshold value.
  10. The model training method according to any one of claims 1 to 4, wherein training the machine learning model according to the reward value comprises: training the machine learning model according to the state information, the action information, and the reward value.
  11. A scheduling method, comprising: scheduling a vehicle in a working scene using a machine learning model based on relevant information of the vehicle in the working scene and available information of a working site in the working scene, the available information indicating whether the working site can be used for vehicle operation, the machine learning model being obtained by the model training method according to any one of claims 1 to 10.
  12. A model training apparatus, comprising: a first determining module configured to determine state information according to relevant information of a vehicle in a working scene and available information of a working site in the working scene, wherein the available information indicates whether the working site can be used for vehicle operation; an output module configured to take the state information as input and output action information of the vehicle by using a machine learning model; an updating module configured to update the state information according to the action information; a second determining module configured to determine a reward value corresponding to the action information according to the updated state information; and a training module configured to train the machine learning model for vehicle dispatching according to the reward value.
  13. A scheduling apparatus, comprising: a scheduling module configured to schedule a vehicle in a working scene using a machine learning model based on relevant information of the vehicle in the working scene and available information of a working site in the working scene, the available information indicating whether the working site can be used for vehicle operation, the machine learning model being obtained by the model training method according to any one of claims 1 to 10.
  14. An electronic device, comprising: a memory; and a processor coupled to the memory, the processor configured to perform the model training method of any one of claims 1 to 10 or the scheduling method of claim 11 based on instructions stored in the memory.
  15. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model training method of any one of claims 1 to 10 or the scheduling method of claim 11.
  16. A computer program product comprising instructions which, when executed by a processor, cause the processor to perform the model training method of any one of claims 1 to 10 or the scheduling method of claim 11.

Description

Model training method and device, scheduling method and device and storage medium

Technical Field

The present disclosure relates to the technical field of engineering machinery, and in particular to a model training method, a model training device, a scheduling method, a scheduling device, a computer-readable storage medium, and a computer program product.

Background

In closed point-to-point transportation scenarios such as surface mining, mining trucks shuttle between loading points and unloading points to carry out transportation, loading, and unloading tasks. Because the surface-mining environment is harsh, manual driving faces challenges such as high safety risk and high labor cost. Introducing automatic driving technology not only improves the safety of the transportation process but also reduces a mining area's dependence on drivers, thereby lowering labor costs. In addition, by means of vehicle interconnection and communication control technology, the various working vehicles in a mine working scene can transmit and share real-time information with other vehicles, roadside equipment, and cloud systems, enabling collaborative scheduling and intelligent operation. In the related art, vehicles are typically scheduled by algorithms built on real-time job data (e.g., linear programming, dynamic programming).

Disclosure of Invention

The inventors of the present disclosure found that the related art has a problem: the solution time of the above algorithms is long, resulting in poor real-time performance of vehicle scheduling. In view of this, the present disclosure proposes a model training technical solution that uses a machine learning model to directly output a scheduling decision for a vehicle based on the available information of working sites in a working scene, thereby improving the real-time performance and accuracy of vehicle scheduling.
According to some embodiments of the present disclosure, a model training method is provided, which includes: determining state information according to relevant information of a vehicle in a working scene and available information of a working site in the working scene, the available information indicating whether the working site can be used for vehicle operation; taking the state information as input and outputting action information of the vehicle by using a machine learning model; updating the state information according to the action information; determining a reward value corresponding to the action information according to the updated state information; and training the machine learning model for vehicle scheduling according to the reward value. In some embodiments, the vehicles include a first vehicle and a second vehicle, the first vehicle being a vehicle to be scheduled and the second vehicle being a vehicle other than the first vehicle in the working scene; the state information is determined according to related information of the first vehicle and related information of the second vehicle, and the action information includes a target working site of the first vehicle. In some embodiments, the related information of the second vehicle includes at least one of current position information of the second vehicle, target position information of the second vehicle, a current working state of the second vehicle, or a remaining working duration of the second vehicle. In some embodiments, the related information of the first vehicle includes at least one of current position information of the first vehicle or loading information of the first vehicle. In some embodiments, the reward value is determined according to the job duration taken by the vehicle to complete the action indicated by the action information; the longer the job duration, the smaller the reward value.
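The training procedure described above follows a standard reinforcement-learning loop: observe state, output an action, update the state, compute a reward, and train on the result. The following Python sketch is a toy illustration of that cycle only; the disclosure does not specify a concrete algorithm, state encoding, or reward formula, so the environment, constants, and learning rule here are all assumptions:

```python
import random

# Toy dispatch environment for illustration: one truck, two candidate
# working sites. State = availability of each site; action = target site.
# All names, numbers, and the inverse-duration reward are assumptions --
# the disclosure only states that a longer job duration yields a smaller
# reward.

SITES = (0, 1)
TRAVEL_TIME = {0: 2.0, 1: 5.0}   # travel duration to each site
WAIT_IF_BUSY = 4.0               # extra waiting duration if the site is occupied

def step(state, action):
    """Update the state per the chosen action; return (new_state, reward)."""
    duration = TRAVEL_TIME[action] + (0.0 if state[action] else WAIT_IF_BUSY)
    reward = 1.0 / duration                          # longer duration -> smaller reward
    new_state = tuple(i != action for i in SITES)    # chosen site becomes busy
    return new_state, reward

def train(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """One-step tabular value learning over (state, action) pairs
    (discounting is omitted to keep the sketch short)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = (True, True)                 # both sites initially available
        for _ in range(2):                   # short rollout per episode
            if rng.random() < epsilon:
                action = rng.choice(SITES)   # explore
            else:                            # exploit current estimates
                action = max(SITES, key=lambda a: q.get((state, a), 0.0))
            state_next, reward = step(state, action)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward - old)
            state = state_next               # update state per the action
    return q

q = train()
# With both sites free, the nearer site earns the higher learned value.
best = max(SITES, key=lambda a: q.get(((True, True), a), 0.0))
print(best)  # prints 0
```

In the patent's terms, `state` plays the role of the state information (site availability plus vehicle-related information), the argmax over `q` stands in for the machine learning model outputting action information, and the update of `q` corresponds to training the model according to the reward value.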
In some embodiments, in response to the target working site being a loading point, the job duration includes a waiting duration between the vehicle arriving at the loading point and the vehicle beginning to perform the loading job. In some embodiments, the job duration includes the travel duration of the vehicle to the target working site. In some embodiments, in response to the target working site being an unloading point, the reward value is further determined according to the traffic volume of the vehicle; the larger the ratio of the job duration to the traffic volume, the smaller the reward value. In some embodiments, the machine learning model is trained until the job duration taken by the vehicle to complete a specified traffic volume is less than a threshold. In some embodiments, the machine learning model is trained according to the state information, the action information, and the reward value. According to other embodiments of the present disclosure, there is provided a scheduling method including scheduling a vehicle using a machine learning model based on relevant information of the vehicle in a working scene and available information of a working site in the working scene, the informati