CN-121998532-A - Boxing method, training method of sequence generation model and electronic equipment


Abstract

The invention provides a boxing method, a training method of a sequence generation model, and electronic equipment. The boxing method comprises: obtaining attribute information of a plurality of objects to be loaded; determining the objects of a predicted loading sequence one by one through a sequence generation model based on that attribute information, and obtaining a current loading sequence after the kth object is determined, where k is a positive integer smaller than the number of objects to be loaded; determining, based on the current loading sequence, an intermediate evaluation signal that indicates the volume utilization rate of the current loading sequence and a volume penalty determined from the relationship between the volume and the weight of the current loading sequence; adjusting a generation strategy of the sequence generation model based on the intermediate evaluation signal so as to improve the volume utilization rate and reduce the volume penalty in the subsequent sequence generation process; and continuing to determine the (k+1)th object of the predicted loading sequence with the adjusted generation strategy until a complete predicted loading sequence is obtained.

Inventors

Qu Pengzhan

Assignees

Lenovo (Beijing) Co., Ltd. (联想(北京)有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-30

Claims (10)

1. A boxing method, comprising: acquiring attribute information of a plurality of objects to be loaded, wherein the attribute information comprises volume information and weight information; determining each object in a predicted loading sequence one by one through a sequence generation model based on the attribute information of the plurality of objects to be loaded, and obtaining a current loading sequence after a kth object is determined, wherein k is a positive integer smaller than the number of the plurality of objects to be loaded; determining an intermediate evaluation signal based on the current loading sequence, the intermediate evaluation signal indicating a volume utilization rate of the current loading sequence and a volume penalty determined based on the relationship between the volume and the weight of the current loading sequence; adjusting a generation strategy of the sequence generation model based on the intermediate evaluation signal to improve the volume utilization rate and reduce the volume penalty in a subsequent sequence generation process; and continuing to determine the (k+1)th object in the predicted loading sequence through the sequence generation model with the adjusted generation strategy until a complete predicted loading sequence is obtained.
2. The method of claim 1, wherein adjusting the generation strategy of the sequence generation model based on the intermediate evaluation signal comprises: reducing the consistency of sampling of the sequence generation model in the subsequent sequence generation process when the volume utilization rate is smaller than a first threshold and the volume penalty is greater than or equal to a second threshold; and increasing the consistency of sampling of the sequence generation model in the subsequent sequence generation process when the volume utilization rate is greater than or equal to a third threshold and the volume penalty is smaller than a fourth threshold; wherein the first threshold is less than or equal to the third threshold, and the second threshold is greater than or equal to the fourth threshold.
3. The method of claim 1, wherein determining the intermediate evaluation signal based on the current loading sequence comprises: performing a boxing simulation based on the current loading sequence, and determining the number of loading units consumed by the current loading sequence in the simulated boxing state, as well as the loading weight and loading volume of each loading unit; determining the volume utilization rate based on the number of consumed loading units and the volume information of the objects in the current loading sequence; and determining the volume penalty based on the loading weight and loading volume of each loading unit.
4. A training method of a sequence generation model, comprising: acquiring training data, wherein the training data comprises attribute information of a plurality of objects to be loaded and a corresponding target loading sequence; determining each object in a predicted loading sequence one by one through the sequence generation model based on the attribute information, and obtaining a current loading sequence after a kth object is determined, wherein k is a positive integer smaller than the number of the plurality of objects to be loaded; determining an intermediate reward signal based on the current loading sequence and the target loading sequence, the intermediate reward signal characterizing the consistency between the current loading sequence and the target loading sequence; and adjusting parameters of the sequence generation model based on the intermediate reward signal to improve the consistency between the subsequently generated predicted loading sequence and the target loading sequence, wherein the adjusted sequence generation model is used to continue determining the (k+1)th object in the predicted loading sequence.
5. The method of claim 4, wherein determining the intermediate reward signal based on the current loading sequence and the target loading sequence comprises: extracting, from the target loading sequence, the objects that are also contained in the current loading sequence while preserving their relative order in the target loading sequence, to obtain a target comparison sequence; and determining the intermediate reward signal based on the degree of difference in relative order between the current loading sequence and the target comparison sequence.
6. The method of claim 5, wherein determining the intermediate reward signal based on the degree of difference in relative order between the current loading sequence and the target comparison sequence comprises: determining at least one base sequence distance based on the current loading sequence and the target comparison sequence, each base sequence distance characterizing the degree of difference in the relative order of the objects in the current loading sequence and the target comparison sequence; obtaining a current sequence distance based on the at least one base sequence distance; and determining the intermediate reward signal based on the current sequence distance.
7. The method of claim 6, wherein the at least one base sequence distance comprises: a first sequence distance, determined based on the number of object pairs whose relative order differs between the current loading sequence and the target comparison sequence; a second sequence distance, determined based on the sum of the absolute values of the position differences of each object between the current loading sequence and the target comparison sequence; a third sequence distance, determined by calculating, based on a measure of the relative order of the objects in the sequences, the number of operations required to make the current loading sequence consistent with the target comparison sequence; a fourth sequence distance, determined based on the sum of the squares of the position differences of each object between the current loading sequence and the target comparison sequence; and a fifth sequence distance, determined based on the length of the longest common subsequence of the current loading sequence and the target comparison sequence.
8. The method of claim 4, wherein, after the iterative adjustment of the sequence generation model is completed and a complete predicted loading sequence is obtained, the method further comprises: performing a boxing simulation based on the complete predicted loading sequence to obtain a simulated boxing result; determining a final reward signal based on the simulated boxing result; and adjusting the parameters of the sequence generation model based on the final reward signal and each intermediate reward signal obtained while generating the predicted loading sequence, which specifically comprises: evaluating the long-term value of each step in the sequence generation process based on the intermediate reward signals and the final reward signal to obtain a value evaluation result; calculating a policy gradient of the output of the sequence generation model at each step of the sequence generation process; and scaling the policy gradient based on the value evaluation result and adjusting the parameters of the sequence generation model according to the scaled policy gradient.
9. A boxing method that performs boxing inference using a sequence generation model trained according to the method of any one of claims 4-8, the boxing method comprising: acquiring attribute information of a plurality of objects to be loaded; and determining each object in a predicted loading sequence through the sequence generation model based on the attribute information of the plurality of objects to be loaded.
10. An electronic device, comprising: at least one memory storing a computer program; and at least one processor, wherein the processor executes the computer program to perform the steps of the method according to any one of claims 1-9.
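The intermediate reward of claims 5 to 7 can be made concrete with a minimal sketch. The formulas are assumptions, since the claims describe the distances only in words: the first, second, fourth, and fifth distances are read as the Kendall tau distance (discordant pairs), the Spearman footrule, the sum of squared displacements, and the sequence length minus the longest common subsequence. The third distance is worded ambiguously and is read here as the Cayley distance (minimum number of swaps). Normalizing each distance by its maximum for k objects, averaging them, and using 1 minus the mean as the reward are illustrative choices; all function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def target_comparison_sequence(target: Sequence[Hashable],
                               current: Sequence[Hashable]) -> List[Hashable]:
    """Objects of `target` that also occur in `current`, kept in target order."""
    present = set(current)
    return [obj for obj in target if obj in present]


def _ranks(a, b) -> List[int]:
    """Position in `b` of each object of `a` (both hold the same objects)."""
    pos = {obj: i for i, obj in enumerate(b)}
    return [pos[obj] for obj in a]


def discordant_pairs(a, b) -> int:  # first distance (assumed Kendall tau)
    r = _ranks(a, b)
    return sum(1 for i in range(len(r)) for j in range(i + 1, len(r)) if r[i] > r[j])


def footrule(a, b) -> int:  # second distance: sum of |position difference|
    return sum(abs(i - p) for i, p in enumerate(_ranks(a, b)))


def cayley(a, b) -> int:  # third distance (assumed): minimum swaps = n - cycles
    perm = _ranks(a, b)
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return len(perm) - cycles


def squared_displacement(a, b) -> int:  # fourth distance: sum of squared differences
    return sum((i - p) ** 2 for i, p in enumerate(_ranks(a, b)))


def lcs_distance(a, b) -> int:  # fifth distance: length minus LCS length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return len(a) - dp[len(a)][len(b)]


def intermediate_reward(current: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Hypothetical reward in [0, 1]: 1 means the relative order fully matches."""
    comp = target_comparison_sequence(target, current)
    k = len(current)
    if k < 2:
        return 1.0  # a single object cannot be out of order
    normalized = [  # each distance divided by its maximum for k objects
        discordant_pairs(current, comp) / (k * (k - 1) / 2),
        footrule(current, comp) / (k * k // 2),
        cayley(current, comp) / (k - 1),
        squared_displacement(current, comp) / ((k ** 3 - k) / 3),
        lcs_distance(current, comp) / (k - 1),
    ]
    return 1.0 - sum(normalized) / len(normalized)
```

A reversed prefix scores 0 and a correctly ordered prefix scores 1; a weighted combination of the distances would be an equally plausible reading of "obtaining a current sequence distance based on at least one base sequence distance".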

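Claim 8 describes a policy-gradient update scaled by a per-step value estimate. The sketch below is illustrative only: it assumes a linear softmax policy over candidate feature vectors (the disclosure does not specify the model), uses discounted returns as the long-term value of each step, and subtracts their mean as a simple baseline; a learned critic would be a common alternative. The names `discounted_returns` and `policy_gradient_update` and the discount factor `gamma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One generation step: (feature vectors of the remaining candidates, index chosen).
Step = Tuple[List[List[float]], int]


def discounted_returns(intermediate: Sequence[float], final: float,
                       gamma: float = 0.99) -> List[float]:
    """Long-term value per step, G_t = r_t + gamma * G_{t+1}.

    Steps 1..n-1 receive intermediate rewards; step n receives the final reward.
    """
    rewards = list(intermediate) + [final]
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def _softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_update(theta: List[float], episode: List[Step],
                           intermediate: Sequence[float], final: float,
                           lr: float = 0.1, gamma: float = 0.99) -> List[float]:
    """Scale each step's log-probability gradient by its advantage and apply it."""
    if len(intermediate) != len(episode) - 1:
        raise ValueError("expected one intermediate reward per step except the last")
    returns = discounted_returns(intermediate, final, gamma)
    baseline = sum(returns) / len(returns)
    dim = len(theta)
    new_theta = list(theta)
    for (feats, chosen), g in zip(episode, returns):
        probs = _softmax([sum(w * x for w, x in zip(theta, f)) for f in feats])
        # Gradient of log pi(chosen) for a linear softmax policy: phi(chosen) - E_pi[phi].
        grad = [feats[chosen][d] - sum(p * f[d] for p, f in zip(probs, feats))
                for d in range(dim)]
        advantage = g - baseline  # value evaluation result used as the scale
        new_theta = [w + lr * advantage * gd for w, gd in zip(new_theta, grad)]
    return new_theta
```

Steps whose long-term value is above average push the policy toward the choice made there, and below-average steps push it away.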
Description

Technical Field

The disclosure relates to the technical field of boxing, and in particular to a boxing method, a training method of a sequence generation model, and electronic equipment.

Background

Object loading sequences generated by boxing methods in the related art are of poor quality, so freight cost cannot be optimally controlled.

Disclosure of Invention

The disclosure provides a boxing method, a training method of a sequence generation model, and electronic equipment.

According to one aspect of the disclosure, a boxing method is provided, comprising: obtaining attribute information of a plurality of objects to be loaded, wherein the attribute information comprises volume information and weight information; determining each object in a predicted loading sequence one by one through a sequence generation model based on the attribute information of the plurality of objects to be loaded, and obtaining a current loading sequence after a kth object is determined, where k is a positive integer smaller than the number of the plurality of objects to be loaded; determining, based on the current loading sequence, an intermediate evaluation signal that indicates the volume utilization rate of the current loading sequence and a volume penalty determined based on the relationship between the volume and the weight of the current loading sequence; adjusting a generation strategy of the sequence generation model based on the intermediate evaluation signal to improve the volume utilization rate and reduce the volume penalty in a subsequent sequence generation process; and continuing to determine the (k+1)th object in the predicted loading sequence through the sequence generation model with the adjusted generation strategy until a complete predicted loading sequence is obtained.
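The step-by-step generation loop of the boxing method can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the sequence generation model is abstracted as a hypothetical `score_fn` that scores each remaining object, the next object is sampled from a softmax over those scores at a sampling temperature, and the intermediate evaluation and strategy adjustment are supplied as hypothetical callbacks `evaluate_fn` and `adjust_fn`. Treating the "generation strategy" as a temperature is itself an assumption.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Item = Tuple[float, float]  # (volume, weight); hypothetical representation


def sample_index(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw one index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]  # shift by max for numerical stability
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate_sequence(
    items: List[Item],
    score_fn: Callable[[Item, List[int]], float],
    evaluate_fn: Callable[[List[Item]], Tuple[float, float]],
    adjust_fn: Callable[[float, float, float], float],
    temperature: float = 1.0,
    seed: int = 0,
) -> List[int]:
    """Pick objects one at a time; after each of the first n-1 picks, evaluate and adjust."""
    rng = random.Random(seed)
    remaining = list(range(len(items)))
    order: List[int] = []
    while remaining:
        logits = [score_fn(items[i], order) for i in remaining]
        order.append(remaining.pop(sample_index(logits, temperature, rng)))
        if len(order) < len(items):  # k is smaller than the number of objects
            utilization, penalty = evaluate_fn([items[i] for i in order])
            temperature = adjust_fn(temperature, utilization, penalty)
    return order
```

In a real system `score_fn` would wrap the trained model's output for each remaining object, while `evaluate_fn` and `adjust_fn` would implement the intermediate evaluation signal and the threshold rule of the embodiment.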
According to an embodiment of the disclosure, adjusting the generation strategy of the sequence generation model based on the intermediate evaluation signal includes: reducing the consistency of sampling of the sequence generation model in the subsequent sequence generation process when the volume utilization rate is smaller than a first threshold and the volume penalty is greater than or equal to a second threshold; and increasing the consistency of sampling of the sequence generation model in the subsequent sequence generation process when the volume utilization rate is greater than or equal to a third threshold and the volume penalty is smaller than a fourth threshold, where the first threshold is less than or equal to the third threshold and the second threshold is greater than or equal to the fourth threshold.

According to an embodiment of the disclosure, determining the intermediate evaluation signal based on the current loading sequence includes: performing a boxing simulation based on the current loading sequence; determining the number of loading units consumed by the current loading sequence in the simulated boxing state, as well as the loading weight and loading volume of each loading unit; determining the volume utilization rate based on the number of consumed loading units and the volume information of the objects in the current loading sequence; and determining the volume penalty based on the loading weight and loading volume of each loading unit.
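The intermediate evaluation and the threshold rule can be illustrated with a simplified sketch. Several details are assumptions that the disclosure does not specify: the boxing simulation is a next-fit pass that checks only volume and weight capacity (no three-dimensional placement), the volume penalty is taken as how far a unit's weight fill ratio exceeds its volume fill ratio (a unit that is full by weight but not by volume wastes space), and "consistency of sampling" is modeled as the inverse of a sampling temperature. All names (`Item`, `Unit`, `simulate_next_fit`, `intermediate_signal`, `adjust_temperature`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    volume: float
    weight: float


@dataclass
class Unit:
    volume_cap: float  # volume capacity of one loading unit (hypothetical)
    weight_cap: float  # weight capacity of one loading unit (hypothetical)


def simulate_next_fit(seq: List[Item], unit: Unit) -> List[Tuple[float, float]]:
    """Load objects in sequence order, opening a new unit when the current one is full.

    Simplification: only volume and weight capacity are checked, not 3D placement.
    Returns (loaded_volume, loaded_weight) for every consumed loading unit.
    """
    loads: List[List[float]] = []
    for item in seq:
        if (loads
                and loads[-1][0] + item.volume <= unit.volume_cap
                and loads[-1][1] + item.weight <= unit.weight_cap):
            loads[-1][0] += item.volume
            loads[-1][1] += item.weight
        else:
            loads.append([item.volume, item.weight])
    return [(v, w) for v, w in loads]


def intermediate_signal(seq: List[Item], unit: Unit) -> Tuple[float, float]:
    """Return (volume utilization rate, volume penalty) for the current loading sequence."""
    loads = simulate_next_fit(seq, unit)
    if not loads:
        return 0.0, 0.0
    total_volume = sum(item.volume for item in seq)
    utilization = total_volume / (len(loads) * unit.volume_cap)
    # Hypothetical penalty: mean excess of weight fill ratio over volume fill ratio.
    penalty = sum(
        max(0.0, w / unit.weight_cap - v / unit.volume_cap) for v, w in loads
    ) / len(loads)
    return utilization, penalty


def adjust_temperature(temperature: float, utilization: float, penalty: float,
                       t1: float, t2: float, t3: float, t4: float,
                       step: float = 1.2, t_min: float = 0.1, t_max: float = 5.0) -> float:
    """Threshold rule of the embodiment, with sampling consistency modeled as 1/temperature."""
    if not (t1 <= t3 and t2 >= t4):
        raise ValueError("require t1 <= t3 and t2 >= t4")
    if utilization < t1 and penalty >= t2:
        return min(temperature * step, t_max)  # less consistent: explore more
    if utilization >= t3 and penalty < t4:
        return max(temperature / step, t_min)  # more consistent: exploit more
    return temperature
```

When neither condition holds, the temperature is left unchanged; the clamping bounds `t_min` and `t_max` are illustrative safeguards rather than part of the described method.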
According to another aspect of the disclosure, a training method of a sequence generation model is provided, comprising: obtaining training data, wherein the training data comprises attribute information of a plurality of objects to be loaded and a corresponding target loading sequence; determining each object in a predicted loading sequence one by one through the sequence generation model based on the attribute information, and obtaining a current loading sequence after a kth object is determined, where k is a positive integer smaller than the number of the plurality of objects to be loaded; determining an intermediate reward signal based on the current loading sequence and the target loading sequence, the intermediate reward signal characterizing the consistency between the current loading sequence and the target loading sequence; and adjusting parameters of the sequence generation model based on the intermediate reward signal to improve the consistency between the subsequently generated predicted loading sequence and the target loading sequence, the adjusted sequence generation model being used to continue determining the (k+1)th object in the predicted loading sequence. According to an embodiment of the disclosure, determining the intermediate reward signal based on the current loading sequence and the target loading sequence includes extracting, from the target loading sequence, the objects that are also contained in the current loading sequence while preserving their relative order in the target loading sequence, to obtain a target comparison sequence, and