CN-115280103-B - System and method for determining paths by learning selective optimizations

CN115280103BCN 115280103 BCN115280103 BCN 115280103BCN-115280103-B

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining paths. One example method includes inputting a plurality of path solution candidates to be optimized to a twin neural network, the twin neural network including a plurality of value prediction networks, each value prediction network trained to predict costs associated with the path solution candidates to be optimized, identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on an output of the twin neural network, inputting the identified one or more path solution candidates to be optimized to a path optimizer to obtain one or more optimized path solution candidates, and determining an optimized path solution having a lowest cost from the one or more optimized path solution candidates.

Inventors

ZHANG XINGWEN
YANG SHUANGHONG

Assignees

支付宝实验室(新加坡)有限公司

Dates

Publication Date: 20260508
Application Date: 20210129
Priority Date: 20200323

Claims (20)

1. A computer-implemented method for determining a path, comprising: inputting a plurality of path solution candidates to be optimized to a twin neural network, the twin neural network comprising a plurality of value prediction networks, each value prediction network being trained to predict costs associated with the path solution candidates to be optimized, and to predict a degree to which the path solution candidates to be optimized are optimized and/or a probability that the path solution candidates to be optimized are optimized to an optimal path solution; identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on an output of the twin neural network; inputting the identified one or more path solution candidates to be optimized to a path optimizer to obtain one or more optimized path solution candidates, and From the one or more optimized path solution candidates, an optimized path solution with the lowest cost is determined.
2. The method of claim 1, wherein each of the plurality of path solution candidates to be optimized comprises routing one or more vehicles through one or more paths at a plurality of locations, and the path solution candidate to be optimized is constrained by one or more constraints comprising one or more of: Time constraint; a travel distance constraint; vehicle capacity constraints, and Power cost constraints.
3. The method of claim 2, wherein the path optimizer comprises a set of improvement operators learned based on a reinforcement learning algorithm, and the set of improvement operators performs operations comprising one or more of: Changing the order of at least two of the plurality of positions, the at least two positions being on one of the one or more paths, and A position on one of the one or more paths is moved to another of the one or more paths.
4. The method of claim 1, wherein the plurality of value prediction networks in the twin neural network are identical.
5. The method of claim 1, wherein the twin neural network comprises two value prediction networks, the method further comprising: training the two value prediction networks by performing one or more iterations of an adjustment process, wherein the performing one or more iterations of an adjustment process comprises: acquiring a training set comprising a third path solution candidate to be optimized and a fourth path solution candidate to be optimized; Inputting the training set to the path optimizer to obtain a third score of the third path solution candidate to be optimized and a fourth score of the fourth path solution candidate to be optimized; inputting the training set into the two value prediction networks respectively to obtain a fifth score of the third path solution candidate to be optimized and a sixth score of the fourth path solution candidate to be optimized, and And adjusting the weights of the two value prediction networks at least according to the third score, the fourth score, the fifth score and the sixth score.
6. The method of claim 5, the adjusting process further comprising, prior to inputting the training set into the two value prediction networks: Judging whether the difference value between the third fraction and the fourth fraction is larger than a preset threshold value or not, and And if the difference value is not greater than the preset threshold value, discarding the third path solution candidate to be optimized and the fourth path solution candidate to be optimized.
7. The method of claim 5, wherein the adjusting weights of the two value prediction networks comprises: determining a marker of the training set according to the third score and the fourth score; Converting the fifth score and the sixth score to a fifth logistic regression value and a sixth logistic regression value; determining a cross entropy loss function based on the signature, the fifth logistic regression value, and the sixth logistic regression value, and The weights of the two value prediction networks are adjusted based on the cross entropy loss function.
8. The method of claim 4, wherein each value prediction network of the plurality of value prediction networks comprises: the two-way long-short-term memory LSTM layer comprises a plurality of LSTM units; An attention layer for embedding outputs from the plurality of LSTM cells, and An output layer for generating a score based on the output from the attention layer and a plurality of features associated with the path solution candidate to be optimized.
9. The method of claim 8, wherein the path solution candidate to be optimized comprises a plurality of paths, each path associated with a distance, the plurality of features associated with the path solution candidate to be optimized comprising: the sum of the distances of the multiple paths in the path solution candidate to be optimized, and And standard deviation of the distances of the paths in the path solution candidates to be optimized.
10. The method of claim 1, wherein the twin neural network comprises two value prediction networks, the inputting a plurality of path solution candidates to be optimized to the twin neural network comprising: for each of the plurality of path solution candidates to be optimized: pairing the path solution candidate to be optimized with each other path solution candidate to be optimized among the plurality of path solution candidates to be optimized that is different from the path solution candidate to be optimized, and The path solution candidate to be optimized and each other candidate of the pairing are input into the two value prediction networks to determine a separate score for the path solution candidate to be optimized.
11. The method of claim 10, said inputting the path solution candidate to be optimized and each other path solution candidate to be optimized of the pairing into the two value prediction networks to determine a separate score of the path solution candidate to be optimized, comprising: obtaining a seventh score of the path solution candidate to be optimized and an eighth score of each other path solution candidate to be optimized of the pairing; if the seventh score is greater than the eighth score, determining a positive score for the path solution candidate to be optimized, and If the seventh score is not greater than the eighth score, a non-positive score is determined for the path solution candidate to be optimized.
12. The method of claim 10, wherein the identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on the twin neural network comprises: for each of the plurality of path solution candidates to be optimized: determining a total score of the path solution candidate to be optimized, said total score comprising a sum of said individual scores of the path solution candidate to be optimized, and From the plurality of path solution candidates to be optimized, the path solution candidate to be optimized having the highest total score is identified.
13. A system for determining a path comprising one or more processors and one or more non-transitory computer-readable memories coupled to the one or more processors and configured with instructions executable by the one or more processors to cause the system to perform operations comprising: inputting a plurality of path solution candidates to be optimized to a twin neural network, the twin neural network comprising a plurality of value prediction networks, each value prediction network being trained to predict costs associated with the path solution candidates to be optimized, and to predict a degree to which the path solution candidates to be optimized are optimized and/or a probability that the path solution candidates to be optimized are optimized to an optimal path solution; identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on an output of the twin neural network; inputting the identified one or more path solution candidates to be optimized to a path optimizer to obtain one or more optimized path solution candidates, and From the one or more optimized path solution candidates, an optimized path solution with the lowest cost is determined.
14. The system of claim 13, wherein the twin neural network comprises two value prediction networks, the operations further comprising: training the two value prediction networks by performing one or more iterations of an adjustment process, wherein the performing one or more iterations of an adjustment process comprises: acquiring a training set comprising a third path solution candidate to be optimized and a fourth path solution candidate to be optimized; Inputting the training set to the path optimizer to obtain a third score of the third path solution candidate to be optimized and a fourth score of the fourth path solution candidate to be optimized; inputting the training set into the two value prediction networks respectively to obtain a fifth score of the third path solution candidate to be optimized and a sixth score of the fourth path solution candidate to be optimized, and And adjusting the weights of the two value prediction networks at least according to the third score, the fourth score, the fifth score and the sixth score.
15. The system of claim 14, the adjustment process further comprising, prior to inputting the training set into the two value prediction networks: Judging whether the difference value between the third fraction and the fourth fraction is larger than a preset threshold value or not, and And if the difference value is not greater than the preset threshold value, discarding the third path solution candidate to be optimized and the fourth path solution candidate to be optimized.
16. The system of claim 14, wherein the twin neural network comprises two value prediction networks, the inputting a plurality of path solution candidates to be optimized to the twin neural network comprising: for each of the plurality of path solution candidates to be optimized: pairing the path solution candidate to be optimized with each other path solution candidate to be optimized among the plurality of path solution candidates to be optimized that is different from the path solution candidate to be optimized, and The path solution candidate to be optimized and each other candidate of the pairing are input into the two value prediction networks to determine a separate score for the path solution candidate to be optimized.
17. The system of claim 16, wherein the identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on the twin neural network comprises: for each of the plurality of path solution candidates to be optimized: determining a total score of the path solution candidate to be optimized, said total score comprising a sum of said individual scores of the path solution candidate to be optimized, and From the plurality of path solution candidates to be optimized, the path solution candidate to be optimized having the highest total score is identified.
18. A non-transitory computer-readable storage medium for determining a path configured with instructions executable by one or more processors to cause the one or more processors to perform operations comprising: inputting a plurality of path solution candidates to be optimized to a twin neural network, the twin neural network comprising a plurality of value prediction networks, each value prediction network being trained to predict costs associated with the path solution candidates to be optimized, and to predict a degree to which the path solution candidates to be optimized are optimized and/or a probability that the path solution candidates to be optimized are optimized to an optimal path solution; identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on an output of the twin neural network; inputting the identified one or more path solution candidates to be optimized to a path optimizer to obtain one or more optimized path solution candidates, and From the one or more optimized path solution candidates, an optimized path solution with the lowest cost is determined.
19. The storage medium of claim 18, wherein the twin neural network comprises two value prediction networks, the operations further comprising: training the two value prediction networks by performing one or more iterations of an adjustment process, wherein the performing one or more iterations of an adjustment process comprises: acquiring a training set comprising a third path solution candidate to be optimized and a fourth path solution candidate to be optimized; Inputting the training set to the path optimizer to obtain a third score of the third path solution candidate to be optimized and a fourth score of the fourth path solution candidate to be optimized; inputting the training set into the two value prediction networks respectively to obtain a fifth score of the third path solution candidate to be optimized and a sixth score of the fourth path solution candidate to be optimized, and And adjusting the weights of the two value prediction networks at least according to the third score, the fourth score, the fifth score and the sixth score.
20. The storage medium of claim 19, the adjustment process further comprising, prior to inputting the training set into the two value prediction networks: Judging whether the difference value between the third fraction and the fourth fraction is larger than a preset threshold value or not, and And if the difference value is not greater than the preset threshold value, discarding the third path solution candidate to be optimized and the fourth path solution candidate to be optimized.

Description

System and method for determining paths by learning selective optimizations Technical Field The present application relates generally to systems and methods for determining paths, and in particular, to systems and methods for determining paths by identifying promising path solution candidates and selectively optimizing the identified path solution candidates. Background Path optimization is a process aimed at determining a path solution with optimal cost based on a finite set of path solution candidates. Classical traveller problems (TRAVELING SALESMAN problems, TSP) and vehicle path problems (Vehicle Routing Problem, VRP) are some exemplary variants of path optimization problems. Practical applications of path optimization can be found in fields such as telecommunications network design, task scheduling, transportation system planning, energy, financial and supply chains, etc. The path optimization problem related to finding an effective path for a vehicle is often referred to as VRP. There are several variations of VRPs, including pick-up VRPs (VRPs with pickups AND DELIVERY, VRPPD), last-In-First-Out VRPs (VRP WITH LAST-In-First-Out), VRPs with time Windows (VRP WITH TIME Windows, VRPTW), and VRPs with capacity limits (CAPACITATED VRP, CVRP). In a typical path optimization scenario, an optimal path solution may include multiple paths through N given locations under various constraints. Finding the optimal path solution is challenging because even if the value of N is small, the total number of candidate paths is very large. It has been recognized that determining the optimal solution for a VRP is an NP-hard (non-DETERMINISTIC POLYNOMIAL-TIME HARDNESS, non-deterministic polynomial time complexity class difficulty) problem. In practice, some path solution candidates (e.g., randomly selected paths) may be used as a starting point and optimized by a path optimizer to obtain some optimized path solutions from which the path solution with the lowest cost may be identified. Since each path solution candidate may only direct the optimizer to explore a small portion of the search space (e.g., solution space), it is necessary to apply the optimizer to a large number of path solution candidates to find the optimal path solution. However, application optimizers typically consume a lot of time and computing resources. Accordingly, there is a need to provide a method of optimizing by intelligently identifying promising path solution candidates to determine an optimal path. Disclosure of Invention Various embodiments in this specification may include, but are not limited to, systems, methods, and non-transitory computer-readable media for determining a path. According to some embodiments, a computer-implemented method for determining a path may include inputting a plurality of path solution candidates to be optimized to a twin neural network, the twin neural network including a plurality of value prediction networks, each trained to predict costs associated with the path solution candidates to be optimized, identifying one or more path solution candidates to be optimized from the plurality of path solution candidates to be optimized based on an output of the twin neural network, inputting the identified one or more path solution candidates to be optimized to a path optimizer to obtain one or more optimized path solution candidates, and determining an optimized path solution having a lowest cost from the one or more optimized path solution candidates. In some embodiments, each of the plurality of path solution candidates to be optimized includes routing one or more vehicles through one or more paths at a plurality of locations and the path solution candidate to be optimized is constrained by one or more constraints including one or more of a time constraint, a distance travelled constraint, a vehicle capacity constraint, and a power cost constraint. In some embodiments, the path optimizer includes a set of improvement operators learned based on a reinforcement learning algorithm and the set of improvement operators performs operations including one or more of changing an order of at least two of the plurality of positions, the at least two positions on one of the one or more paths, and moving a position on one of the one or more paths to another of the one or more paths. In some embodiments, the plurality of value prediction networks in the twin neural network are the same. In some embodiments, the twin neural network comprises two value prediction networks, the method further comprising training the two value prediction networks by performing one or more iterations of an adjustment process, wherein the performing one or more iterations of an adjustment process comprises obtaining a training set comprising a third to-be-optimized path solution candidate and a fourth to-be-optimized path solution candidate, inputting the training set to the path optimizer to obtain a third score of the third to-be-optimized path solution