EP-4417941-B1 - ROUTE PLANNING SYSTEM, ROUTE PLANNING METHOD, ROADMAP CONSTRUCTING DEVICE, MODEL GENERATING DEVICE, AND MODEL GENERATING METHOD

EP 4417941 B1

Inventors

  • Okumura, Keisuke
  • Yonetani, Ryo
  • Kurose Nishimura, Mai
  • Matsui, Asako

Dates

Publication Date
2026-05-06
Application Date
2022-09-16

Claims (14)

  1. A roadmap constructing device (201) comprising: an information acquiring unit (211) configured to obtain target information (221) including a start state and a goal state (31) of each of a plurality of agents in a continuous state space; and a map constructing unit (212) configured to construct a roadmap for each of the agents from the obtained target information (221) using a roadmap constructing model (5) that has completed training, wherein the roadmap constructing model (5) includes: a first processing module (51) configured to generate first characteristics information from target agent information (41) including a goal state (31) of a target agent and a candidate state of a target time step; a second processing module (52) configured to generate second characteristics information from other agent information including goal states (31) of other agents other than the target agent and candidate states of the target time step; and an estimation module (55) configured to estimate one or more candidate states of the target agent in a time step next to the target time step from the first characteristics information and the second characteristics information that are generated, wherein the roadmap constructing model (5) that has completed training is generated using machine learning using learning data acquired from correct answer routes of a plurality of agents for learning, and wherein the constructing of the roadmap for each agent is configured by executing a process of handling one agent among the plurality of agents as the target agent, handling at least some of remaining agents among the plurality of agents as the other agents, estimating one or more candidate states in a next time step using the roadmap constructing model (5) that has completed training by designating the start state of the one agent represented in the obtained target information (221) as a candidate state of the target agent in an initial target time step, and repeating estimating of the 
candidate state of a next time step using the roadmap constructing model (5) that has completed training by designating each of the one or more estimated candidate states in the next time step as a candidate state of a new target time step until the goal state (31) or a state near the goal state (31) of the one agent is included in estimated one or more candidate states in the next time step by designating each of the plurality of agents as the target agent.
  2. A route planning system (2) comprising: the roadmap constructing device (201) according to claim 1; and a search unit (213) configured to search for a route of each of the agents from the start state to the goal state (31) on the roadmap constructed for each agent.
  3. The route planning system (2) according to claim 2, wherein the roadmap constructing model (5) further includes a third processing module (53) configured to generate third characteristics information from environmental information including information relating to obstacles, wherein the estimation module (55) is configured to estimate one or more candidate states in the next time step from the first characteristics information, the second characteristics information, and the third characteristics information that are generated, wherein the obtained target information (221) is configured to further include the information relating to obstacles present in the continuous state space, and wherein the using of the roadmap constructing model (5) that has completed training includes configuring of the environmental information from the information included in the obtained target information (221) and giving of the configured environmental information to the third processing module (53).
  4. The route planning system (2) according to claim 2 or 3, wherein the target agent information (41) is configured to further include a candidate state of the target agent in a time step before the target time step, and wherein the other agent information is configured to further include candidate states of the other agents in a time step before the target time step.
  5. The route planning system (2) according to any one of claims 2 to 4, wherein the target agent information (41) is configured to further include attributes of the target agent.
  6. The route planning system (2) according to claim 5, wherein the attributes of the target agent include at least one of a size, a shape, a maximum speed, and a weight.
  7. The route planning system (2) according to any one of claims 2 to 6, wherein the other agent information is configured to further include attributes of the other agents.
  8. The route planning system (2) according to claim 7, wherein the attributes of the other agents include at least one of a size, a shape, a maximum speed, and a weight.
  9. The route planning system (2) according to any one of claims 2 to 8, wherein the target agent information (41) is configured to further include a direction flag representing a direction in which the target agent is to transition in the continuous state space.
  10. The route planning system (2) according to any one of claims 2 to 9, wherein each of the plurality of agents is a mobile body configured to autonomously move.
  11. A route planning method causing a computer to execute: a step (S201) of obtaining target information (221) including a start state and a goal state (31) of each of a plurality of agents in a continuous state space; a step (S202) of constructing a roadmap for each of the agents from the obtained target information (221) using a roadmap constructing model (5) that has completed training; and a step (S203) of searching for a route of each of the agents from the start state to the goal state (31) on the roadmap constructed for each agent, wherein the roadmap constructing model (5) includes: a first processing module (51) configured to generate first characteristics information from target agent information (41) including a goal state (31) of a target agent and a candidate state of a target time step; a second processing module (52) configured to generate second characteristics information from other agent information including goal states (31) of other agents other than the target agent and candidate states of the target time step; and an estimation module (55) configured to estimate one or more candidate states of the target agent in a time step next to the target time step from the first characteristics information and the second characteristics information that are generated, wherein the roadmap constructing model (5) that has completed training is generated using machine learning using learning data acquired from correct answer routes of a plurality of agents for learning, and wherein the constructing of the roadmap for each agent is configured by executing a process of handling one agent among the plurality of agents as the target agent, handling at least some of remaining agents among the plurality of agents as the other agents, estimating one or more candidate states in a next time step using the roadmap constructing model (5) that has completed training by designating the start state of the one agent represented in the obtained target information (221) as 
a candidate state of the target agent in an initial target time step, and repeating estimating of the candidate state of a next time step using the roadmap constructing model (5) that has completed training by designating each of the one or more estimated candidate states in the next time step as a candidate state of a new target time step until the goal state (31) or a state near the goal state (31) of the one agent is included in estimated one or more candidate states in the next time step by designating each of the plurality of agents as the target agent.
  12. A model generating device (1) comprising: a data acquiring unit configured to obtain learning data generated from correct answer routes of a plurality of agents for learning; and a learning processing unit configured to perform machine learning of a roadmap constructing model (5) using the obtained learning data, wherein the roadmap constructing model (5) includes: a first processing module (51) configured to generate first characteristics information from target agent information (41) including a goal state (31) of a target agent and a candidate state of a target time step in a continuous state space; a second processing module (52) configured to generate second characteristics information from other agent information including goal states (31) of other agents other than the target agent and candidate states of the target time step; and an estimation module (55) configured to estimate one or more candidate states of the target agent in a time step next to the target time step from the first characteristics information and the second characteristics information that are generated, wherein the learning data includes a goal state (31) in the correct answer route of each agent for learning and a plurality of data sets, wherein each of the plurality of data sets is configured using a combination of a state of each agent for learning in a first time step and a state of each agent for learning in a second time step, wherein the second time step is a time step next to the first time step, and wherein the machine learning of the roadmap constructing model (5) is configured by: handling one agent for learning among the plurality of agents for learning as the target agent; handling at least some of remaining agents for learning among the plurality of agents for learning as the other agents; and training the roadmap constructing model (5) such that a candidate state of the target agent in a next time step, which is estimated by the estimation module (55), is appropriate 
for a state of the one agent for learning in the second time step by, for each data set, giving a state of the one agent for learning in the first time step to the first processing module (51) as a candidate state of the target agent in the target time step and giving states of at least some of remaining agents for learning in the first time step to the second processing module (52) as candidate states of the other agents in the target time step.
  13. The model generating device (1) according to claim 12, wherein the target agent information (41) is configured to further include a direction flag representing a direction in which the target agent is to transition in the continuous state space, wherein each data set is configured to further include a training flag representing a direction from the state of the first time step to the state of the second time step in the continuous state space, and wherein the machine learning of the roadmap constructing model (5) includes giving of the training flag of the one agent for learning to the first processing module (51) as a direction flag of the target agent when a candidate state of the target agent in a next time step is estimated for each data set.
  14. A model generating method causing a computer to execute: a step (S101) of obtaining learning data generated from correct answer routes of a plurality of agents for learning; and a step (S102) of performing machine learning of a roadmap constructing model (5) using the obtained learning data, wherein the roadmap constructing model (5) includes: a first processing module (51) configured to generate first characteristics information from target agent information (41) including a goal state (31) of a target agent and a candidate state of a target time step in a continuous state space; a second processing module (52) configured to generate second characteristics information from other agent information including goal states (31) of other agents other than the target agent and candidate states of the target time step; and an estimation module (55) configured to estimate one or more candidate states of the target agent in a time step next to the target time step from the first characteristics information and the second characteristics information that are generated, wherein the learning data includes a goal state (31) in the correct answer route of each agent for learning and a plurality of data sets, wherein each of the plurality of data sets is configured using a combination of a state of each agent for learning in a first time step and a state of each agent for learning in a second time step, wherein the second time step is a time step next to the first time step, and wherein the machine learning of the roadmap constructing model (5) is configured by: handling one agent for learning among the plurality of agents for learning as the target agent; handling at least some of remaining agents for learning among the plurality of agents for learning as the other agents; and training the roadmap constructing model (5) such that a candidate state of the target agent in a next time step, which is estimated by the estimation module (55), is appropriate for a state of the 
one agent for learning in the second time step by, for each data set, giving a state of the one agent for learning in the first time step to the first processing module (51) as a candidate state of the target agent in the target time step and giving states of at least some of remaining agents for learning in the first time step to the second processing module (52) as candidate states of the other agents in the target time step.
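The iterative construction recited in claims 1 and 11 can be sketched as follows. This is a minimal illustration, not the patented implementation: `estimate_next_candidates` is a hypothetical stand-in for the trained roadmap constructing model (5), and here it simply proposes a single step toward the goal while ignoring the other-agent inputs that the real model would condition on.

```python
import math

def estimate_next_candidates(state, goal, other_states, other_goals, step=1.0):
    """Hypothetical stand-in for the trained roadmap constructing model (5).

    The real model would condition on the other agents' candidate states
    and goals; this placeholder just steps toward the target agent's goal."""
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= step:
        return [goal]
    return [(state[0] + step * dx / dist, state[1] + step * dy / dist)]

def build_roadmap(starts, goals, near=1e-9, max_steps=1000):
    """Construct a roadmap per agent by repeatedly estimating candidate
    states for the next time step, starting from the agent's start state,
    until the goal (or a state near it) appears among the candidates."""
    roadmaps = []
    for i, (start, goal) in enumerate(zip(starts, goals)):
        other_states = [s for j, s in enumerate(starts) if j != i]
        other_goals = [g for j, g in enumerate(goals) if j != i]
        nodes, edges, frontier = [start], [], [start]
        for _ in range(max_steps):
            next_frontier = []
            for state in frontier:
                for cand in estimate_next_candidates(
                        state, goal, other_states, other_goals):
                    nodes.append(cand)
                    edges.append((state, cand))
                    next_frontier.append(cand)
            frontier = next_frontier
            # Stop once the goal (or a state near it) is among the candidates.
            if any(math.hypot(c[0] - goal[0], c[1] - goal[1]) <= near
                   for c in frontier):
                break
        roadmaps.append((nodes, edges))
    return roadmaps
```

A separate search unit (claim 2) would then search each agent's roadmap for a route from the start state to the goal state, for example with Dijkstra's algorithm or A*.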

Description

[Technical Field]
The present invention relates to a route planning system, a route planning method, a roadmap constructing device, a model generating device, and a model generating method.

[Background Art]
There are route planning problems of multiple agents, in which each of a plurality of agents plans a route for movement to a destination. An agent is, for example, a mobile body that autonomously moves (for example, a conveying robot, a cleaning robot, an automatic driving vehicle, a drone, or the like), a person, a device operated by a person, a manipulator, or the like. Conventionally, route planning problems of multiple agents are solved on a grid map (a grid space) set in advance. For example, Non-Patent Literature 1 proposes a method of performing route planning for each agent on a grid map using a model trained through machine learning. According to a method using a grid map, the positions to which each agent can move are defined in advance, and thus a route for each agent can be found relatively simply. However, the movement of each agent is restricted by the grid set in advance. For this reason, there is a limit to obtaining a better route (for example, a shortest route in a real space) for each agent. Thus, methods have been studied for solving route planning problems of multiple agents not on a grid space set in advance but on a continuous space in which each agent can move to any position. In a case in which route planning problems are to be solved on a continuous space, an approach of constructing a roadmap and searching for a route of each agent on the constructed roadmap is frequently employed (for example, Non-Patent Literature 2). A roadmap is composed of nodes and edges and identifies the range in which a route of each agent is to be searched for. A node represents a position to which movement can be made, and an edge connects nodes and represents that movement can be made between the connected nodes.
When this roadmap is constructed, a node can be disposed at an arbitrary position in accordance with the continuous space that is the target for searching for a route. For this reason, compared to a method using a grid map in which the positions to which movement can be made are fixed in advance, a method in which a route is searched for on this continuous space has a possibility of obtaining a better route for each agent.

[Citation List]
[Non-Patent Literature]
[Non-Patent Literature 1] Mehul Damani, Zhiyao Luo, Emerson Wenzel, Guillaume Sartoretti, "PRIMAL2: Pathfinding via Reinforcement and Imitation Multi-Agent Learning -- Lifelong", IEEE Robotics and Automation Letters, Volume 6, Issue 2, pp. 2666-2673, 2021
[Non-Patent Literature 2] Felipe Felix Arias, Brian Andrew Ichter, Aleksandra Faust, Nancy M. Amato, "Avoidance Critical Probabilistic Roadmaps for Motion Planning in Dynamic Environments", IEEE International Conference on Robotics and Automation, 2021

WO 2020/117958 A1 relates to a motion planner of a computer system of a primary agent, e.g., an autonomous vehicle, that uses reconfigurable collision detection architecture hardware to perform a collision assessment on a planning graph for the primary agent prior to execution of a motion plan. US 2020/041274 A1 relates to systems and methods for roadmaps for robotic devices, in which a computing device can receive a roadmap representing paths through an environment.

[Summary of Invention]
[Technical Problem]
The inventors of the present disclosure have found the following problems in the above-described conventional method in which route planning problems of multiple agents are solved on a continuous space. In the conventional method, a roadmap common to the agents is constructed by disposing nodes in the entire continuous space. In one example, nodes are randomly disposed in the entire continuous space.
As another example, in the method proposed in Non-Patent Literature 2, a model trained using machine learning is used to dispose nodes at positions near obstacles at which other agents can be avoided. In either method, nodes are basically disposed in the entire continuous space. Here, by densely disposing nodes in the continuous space, the number of combinations of routes that can be employed by each agent increases. For this reason, the likelihood of being able to find a more optimal route for each agent using the roadmap is increased. However, as nodes are disposed more densely, the number of nodes increases accordingly, and thus the costs (computational costs and time costs) incurred in searching for a route increase. Conversely, when nodes are sparsely disposed, although the cost required for a search can be reduced, the possibility of being able to find an optimal route becomes low. In the worst case, it may become impossible to find a route that bypasses obstacles or other agents. Therefore, conventional methods have a problem that it is difficult to find more o
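The density tradeoff described above can be made concrete with a minimal PRM-style sketch. This is an illustration of the conventional approach being criticized, not the method of the patent; the uniform sampling in a unit square and the connection radius are assumptions chosen only to show how edge count (and hence search cost) grows with node density.

```python
import itertools
import math
import random

def random_roadmap(n_nodes, radius, seed=0):
    """Minimal PRM-style construction: sample nodes uniformly in the unit
    square and connect every pair of nodes closer than `radius`."""
    rng = random.Random(seed)
    nodes = [(rng.random(), rng.random()) for _ in range(n_nodes)]
    edges = [(i, j)
             for i, j in itertools.combinations(range(n_nodes), 2)
             if math.dist(nodes[i], nodes[j]) < radius]
    return nodes, edges

# A denser roadmap offers more route combinations, but the number of edges
# (and thus the cost of searching the roadmap) grows much faster than the
# number of nodes.
sparse_nodes, sparse_edges = random_roadmap(50, 0.2)
dense_nodes, dense_edges = random_roadmap(500, 0.2)
```

With a fixed connection radius, the expected number of edges grows roughly quadratically in the number of nodes, which is the cost explosion the patent's per-agent, learned roadmap construction is intended to avoid.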