US-12617082-B2 - Learning device, learning method, and recording medium

Abstract

A learning device 1X mainly includes an optimization problem calculation means 51X and an executable state set learning means 52X. The optimization problem calculation means 51X calculates a function value to be a solution for an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates. The executable state set learning means 52X learns an executable state set of an action of the robot to be executed by a controller, based on the function value.
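
The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: `solve_reachability` is a closed-form toy standing in for the optimization problem calculation means, not an actual solver, and the executable state set is taken to be the set of initial states whose function value is non-positive, an assumption borrowed from the "g≤0" feasibility condition stated in claim 10. A nearest-neighbour lookup stands in for the learned level set function; the patent itself refers to Gaussian process regression for that step.

```python
from bisect import bisect_left


def solve_reachability(x0: float) -> float:
    """Hypothetical stand-in for the optimizer: returns a function value g* for x0.

    Toy rule: the target counts as reachable (g* <= 0) when x0 lies within
    1.0 of the point 2.0.
    """
    return abs(x0 - 2.0) - 1.0


def learn_level_set(samples):
    """Approximate the level set function from (initial_state, function_value) pairs.

    This is a nearest-neighbour lookup, used only for illustration.
    """
    pts = sorted(samples)
    xs = [x for x, _ in pts]

    def level_set_fn(x: float) -> float:
        i = bisect_left(xs, x)
        # Only the neighbours on either side of x can be the nearest sample.
        candidates = pts[max(i - 1, 0): i + 1]
        return min(candidates, key=lambda p: abs(p[0] - x))[1]

    return level_set_fn


# Sample initial states on a grid from 0.0 to 4.0 and pair each with its g*.
initial_states = [i * 0.25 for i in range(17)]
pairs = [(x0, solve_reachability(x0)) for x0 in initial_states]
level_set_fn = learn_level_set(pairs)


def is_executable(x0: float) -> bool:
    """Assumed membership test: x0 is executable when the learned value is <= 0."""
    return level_set_fn(x0) <= 0.0
```

In this toy setting `is_executable(2.0)` holds while `is_executable(0.1)` does not, because the nearest sampled initial state to 0.1 has a positive function value.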

Inventors

  • Rin TAKANO
  • Hiroyuki Oyama

Assignees

  • NEC CORPORATION

Dates

Publication Date: 2026-05-05
Application Date: 2021-02-26

Claims (16)

  1. A learning device comprising: a memory storing instructions; and one or more processors configured to execute the instructions to: set an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates, a controller related to the robot, and a target parameter concerning an action of the robot, and calculate a function value of the evaluation function to be a solution of the optimization problem; and learn and control the robot based on a level set function representing an executable state set of the action of the robot to be executed by the controller, based on a plurality of pairs of function values and initial states set for the optimization problem.
  2. The learning device according to claim 1, wherein the processor is further configured to calculate a level set approximation function which approximates to the level set function.
  3. The learning device according to claim 1, wherein the processor specifies the initial states by sampling based on a Gaussian process regression, and performs learning of the level set function based on the function values to be solutions for the optimization problem according to the specified initial states and the specified initial states.
  4. The learning device according to claim 1, wherein the controller includes a low-level controller which generates a control command for the robot and a high-level controller which outputs a control parameter for operating the low-level controller; the processor calculates the control parameter and the function value which are to be each solution for an optimization problem which is set based on the abstract system model, the detailed system model, the low-level controller, and the target parameter; and the processor is further configured to learn the high-level controller based on states included in the executable state set which is learned.
  5. The learning device according to claim 4, wherein the processor learns the high-level controller based on each pair of the states included in the executable state set and each control parameter to be the solution of the optimization problem where the states are set to respective ones of the initial states for the optimization problem.
  6. The learning device according to claim 1, wherein the evaluation function is a function which evaluates the reachability with respect to a state in an abstract space, and the processor learns the executable state set in the abstract space.
  7. The learning device according to claim 1, wherein the processor is further configured to generate a skill tuple with respect to the action of the robot based on the executable state set which is learned.
  8. A learning method performed by a computer, the learning method comprising: setting an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates, a controller related to the robot, and a target parameter concerning an action of the robot; calculating a function value of the evaluation function to be a solution of the optimization problem; and learning and controlling the robot based on a level set function representing an executable state set of the action of the robot to be executed by the controller, based on a plurality of pairs of function values and initial states set for the optimization problem.
  9. A non-transitory computer-readable recording medium storing a program, the program causing a computer to perform the learning method comprising: setting an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates, a controller related to the robot, and a target parameter concerning an action of the robot; calculating a function value of the evaluation function to be a solution of the optimization problem; and learning and controlling the robot based on a level set function representing an executable state set of the action of the robot to be executed by the controller, based on a plurality of pairs of function values and initial states set for the optimization problem.
  10. The learning device according to claim 4, wherein the optimization problem is represented by the following equation, where g serves as the evaluation function, and α denotes the control parameter that minimizes the evaluation function:
      \[
      \begin{aligned}
      g^{*} = \min_{\alpha}\; & g(\gamma(x(T))) \qquad (2) \\
      \text{s.t.}\; & \dot{x} = f(x(t), \pi_L(x(t), \alpha)), \\
      & x(0) = x_0, \quad \gamma(x_0) = x_0', \quad t \in [0, T], \\
      & c(x(t), \pi_L(x(t), \alpha)) \le 0
      \end{aligned}
      \]
      g denotes the evaluation function determining that a transition from a specified initial state to a target state set is feasible, in a case where “g≤0” is satisfied; γ represents a mapping from a state of the detailed system model to a state of the abstract system model; x(t) denotes a state x at a t time length elapsed from an initial state x0 in an actual system based on a state expression given in the above expression (2); and T denotes a runtime length of skill included in information of the target parameter information.
  11. The learning device according to claim 1, wherein the one or more processors are further configured to execute the instructions to control the robot to implement the action as a physical movement of the robot in accordance with a trajectory.
  12. The learning device according to claim 1, wherein the one or more processors are further configured to execute the instructions to control the robot to implement the action as a physical servo-control action which moves the robot.
  13. The learning method according to claim 8, further comprising: controlling the robot to implement the action as a physical movement of the robot in accordance with a trajectory.
  14. The learning method according to claim 8, further comprising: controlling the robot to implement the action as a physical servo-control action which moves the robot.
  15. The non-transitory computer-readable recording medium according to claim 9, wherein the program further causes the computer to control the robot to implement the action as a physical movement of the robot in accordance with a trajectory.
  16. The non-transitory computer-readable recording medium according to claim 9, wherein the program further causes the computer to control the robot to implement the action as a physical servo-control action which moves the robot.
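
The optimization problem of expression (2) in claim 10 can be sketched on a toy system. This is only an illustration under stated assumptions, not the method of the patent: the detailed system model is a scalar integrator, the low-level controller is a proportional law whose set point is the control parameter α, the mapping γ is the identity (so the abstract initial state equals the detailed one), the constraint c bounds the input magnitude, and the minimization over α is a plain grid search with Euler integration. All names and constants (`X_GOAL`, `TOL`, `U_MAX`, `rollout`, `solve`) are hypothetical.

```python
import math

X_GOAL, TOL = 1.0, 0.05          # hypothetical target state and tolerance
U_MAX, T, DT = 1.0, 2.0, 0.01    # hypothetical input limit, skill runtime T, Euler step


def f(x, u):
    """Detailed system model x_dot = f(x, u); here a toy integrator."""
    return u


def pi_low(x, alpha):
    """Low-level controller pi_L(x, alpha): proportional pull toward alpha."""
    return alpha - x


def c(x, u):
    """Constraint function; the constraint holds when c(x, u) <= 0."""
    return abs(u) - U_MAX


def gamma(x):
    """Mapping from detailed state to abstract state (identity in this toy)."""
    return x


def g(x_abs):
    """Evaluation function; g <= 0 means the target state set is reached."""
    return abs(x_abs - X_GOAL) - TOL


def rollout(x0, alpha):
    """Euler-integrate the closed loop over [0, T].

    Returns g(gamma(x(T))), or infinity if the constraint is violated on the way.
    """
    x = x0
    for _ in range(round(T / DT)):
        u = pi_low(x, alpha)
        if c(x, u) > 0.0:
            return math.inf
        x += DT * f(x, u)
    return g(gamma(x))


def solve(x0, alphas):
    """Grid search over candidate alphas; returns (g_star, alpha_star)."""
    best_alpha = min(alphas, key=lambda a: rollout(x0, a))
    return rollout(x0, best_alpha), best_alpha


ALPHAS = [-3.0 + 0.05 * i for i in range(121)]   # candidate control parameters
g_near, alpha_near = solve(0.5, ALPHAS)          # initial state close to the target
g_far, _ = solve(-1.0, ALPHAS)                   # initial state far from the target
```

With these constants the nearby initial state yields a non-positive function value and the distant one a positive value, because the input bound prevents the distant state from reaching the target within the runtime T. Repeating `solve` over many initial states gives the pairs of initial states and function values from which claim 1 learns the level set function.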

Description

This application is a National Stage Entry of PCT/JP2021/007341 filed on Feb. 26, 2021, the contents of all of which are incorporated herein by reference, in their entirety.

TECHNICAL FIELD

The present disclosure relates to a technical field of a learning device, a learning method, and a recording medium for performing learning related to actions of a robot.

BACKGROUND ART

In a case of performing a control of a robot necessary for executing a task, there is a system for performing the robot control by providing a skill which modularizes an action of the robot. For example, Patent Document 1 discloses a technique in which, in a system in which an articulated robot performs a given task, a robot skill selectable according to the task is defined as a tuple, and parameters in the tuple are updated through learning. Moreover, Non Patent Document 1 discloses a level set estimation method (LSE: Level Set Estimation) which is an estimation method using a Gaussian process regression based on a Bayesian optimization concept. Furthermore, Non Patent Document 2 discloses a truncated variance reduction (TRUVAR) as another technique for estimating a level set function.

PRECEDING TECHNICAL REFERENCES

Patent Document

Patent Document 1: International Publication Pamphlet No. WO2018/219943

Non Patent Document

Non Patent Document 1: A. Gotovos, N. Casati, G. Hitz, and A. Krause, “Active learning for level set estimation”, in Int. Joint. Conf. Art. Intel., 2013.
Non Patent Document 2: Ilija Bogunovic, Jonathan Scarlett, Andreas Krause, and Volkan Cevher, “Truncated variance reduction: A unified approach to Bayesian optimization and level-set estimation”, in Advances in Neural Information Processing Systems (NIPS), pages 1507-1515, 2016.

SUMMARY

Problem to Be Solved by the Invention

In a case where actions of a robot are modularized as skills and an action plan of the robot using the modularized robot motions is carried out, it is necessary to acquire the skills in advance and retain them in a database.
In this case, it is necessary to include information concerning the states in which the system is capable of performing each skill.

It is one object of the present disclosure to provide a learning device, a learning method, and a recording medium for preferably performing learning regarding each executable state of a robot action.

Means for Solving the Problem

According to an example aspect of the present disclosure, there is provided a learning device including:
an optimization problem calculation means configured to set an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates, a controller related to the robot, and a target parameter concerning an action of the robot, and calculate a function value of the evaluation function to be a solution of the optimization problem; and
an executable state set learning means configured to learn an executable state set of the action of the robot to be executed by the controller, based on the function value.

According to another example aspect of the present disclosure, there is provided a learning method performed by a computer, the learning method including:
setting an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates, a controller related to the robot, and a target parameter concerning an action of the robot;
calculating a function value of the evaluation function to be a solution of the optimization problem; and
learning an executable state set of the action of the robot to be executed by the controller, based on the function value.

According to still another example aspect of the present disclosure, there is provided a learning method performed by a computer, the learning method including:
determining, for a system whose state is changed by a robot which operates according to a control parameter, the control parameter from a first state to a second state by using a first model representing a relationship between a plurality of states and the control parameter; and
determining a second model which evaluates an initial state which is reachable to a desired state in the system, based on the first state and the control parameter.

According to a further example aspect of the present disclosure, there is provided a recording medium storing a program, the program causing a computer to perform a process including:
setting an optimization problem which uses an evaluation function for evaluating reachability to a target state, based on an abstract system model and a detailed system model concerning a system in which a robot operates, a controller related to the robot, and a target parameter concerning an action of the robot;
calculating a function value of the evaluation function to be a solution of the optimization problem; and
learning an executable state set of the acti