JP-7855192-B2 - Information processing device, crane control system, learning method, and learning program
Inventors
- 川端 馨
- 平林 照司
- 伊瀬 顕史
- 松原 崇充
- 佐々木 光
Assignees
- カナデビア株式会社
- 国立大学法人 奈良先端科学技術大学院大学
Dates
- Publication Date
- 20260508
- Application Date
- 20220228
Claims (9)
- An information processing apparatus comprising a learning unit that generates, using observation data observed at a waste treatment facility equipped with a crane for transporting waste, a policy model including an action policy for determining an action to be performed by the crane and an execution length policy for determining an execution length of the action, wherein the learning unit generates the action policy and the execution length policy by policy search using an update variable that switches, based on the execution length determined by the execution length policy, between continuing the action determined by the action policy and updating the action and the execution length.
- The information processing apparatus according to claim 1, wherein the learning unit generates the policy model by repeatedly updating the action policy and the execution length policy so as to maximize the sum of the rewards given for each action determined using the action policy and the rewards given for each execution length determined using the execution length policy for each action until a predetermined task is completed.
- The information processing apparatus according to claim 1 or 2, wherein a Gaussian process is applied as the policy model.
- The information processing apparatus according to any one of claims 1 to 3, wherein the learning unit generates the policy model including the action policy for determining whether the action to be performed by the crane is a grasping operation of grasping the waste with a bucket of the crane or an opening and closing operation of scattering the grasped waste along a movement path of the bucket, and the execution length policy for determining the number of times the grasping operation and the opening and closing operation are performed.
- The information processing apparatus according to claim 4, wherein the learning unit generates the policy model by repeatedly updating the action policy and the execution length policy so as to maximize the sum of the rewards given for each action determined using the action policy and the rewards given for each execution length determined using the execution length policy for each action, until completion of the task of scattering the waste along the movement path of the bucket, the reward given for an action is greater as a larger amount of waste is scattered and as the waste is scattered more evenly along the movement path, and the reward given for an execution length is greater as the time taken to complete the task is shorter.
- An information processing apparatus comprising: an action determination unit that determines, in accordance with observation data observed at a waste treatment facility equipped with a crane for transporting waste, an action to be performed by the crane, using an action policy for determining the action to be performed by the crane; and an execution length determination unit that determines, in accordance with the observation data, the execution length of the action determined by the action determination unit, using an execution length policy for determining the execution length, wherein the action determination unit and the execution length determination unit repeatedly determine the action and the execution length until a predetermined task is completed, and when a determined action has been executed for a predetermined time and the predetermined task has not been completed, the action determination unit determines a new action if the action has been executed for the execution length determined by the execution length determination unit when the action was determined, and continues execution of the action if it has not.
- A crane control system comprising: a crane for transporting waste; the information processing apparatus according to claim 6; and a crane control device that operates the crane based on the action and the execution length determined by the information processing apparatus.
- A learning method performed by one or more information processing apparatuses, the method comprising: a data acquisition step of acquiring observation data observed at a waste treatment facility equipped with a crane for transporting waste; and a learning step of generating, using the observation data, a policy model including an action policy for determining an action to be performed by the crane and an execution length policy for determining an execution length of the action, wherein, in the learning step, the action policy and the execution length policy are generated by policy search using an update variable that switches, based on the execution length determined by the execution length policy, between continuing the action determined by the action policy and updating the action and the execution length.
- A learning program for causing a computer to function as an information processing device according to claim 1, wherein the learning program causes the computer to function as the learning unit.
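The switching behaviour recited in claims 1, 6 and 8 can be pictured with a minimal Python sketch. It is illustrative only and is not the claimed implementation: the names `action_policy`, `length_policy` and `run_task`, the 0.5 threshold, and the fixed repetition counts are all assumptions introduced here, and the end of the observation list merely stands in for completion of the task. A binary update variable is 1 when the previously decided execution length has been used up, in which case a new action and execution length are decided; otherwise the current action continues.

```python
from typing import List, Tuple

# Hypothetical action labels, loosely following claim 4.
GRASP = "grasp"
OPEN_CLOSE = "open_close"


def action_policy(observation: float) -> str:
    """Toy stand-in for a learned action policy (assumption)."""
    return GRASP if observation < 0.5 else OPEN_CLOSE


def length_policy(observation: float) -> int:
    """Toy stand-in for a learned execution length policy (assumption).

    Returns the number of steps for which the chosen action is held.
    """
    return 3 if observation < 0.5 else 2


def run_task(observations: List[float]) -> List[Tuple[int, str]]:
    """Return (update variable, executed action) for every step."""
    history: List[Tuple[int, str]] = []
    action = ""
    remaining = 0  # steps left in the current execution length
    for obs in observations:
        # Update variable: 1 = decide anew, 0 = continue the current action.
        update = 1 if remaining == 0 else 0
        if update:
            action = action_policy(obs)
            remaining = length_policy(obs)
        history.append((update, action))
        remaining -= 1
    return history


for update, action in run_task([0.2, 0.9, 0.9, 0.7, 0.1, 0.1]):
    print(update, action)
```

In this toy run the observations at the second and third steps would select the opening and closing operation on their own, but they are ignored because the grasping operation chosen at the first step is still within its execution length.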
Description
Application of Article 30, Paragraph 2 of the Patent Law
• Website address: https://robomech.org/2021/online https://robomech.org/2021/wp-content/uploads/2021/06/public-use-PG2021-0604.pdf Publication date: June 6, 2021
• Research meeting name: Robotics and Mechatronics Conference 2021 Venue (website): https://robomech.org/2021/online Date: June 7, 2021
• Website address: https://doi.org/10.1299/jsmermd.2021.1P1-I17 Posting date: December 25, 2021
This invention relates to an information processing device and the like that can be used for the automatic control of a crane used to transport waste.
Waste treatment plants are generally equipped with storage facilities called pits for storing incoming waste. If the waste stored in the pit is, for example, combustible waste, it is moved within the pit and agitated using a crane with a bucket, then fed into a hopper and sent to an incinerator for incineration. Technologies for automating the control of such cranes have been under development. For example, Patent Document 1 discloses a technology that automatically controls the crane by calculating an evaluation value for the number of times waste has been agitated at each point within the pit, selecting the position of the crane bucket based on that evaluation value, and opening and closing the bucket at that position.
Patent Document 1: Patent No. 5185197
A block diagram showing an example of the main components of an information processing device according to one embodiment of the present invention.
A diagram showing an example configuration of a crane control system including the above-mentioned information processing device.
A graphical model of GPSTPS.
A diagram outlining the task of scattering waste.
A flowchart showing an example of the processing performed by the information processing device when learning the action policy and the execution length policy.
A flowchart detailing the processing performed during a task trial.
[System Configuration] The configuration of the crane control system according to this embodiment will be described with reference to Figure 2. Figure 2 is a diagram showing an example of the configuration of the crane control system 7. The crane control system 7 shown in Figure 2 includes an information processing device 1, a crane control device 3, and a crane 5. The crane 5 transports waste stored in a pit P, which is a waste storage facility. More specifically, the crane 5 is a bucket crane equipped with a bucket for gripping waste. The waste can be anything that can be transported by the crane 5, such as household waste or industrial waste.
The information processing device 1 uses observation data observed at the waste treatment facility equipped with the crane 5 to determine the action to be performed by the crane 5 and the execution length of that action. The crane control device 3 then operates the crane 5 based on the action and execution length determined by the information processing device 1. The execution length indicates how long the same action continues to be performed. For example, if the action can be counted, the number of times the action is performed can be used as the execution length. Alternatively, the length of time for which the same action continues can be used as the execution length.
The observation data is used to identify the conditions under which the crane 5 performs its actions. The type of observation data to use should be chosen according to the task to be performed by the crane 5 and how that task will be evaluated. For example, if the task is to transport waste using the crane 5, the weight of the waste held in the bucket of the crane 5 could be used as observation data. In general, observation data from waste treatment facilities varies widely, for example because the properties of the waste being processed are not uniform.
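To make the effect of this variability concrete, the following Python sketch compares a rule that re-decides the action at every observation with one that holds each decision for an execution length expressed as a count, as described above. Everything in it is an assumption made for illustration: the normalised bucket weights, the 0.5 threshold, the action labels, and the helper names `decide`, `actions_per_step`, `actions_with_hold` and `count_switches` do not come from the embodiment.

```python
from typing import List

THRESHOLD = 0.5  # hypothetical decision threshold on a normalised bucket weight


def decide(weight: float) -> str:
    """Toy rule standing in for an action policy (assumption)."""
    return "grasp" if weight < THRESHOLD else "open_close"


def actions_per_step(weights: List[float]) -> List[str]:
    """Re-decide the action at every observation."""
    return [decide(w) for w in weights]


def actions_with_hold(weights: List[float], length: int) -> List[str]:
    """Re-decide only every `length` observations; hold the action in between."""
    if length < 1:
        raise ValueError("length must be at least 1")
    actions: List[str] = []
    current = ""
    for i, w in enumerate(weights):
        if i % length == 0:
            current = decide(w)
        actions.append(current)
    return actions


def count_switches(actions: List[str]) -> int:
    """Count how often consecutive actions differ."""
    return sum(a != b for a, b in zip(actions, actions[1:]))


# Noisy weights that hover around the threshold (made-up values).
weights = [0.48, 0.53, 0.47, 0.55, 0.49, 0.52, 0.46, 0.54]
print(count_switches(actions_per_step(weights)))      # 7
print(count_switches(actions_with_hold(weights, 4)))  # 0
```

With these made-up values the per-step rule changes action at every observation, while holding each decision for four observations produces no changes at all. A fixed length is used here only for simplicity; in the embodiment the execution length is itself determined from the observation data.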
Consequently, a machine learning model for controlling the crane 5 that is generated from such observation data may frequently switch the action executed by the crane 5 in response to that variability. This makes it difficult to generate a highly accurate machine learning model when the observation data varies widely. The information processing device 1 therefore uses the observation data to generate a policy model that includes an action policy for determining the action to be performed by the crane 5 and an execution length policy for determining the execution length of that action. Under this policy model, the same action is maintained for the execution length determined by the execution length policy. Even when the observation data varies widely, this reduces the likelihood of generating a machine learning model that frequently switches the action performed by the crane 5 in response to the variability. This makes it possible to generate a highly accurate machine learning model, the policy