KR-102963873-B1 - Apparatus and method for establishing a production schedule for steel plate based on reinforcement learning

KR 102963873 B1

Abstract

A reinforcement learning-based steel plate production schedule planning apparatus according to one embodiment includes: a data collection unit that collects past order data including coil characteristic information, delivery dates, and the work time required for steel plate processing; a database that stores the past order data collected by the data collection unit; a simulation unit that identifies the coils requiring work and their coil-specific characteristics from the order data stored in the database and generates a coil production scheduling problem; and a reinforcement learning unit that learns model parameters from the knowledge obtained by interacting with the simulation unit.
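The virtual-data generation described above (and detailed in claim 1) can be illustrated with a short sketch. The following Python is a hypothetical rendering, not the patented implementation: the CoilOrder fields, the perturbation ranges, and the day-based due-date encoding are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class CoilOrder:
    coil_id: str
    width_mm: float     # coil width (hypothetical field)
    work_hours: float   # estimated work duration (hypothetical field)
    due_in_days: int    # delivery date, as days from the scheduling start

def perturb(order: CoilOrder, rng: random.Random) -> CoilOrder:
    """Apply probabilistic changes to the key characteristics
    (delivery date, work duration, coil width), as the simulation
    unit is described to do; the ranges below are assumptions."""
    return CoilOrder(
        coil_id=order.coil_id,
        width_mm=order.width_mm * rng.uniform(0.9, 1.1),
        work_hours=order.work_hours * rng.uniform(0.8, 1.2),
        due_in_days=max(1, order.due_in_days + rng.choice([-2, -1, 0, 1, 2])),
    )

def build_virtual_episode(history: list[CoilOrder], seed: int = 0) -> list[CoilOrder]:
    """Build one virtual scheduling problem from historical orders."""
    rng = random.Random(seed)
    return [perturb(o, rng) for o in history]
```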

Inventors

  • 위현곤
  • 박형준

Assignees

  • Research Institute of Industrial Science and Technology (재단법인 포항산업과학연구원)

Dates

Publication Date
2026-05-11
Application Date
2024-12-17

Claims (17)

  1. A reinforcement learning-based steel plate production schedule planning apparatus comprising: a data collection unit that collects past order data including coil characteristic information, delivery dates, and the work time required for steel plate processing; a database that stores the past order data collected by the data collection unit; a simulation unit that identifies the coils requiring work and their coil-specific characteristics from the order data stored in the database and generates a coil production scheduling problem; a reinforcement learning unit that learns model parameters from the knowledge obtained by interacting with the simulation unit; and a schedule generation unit that receives, as new order data, information (X1) on the coil worked on at a first time point and information (X2) on the coils that can be worked on after the first time point, and outputs an optimal coil production scheduling result by applying the received new order data to the reinforcement learning unit, wherein the simulation unit generates virtual data by applying probabilistic changes to key characteristics including one or more of the delivery date, the work duration, and the coil width in the coil production scheduling problem, and constructs a simulation environment based on the virtual data; the reinforcement learning unit receives from the simulation unit, as state information, the information (X1) on the coil worked on at the previous time point and the information (X2) on the workable coils, performs an action on the simulation environment that selects one or more of the workable coils and outputs an expected value, and provides the result to the simulation unit; the simulation unit provides a reward to the reinforcement learning unit based on the result of the action performed by the reinforcement learning unit; and the reinforcement learning unit performs actions so as to maximize the reward provided by the simulation unit (an illustrative sketch of this loop follows the claims).
  2. (Canceled)
  3. (Canceled)
  4. (Canceled)
  5. The apparatus of claim 1, wherein the reinforcement learning model executed by the reinforcement learning unit is a deep Q-network algorithm.
  6. The apparatus of claim 5, wherein the deep Q-network receives as inputs only the current coil work situation (X1) and information (X2) on a single coil, and is modeled to output the expected value of scheduling that coil in the current situation.
  7. The apparatus of claim 6, wherein the objective function of the coil production scheduling includes minimizing delivery delay, minimizing production completion time, and minimizing roll/lot change time, and the reward function of the coil production scheduling includes, among these objectives, minimizing production completion time and minimizing roll/lot change time, excluding minimizing delivery delay.
  8. The apparatus of claim 7, wherein a heuristic structure is combined with the multi-input stage of the deep Q-network.
  9. The apparatus of claim 8, wherein tasks are classified into two or more groups based on surplus delivery time, the group that does not exceed the surplus delivery time threshold is set as a priority task group, and the priority task group is preferentially assigned to the multi-input stage.
  10. A reinforcement learning-based steel plate production schedule planning method comprising: a step in which a simulation unit of a steel plate production schedule planning apparatus identifies coils requiring work and their coil-specific characteristics from past order data stored in a database, the past order data including coil characteristic information, delivery dates, and the work time required for steel plate processing, and generates a coil production scheduling problem; a step in which the simulation unit generates virtual data by applying probabilistic changes to one or more key characteristics including the delivery date, the work duration, and the coil width in the coil production scheduling problem, and constructs a simulation environment based on the virtual data; a step in which a reinforcement learning unit of the steel plate production schedule planning apparatus receives from the simulation unit information (X1) on the coil worked on at a first time point and information (X2) on the coils that can be worked on after the first time point, performs an action on the simulation environment that selects one or more of the workable coils and outputs an expected value for the selected coils, and provides the result of the action to the simulation unit; and a step of receiving, as new order data, information (X1) on the coil worked on at the first time point and information (X2) on the coils that can be worked on after the first time point, and outputting an optimal coil production scheduling result by applying the received new order data to the reinforcement learning unit, wherein the simulation unit provides a reward to the reinforcement learning unit based on the result of the action performed by the reinforcement learning unit, and the reinforcement learning unit performs actions so as to maximize the reward provided by the simulation unit.
  11. (Canceled)
  12. (Canceled)
  13. (Canceled)
  14. The method of claim 10, wherein the reinforcement learning model executed by the reinforcement learning unit is a deep Q-network algorithm.
  15. The method of claim 14, wherein the deep Q-network has a multi-input/single-output structure that receives as inputs only the current coil work situation (X1) and information (X2) on a single coil, and is modeled to output the expected value of scheduling that coil in the current situation.
  16. The method of claim 15, wherein a heuristic structure is combined with the multi-input stage of the deep Q-network.
  17. The method of claim 16, wherein tasks are classified into two or more groups based on surplus delivery time, groups that do not exceed the surplus delivery time threshold are set as priority task groups, and the priority task groups are preferentially assigned to the multi-input stage.
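Read together, claims 1, 7, and 10 describe an environment that exposes state (X1: the coil worked on at the previous time point; X2: the workable coils), accepts a coil-selection action, and returns a reward covering production completion time and roll/lot change time while excluding delivery delay. The sketch below is one hypothetical rendering of that loop, reusing the illustrative CoilOrder type from the sketch after the abstract; the width-based changeover model is an assumption, not the patented cost model.

```python
def changeover_time(prev: "CoilOrder", nxt: "CoilOrder") -> float:
    """Assumed roll/lot change cost: grows with the width difference."""
    return abs(prev.width_mm - nxt.width_mm) * 0.01

class SchedulingEnv:
    """Minimal simulation environment for the state/action/reward loop."""
    def __init__(self, orders: list["CoilOrder"]):
        self.pending = list(orders)  # X2: coils that can still be worked on
        self.current = None          # X1: coil worked on at the previous step
        self.clock = 0.0

    def state(self):
        return self.current, tuple(self.pending)  # (X1, X2)

    def step(self, idx: int):
        """Action: select the idx-th workable coil to process next."""
        nxt = self.pending.pop(idx)
        cost = nxt.work_hours
        if self.current is not None:
            cost += changeover_time(self.current, nxt)
        self.clock += cost
        self.current = nxt
        # Reward per claim 7: minimize completion time and roll/lot
        # change time; delivery delay is excluded from the reward.
        reward = -cost
        done = not self.pending
        return self.state(), reward, done
```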

Description

Apparatus and method for establishing a production schedule for steel plate based on reinforcement learning

The present disclosure relates to an apparatus and method for establishing a steel plate production schedule based on reinforcement learning, and in particular to an apparatus and method for establishing a color steel plate production schedule based on a deep Q-network.

At steel plate manufacturers, field experts establish production schedules from weekly accumulated order data, using know-how-based or optimization-based scheduling programs that consider the characteristics of each job and the setup and processing times of the process machinery. Every ordered product has a fixed estimated work time and delivery date, and the experts determine the work sequence for a specific period (e.g., one week) with the goal of minimizing overall delivery delays and work completion times.

Color steel plates are steel plates coated with various colors and patterns, and are mainly used for the exterior parts of washing machines, refrigerators, kitchen appliances, and the like. Through the application of patterns and coatings, the product acquires a multi-layer cross-section, as shown in Fig. 1. The color steel sheet production process runs from cold-rolled coils through uncoiling, pre-treatment, painting/coating, print/lamination, and recoiling/shearing to the final product. Organizing coils with homogeneous characteristics into batches reduces the time required for roll/lot changes and thereby streamlines the overall operation.

However, each color steel sheet manufacturer has different operational and production-line characteristics, and individual production jobs vary in sheet width, length, product application, coating, and other traits. Moreover, changes in characteristics between successive jobs lead to varying processing times, making it difficult to optimize production schedules while accounting for all of these factors. Consequently, the current practice is for field experts who are well versed in the characteristics of new coils and production lines to formulate schedules based on their know-how.

To address these issues, reinforcement learning-based scheduling technologies are being developed that derive an optimal work schedule within a finite time by exploring scheduling policies and estimation-function policies. As an example, Korean Patent Publication No. 10-2022-0121568 describes a scheduling apparatus and method for a production process that learns and explores an estimation function by updating work permutations with a Double Q-learning technique, minimizing the final process time for multiple jobs on machines with different work times. However, that publication aims to increase the production efficiency of a machine system by minimizing machine idle time, and is therefore of limited use for color steel sheet production scheduling, which requires minimizing delivery delays, production completion times, and roll/lot change times; a candidate sequence can be evaluated against these three objectives as sketched below.
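As one hypothetical way to make the three objectives concrete, the following sketch scores a candidate work sequence on total delivery delay, completion time, and total roll/lot change time. It assumes the illustrative CoilOrder fields and changeover model introduced in the earlier sketches, with hours-based timing as a further assumption.

```python
def evaluate_sequence(seq: list["CoilOrder"]) -> tuple[float, float, float]:
    """Return (total delivery delay in days, completion time in hours,
    total roll/lot change time in hours) for one candidate sequence."""
    clock = total_delay = total_change = 0.0
    prev = None
    for coil in seq:
        if prev is not None:
            t = changeover_time(prev, coil)  # assumed changeover model
            total_change += t
            clock += t
        clock += coil.work_hours
        # Tardiness: days by which completion overshoots the due date.
        total_delay += max(0.0, clock / 24.0 - coil.due_in_days)
        prev = coil
    return total_delay, clock, total_change
```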
Fig. 1 is a drawing showing a cross-section of a color steel plate product. Fig. 2 is a block diagram showing the configuration of a typical reinforcement learning device. Fig. 3 is a drawing showing a reinforcement learning-based color steel plate production schedule planning apparatus according to an embodiment of the present invention. Figs. 4 and 5 are diagrams showing the structure of a conventional deep Q-network model. Figs. 6 and 7 are diagrams showing a deep Q-network model structure according to an embodiment of the present invention. Fig. 8 is a diagram illustrating a method for establishing a color steel sheet production schedule based on reinforcement learning according to an embodiment of the present invention. Fig. 9 is a Gantt chart showing the performance improvement of the schedule-planning reinforcement learning model after training, according to an embodiment of the present invention. Fig. 10 is a diagram comparing the delivery-delay results of existing business operations and of reinforcement learning-based scheduling according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the attached drawings so that those skilled in the art can easily implement them. The present invention may, however, be embodied in various different forms and is not limited to the embodiments described herein. In order to explain the present invention clearly, parts unrelated to the explanation are omitted from the drawings, and similar parts are denoted by similar reference numerals throughout the specification. Throughout the specification, when a part is described as "including" a certain component, this means that it may further include other components rather than excluding them, unless explicitly stated otherwise. An illustrative sketch of the multi-input, single-output Q-network with a heuristic priority stage follows.
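As a purely illustrative reading of the multi-input/single-output deep Q-network and the heuristic priority stage (cf. Figs. 6 and 7 and claims 6, 8, and 9), the following PyTorch sketch scores one candidate coil (X2) at a time against the current work situation (X1) and preferentially considers coils whose surplus delivery time falls below a threshold. The layer sizes, feature encodings, and the slack_limit threshold are assumptions; the patent does not disclose these details at this level.

```python
import torch
import torch.nn as nn

class CoilQNet(nn.Module):
    """Multi-input / single-output sketch: one expected value per
    (current situation X1, single candidate coil X2) pair."""
    def __init__(self, x1_dim: int, x2_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x1_dim + x2_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # single expected value
        )

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x1, x2], dim=-1))

def select_next_coil(qnet: CoilQNet, x1: torch.Tensor,
                     candidates: list[torch.Tensor],
                     slack_hours: list[float],
                     slack_limit: float = 48.0) -> int:
    """Heuristic front end: coils within the (assumed) surplus delivery
    time limit form the priority group and are scored first; other
    coils are considered only when no priority coil remains."""
    priority = [i for i, s in enumerate(slack_hours) if s <= slack_limit]
    pool = priority if priority else list(range(len(candidates)))
    with torch.no_grad():
        scores = [qnet(x1, candidates[i]).item() for i in pool]
    return pool[scores.index(max(scores))]
```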