CN-122018336-A - Dynamic oxygen distribution control system for converter steelmaking process based on machine learning
Abstract
The invention discloses a machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process. The system comprises an acquisition module, a preprocessing module, a time sequence modeling module, an evaluation parameter construction module, an implicit state initialization module, a tree search planning module, an instruction generation module, a reward calculation module and a control recording module. The acquisition module acquires converter operation data; the preprocessing module preprocesses the data; the time sequence modeling module performs time sequence modeling; the evaluation parameter construction module constructs a dynamic oxygen distribution reward evaluation parameter set; the implicit state initialization module constructs an improved MuZero model and generates an initial hidden state; the tree search planning module executes multi-step planning; the instruction generation module generates a dynamic oxygen distribution control instruction set; the reward calculation module calculates a reward value; and the control recording module updates the parameters of the improved MuZero model, the cycle being repeated until blowing is finished. The system combines long time sequence modeling with constrained reinforcement learning decision-making to realize intelligent control of dynamic oxygen distribution in the converter, with the stated advantages of high control precision, strong operational stability and excellent safety.
Inventors
- LI HONGYONG
- LI MINGTAN
- LIU XIAOYAN
- CHEN PENGFEI
- LI JINHAO
- Zhuo Wenkeng
Assignees
- 中新钢铁集团有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (10)
- 1. A machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process, characterized by comprising: an acquisition module for acquiring converter operation data during the converter blowing process; a preprocessing module for preprocessing the converter operation data to generate a converter standardized operation data set, and executing sliding segmentation to generate a converter long time sequence operation segment set; a time sequence modeling module for constructing a converter process state vector set based on the converter long time sequence operation segment set, and inputting the vector set into an Informer network for time sequence modeling to generate a converter long time sequence state representation vector; an evaluation parameter construction module for constructing a dynamic oxygen distribution reward evaluation parameter set; an implicit state initialization module for constructing an improved MuZero model, generating a planning root node state vector based on the converter long time sequence state representation vector, and inputting the planning root node state vector into an implicit state transition network to generate an initial hidden state; a tree search planning module for constructing a dynamic oxygen distribution action set, taking the initial hidden state as the tree search root node, and executing multi-step planning over the dynamic oxygen distribution action set to generate a node statistic sequence; an instruction generation module for determining a target dynamic oxygen distribution action based on the node statistic sequence and generating a dynamic oxygen distribution control instruction set; a reward calculation module for sending the dynamic oxygen distribution control instruction set, executing the adjustment operation, and calculating a reward value for the execution result data according to the dynamic oxygen distribution reward evaluation parameter set; and a control recording module for updating the parameters of the improved MuZero model and executing the above cyclically until the blowing end point, to generate a dynamic oxygen distribution control record set.
- 2. The machine learning-based dynamic oxygen distribution control system for the converter steelmaking process according to claim 1, wherein the converter operation data comprises oxygen supply flow, oxygen supply cumulative amount, oxygen lance position height, furnace mouth flue gas carbon monoxide volume fraction, furnace mouth flue gas carbon dioxide volume fraction, molten steel temperature, slag FeO mass fraction, furnace mouth flame image characterization amount and molten steel carbon content.
- 3. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the preprocessing comprises time synchronization, outlier rejection, missing-value interpolation and normalization.
- 4. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the time sequence modeling module performs the following steps: based on the converter long time sequence operation segment set, executing time sequence feature construction, within each converter long time sequence operation segment, on the oxygen supply flow, oxygen supply cumulative amount, oxygen lance position height, furnace mouth flue gas carbon monoxide volume fraction, furnace mouth flue gas carbon dioxide volume fraction, molten steel temperature, slag FeO mass fraction, furnace mouth flame image characterization amount and molten steel carbon content, to generate a plurality of corresponding feature vectors; concatenating the feature vectors to form the converter process state vector corresponding to each converter long time sequence operation segment, the converter process state vectors together forming the converter process state vector set; and inputting the converter process state vector set into the Informer network for time sequence modeling to generate a converter long time sequence state representation vector that represents the long-term evolution trend of the converter condition.
- 5. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the evaluation parameter construction module performs the following steps: setting a molten steel target temperature value based on the end-point control requirement of converter steelmaking, acquiring the molten steel temperature in the corresponding control period of the blowing process, calculating the difference between the molten steel temperature and the molten steel target temperature value, and linearly mapping the difference to generate a molten steel temperature deviation penalty parameter; setting a molten steel target carbon content value according to the control index for the end-point carbon content of the molten steel, acquiring the molten steel carbon content in the corresponding control period, calculating the deviation between the molten steel carbon content and the molten steel target carbon content value, and linearly mapping the deviation to generate a molten steel carbon content deviation penalty parameter; setting a target range for the furnace mouth flue gas carbon monoxide volume fraction according to its control interval during the blowing process, acquiring the furnace mouth flue gas carbon monoxide volume fraction in the corresponding time period, calculating the degree of deviation of that volume fraction from the target range, and linearly mapping the degree of deviation to generate a flue gas carbon monoxide volume fraction deviation penalty parameter; during the blowing process, counting the oxygen supply cumulative amount in real time, comparing the current oxygen supply cumulative amount with a preset oxygen supply cumulative amount control threshold, and performing linear mapping based on the comparison result to generate an oxygen supply cumulative amount penalty parameter; setting a target interval for the slag FeO mass fraction based on the control requirement for the degree of slag oxidation, collecting the slag FeO mass fraction in the corresponding control period, calculating the degree of deviation of the slag FeO mass fraction from the target interval, and linearly mapping the degree of deviation to a slag FeO deviation penalty parameter; in view of the splash risk control requirement during the blowing process, calculating a splash risk assessment value based on the change characteristics of the furnace mouth flame image, of the furnace mouth flue gas composition and of the oxygen supply flow, and generating a splash risk penalty parameter according to the splash risk assessment value; and combining the molten steel temperature deviation penalty parameter, the molten steel carbon content deviation penalty parameter, the flue gas carbon monoxide volume fraction deviation penalty parameter, the oxygen supply cumulative amount penalty parameter, the slag FeO deviation penalty parameter and the splash risk penalty parameter to form the dynamic oxygen distribution reward evaluation parameter set.
- 6. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the implicit state initialization module performs the following steps: constructing the improved MuZero model, the improved MuZero model including a representation network, an implicit state transition network, a value evaluation network, a policy output network and a constrained tree search network; in the representation network, obtaining the converter long time sequence state representation vector, performing linear mapping on it to generate a long-term evolution prior feature vector, and normalizing the long-term evolution prior feature vector to obtain a long-term evolution prior embedded vector; acquiring a current state feature vector constructed from the converter standardized operation data set in the current control period, and performing linear mapping and normalization on the current state feature vector to generate a current state embedded vector; aligning the long-term evolution prior embedded vector with the current state embedded vector, and executing prior-weighted modulation on each feature dimension of the current state embedded vector to generate a modulated state embedded vector constrained by the long-term evolution trend; performing nonlinear transformation and dimension compression on the modulated state embedded vector to generate the planning root node state vector; and inputting the planning root node state vector into the implicit state transition network and executing a hidden state initialization operation to generate the initial hidden state, which serves as the starting hidden state of the dynamic oxygen distribution control planning process.
- 7. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the tree search planning module performs the following steps: constructing a dynamic oxygen distribution action set according to the process control requirements of the converter steelmaking process, the dynamic oxygen distribution action set comprising an oxygen supply flow regulation gear set and an oxygen lance position regulation gear set, and associating with it an oxygen supply intensity constraint, an oxygen supply cumulative amount constraint, a lance position regulation range constraint and an operation safety constraint, to form a dynamic oxygen distribution action set with constraint conditions; taking the initial hidden state as the root node hidden state of the constrained tree search network, inputting the initial hidden state into the constrained tree search network to complete initialization of the tree search structure, and loading the dynamic oxygen distribution action set into the corresponding search action space; in the constrained tree search network, starting from the root node hidden state, selecting dynamic oxygen distribution actions that satisfy the constraint conditions, combining each such action with the hidden state of the current search node, and inputting the combination into the implicit state transition network to execute a hidden state recursion operation, forming a child node hidden state set; inputting each child node hidden state in the child node hidden state set into the value evaluation network and executing a value evaluation operation to generate the corresponding node value estimates, and binding each node value estimate one-to-one with its dynamic oxygen distribution action and child node hidden state to generate a child node evaluation result set; writing the child node evaluation result set into the tree search node structure, and updating the node visit count and node cumulative value corresponding to the root search node to form updated node statistics; determining, from the child node hidden state set and based on the updated node statistics, the search node of the next search level, and taking the determined child node hidden state as the current search node hidden state of the next search level; repeating the dynamic oxygen distribution action generation, hidden state recursion, node value estimation and node statistic update until the preset number of search levels is reached, generating a multi-layer dynamic oxygen distribution action search tree structure that starts from the root node hidden state; summarizing the plurality of search paths formed by the multi-layer dynamic oxygen distribution action search tree structure to obtain the node statistics corresponding to the search node at the end of each search path; and determining an optimal search path based on the node statistics of the search nodes at the ends of the search paths, extracting the corresponding oxygen supply flow regulation gear sequence and oxygen lance position regulation gear sequence along the optimal search path, generating a multi-step dynamic oxygen distribution action sequence, and generating the corresponding node statistic sequence.
- 8. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the instruction generation module performs the following steps: determining the target dynamic oxygen distribution action corresponding to the current control period according to the node statistic sequence; and inputting the target dynamic oxygen distribution action together with the initial hidden state into the policy output network, and calling the policy output network to execute an action mapping operation to generate the dynamic oxygen distribution control instruction set.
- 9. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the reward calculation module performs the following steps: acquiring the dynamic oxygen distribution control instruction set, parsing it to obtain a target oxygen supply flow set value and a target oxygen lance position set value, and generating an oxygen supply instruction parameter value and a lance position instruction parameter value; writing the oxygen supply instruction parameter value into the oxygen supply regulation interface of the converter oxygen supply control system, triggering the oxygen supply control system to generate an oxygen supply execution signal according to the oxygen supply instruction parameter value, and forming an oxygen supply execution signal value; writing the lance position instruction parameter value into the lance position control interface of the oxygen lance actuator, triggering the oxygen lance actuator to generate a lance position execution signal according to the lance position instruction parameter value, and forming a lance position execution signal value; executing the oxygen supply flow regulation and oxygen lance position regulation operations under the oxygen supply execution signal value and the lance position execution signal value, and timing the execution process to obtain a control period timing value; after the control period timing value reaches the control period end time, collecting execution result data to form an execution result data set; generating the dynamic oxygen distribution reward evaluation parameter set of the current control period based on the execution result data set; taking the dynamic oxygen distribution reward evaluation parameter set of the current control period as the reward calculation parameter combination; and generating the reward value of the current control period based on the reward calculation parameter combination.
- 10. The machine-learning-based dynamic oxygen distribution control system for a converter steelmaking process according to claim 1, wherein the control recording module performs the following steps: reading the reward value corresponding to the current control period, acquiring the state representation information and the dynamic oxygen distribution control behavior corresponding to that control period, and inputting the state representation information, the dynamic oxygen distribution control behavior and the reward value into the improved MuZero model as a training sample; based on the training sample, performing parameter update operations on the representation network, the implicit state transition network, the value evaluation network and the policy output network; after the parameter update is finished, entering the next control period and repeating the converter operation data acquisition, dynamic oxygen distribution control instruction generation, reward value calculation and model parameter update until the converter blowing end point; and, during the cyclic execution, recording the dynamic oxygen distribution control instruction set, the reward value and the execution result data of each control period in control period order to form the dynamic oxygen distribution control record set.
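The penalty construction of claim 5 and the reward step of claim 9 can be illustrated with a minimal Python sketch. The claims state only that each deviation is linearly mapped to a penalty parameter and that the reward value is generated from the combined parameter set. The clipping of each penalty to [0, 1], the negative weighted sum used as the reward, and every numeric target, scale and field name below are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def linear_penalty(deviation: float, scale: float) -> float:
    """Linearly map a deviation to a penalty; clipping to [0, 1] is an assumption."""
    return min(abs(deviation) / scale, 1.0)


def range_deviation(value: float, low: float, high: float) -> float:
    """Distance from value to the interval [low, high]; zero inside the interval."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


@dataclass
class Targets:
    # Hypothetical set points; real values depend on the steel grade and plant.
    temp_c: float = 1650.0
    carbon_pct: float = 0.05
    co_range: Tuple[float, float] = (0.40, 0.60)
    oxygen_limit_m3: float = 12000.0
    feo_range: Tuple[float, float] = (0.15, 0.20)


def penalty_set(obs: Dict[str, float], t: Targets) -> Dict[str, float]:
    """Build the six penalty parameters listed in claim 5 from one control period."""
    over_oxygen = max(obs["oxygen_cum_m3"] - t.oxygen_limit_m3, 0.0)
    return {
        "temperature": linear_penalty(obs["temp_c"] - t.temp_c, 50.0),
        "carbon": linear_penalty(obs["carbon_pct"] - t.carbon_pct, 0.10),
        "co": linear_penalty(range_deviation(obs["co_frac"], *t.co_range), 0.20),
        "oxygen": linear_penalty(over_oxygen, 1000.0),
        "feo": linear_penalty(range_deviation(obs["feo_frac"], *t.feo_range), 0.10),
        # splash_risk is assumed to be an assessment value already scaled to [0, 1].
        "splash": linear_penalty(obs["splash_risk"], 1.0),
    }


def reward(penalties: Dict[str, float],
           weights: Optional[Dict[str, float]] = None) -> float:
    """Assumed combination rule: negative weighted sum of the penalty parameters."""
    if weights is None:
        weights = {name: 1.0 for name in penalties}
    return -sum(weights[name] * value for name, value in penalties.items())


if __name__ == "__main__":
    observation = {"temp_c": 1675.0, "carbon_pct": 0.05, "co_frac": 0.50,
                   "oxygen_cum_m3": 10000.0, "feo_frac": 0.18, "splash_risk": 0.0}
    print(reward(penalty_set(observation, Targets())))
```

Keeping the six penalties as a named set before combining them mirrors the claim structure, in which the parameter set is built first and the reward value is derived from it afterwards; per-term weights could then be tuned without changing how each penalty is computed.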
Description
Dynamic oxygen distribution control system for converter steelmaking process based on machine learning
Technical Field
The invention relates to the field of metallurgical automation control, in particular to a dynamic oxygen distribution control system for a converter steelmaking process based on machine learning.
Background
Oxygen supply control in the converter steelmaking process is a key link affecting molten steel temperature, carbon content, slag composition and production safety. Existing converter dynamic oxygen distribution technology mainly depends on empirical models, static control rules or feedback regulation based on a small number of process variables, and achieves end-point control by correcting a preset oxygen supply curve or staged parameters. Some improved schemes introduce data-driven models or machine learning methods to predict the converter operating state and assist oxygen supply decisions, but most of them still rely on short-time observation data, so they struggle to fully describe the long-term evolution of the blowing process and adapt poorly to the complex nonlinear reaction process. In addition, existing dynamic oxygen distribution methods based on intelligent algorithms mostly adopt a simple reward function or a post-hoc constraint screening strategy: safety and process constraints are treated as conditions for correcting results, and physical constraints such as oxygen supply intensity, oxygen supply cumulative amount, lance position change and splash risk are not incorporated directly into the decision space at the decision planning stage, which easily leads to unstable control strategies or potential safety hazards.
Meanwhile, existing methods lack an effective balance between multi-step decision evaluation and real-time execution: it is difficult to make full use of long time sequence information for forward-looking optimization while guaranteeing industrial real-time performance, which restricts further improvement of dynamic oxygen distribution control precision and operational safety in the converter steelmaking process.
Disclosure of Invention
The invention aims to provide a dynamic oxygen distribution control system for a converter steelmaking process based on machine learning, which combines long time sequence modeling with constrained reinforcement learning decision-making to realize intelligent control of dynamic oxygen distribution in the converter, with the stated advantages of high control precision, strong operational stability and excellent safety.
According to an embodiment of the invention, a dynamic oxygen distribution control system for a converter steelmaking process based on machine learning comprises the following components:
- an acquisition module for acquiring converter operation data during the converter blowing process;
- a preprocessing module for preprocessing the converter operation data to generate a converter standardized operation data set, and executing sliding segmentation to generate a converter long time sequence operation segment set;
- a time sequence modeling module for constructing a converter process state vector set based on the converter long time sequence operation segment set, and inputting the vector set into an Informer network for time sequence modeling to generate a converter long time sequence state representation vector;
- an evaluation parameter construction module for constructing a dynamic oxygen distribution reward evaluation parameter set;
- an implicit state initialization module for constructing an improved MuZero model, generating a planning root node state vector based on the converter long time sequence state representation vector, and inputting the planning root node state vector into an implicit state transition network to generate an initial hidden state;
- a tree search planning module for constructing a dynamic oxygen distribution action set, taking the initial hidden state as the tree search root node, and executing multi-step planning over the dynamic oxygen distribution action set to generate a node statistic sequence;
- an instruction generation module for determining a target dynamic oxygen distribution action based on the node statistic sequence and generating a dynamic oxygen distribution control instruction set;
- a reward calculation module for sending the dynamic oxygen distribution control instruction set, executing the adjustment operation, and calculating a reward value for the execution result data according to the dynamic oxygen distribution reward evaluation parameter set; and
- a control recording module for updating the parameters of the improved MuZero model and executing the above cyclically until the blowing end point, to generate a dynamic oxygen distribution control record set.
Optionally, the converter operation data comprises oxygen supply flow, oxygen supply cumulative amount,