CN-121994037-A - Top-blown furnace spray gun position control method based on safety reinforcement learning
Abstract
The invention discloses a top-blown furnace spray gun position control method based on safety reinforcement learning, comprising the following steps: S1, constructing a state space based on multi-source data perception; S2, constructing safety constraints based on the parameters in the state space, the safety constraints comprising action output constraints and state-layer hard constraints; S3, designing a reward function that incorporates the safety constraints; S4, training a constrained reinforcement learning agent based on S1-S3 to output adjustment instructions; S5, obtaining the adjustment instructions and performing safety verification; S7, sending the verified safe action to the actuator, driving the spray gun to complete the position adjustment, and storing the interaction data from this step to update the agent. Compared with the prior art, the invention provides a spray gun position control method for top-blown furnaces that builds safety reinforcement learning on multi-source data acquisition and state space limitation.
Inventors
- ZHOU XIAOJUN
- ZHONG HUA
- Du Yangyi
- WU YUTONG
- MA CHAOJUN
- HU WENFENG
- LI YUELONG
- Peng jubo
Assignees
- Central South University (中南大学)
- Yunnan Tin Group (Holding) Co., Ltd. (云南锡业集团(控股)有限责任公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20260128
Claims (7)
- 1. A top-blown furnace spray gun position control method based on safety reinforcement learning, characterized by comprising the following steps: S1, constructing a state space based on multi-source data perception; S2, constructing safety constraints based on the parameters in the state space, the safety constraints comprising action output constraints and state-layer hard constraints; S3, designing a reward function that incorporates the safety constraints: a positive reward is given when the process objective is met, and a negative penalty is applied when a safety constraint is violated or the operation deviates from the safe operating region; S4, training a constrained reinforcement learning agent based on S1-S3; S5, in the operation stage, collecting current working condition data in real time and feeding them into the state space, the trained agent outputting a candidate spray gun adjustment action that forms an adjustment instruction; S6, obtaining the adjustment instruction and performing safety verification, and, if the action exceeds the action amplitude limit range or the rate limit threshold, returning to S5 to readjust and re-output the adjustment instruction; and S7, sending the verified safe action to the actuator, driving the spray gun to complete the position adjustment, and storing the interaction data from this step to update the agent.
- 2. The method for controlling the position of the top-blown furnace spray gun based on safety reinforcement learning according to claim 1, wherein the action output constraints comprise an action amplitude limit range and a rate limit threshold; the parameters comprise the current position coordinates of the spray gun in the top-blown furnace, the temperature distribution in the furnace, the molten pool liquid level height, the furnace body structure parameters and historical safe operation data, which are fused to form geometric constraints; and the state-layer hard constraints comprise the minimum safe distance between the spray gun and the furnace wall, the maximum allowable height above the molten pool liquid level, the maximum tolerable temperature of the target area and the maximum load of the equipment drive motor.
- 3. The method for controlling the position of the top-blown furnace spray gun based on safety reinforcement learning according to claim 1, wherein the action output in step S2 is a position adjustment amount of the spray gun, the position adjustment amount comprising horizontal displacement, vertical displacement and rotation angle.
- 4. The method for controlling the position of the top-blown furnace spray gun based on safety reinforcement learning according to claim 2, wherein the state-layer hard constraints obtain their constraint parameters from the data of deployed sensors, the sensors comprising a ranging unit, a thermal imaging unit, an angle detection unit and a pressure detection unit.
- 5. The method for controlling the position of the top-blown furnace spray gun based on safety reinforcement learning according to claim 1, wherein the safety verification in S6 comprises: an action-layer check, namely checking whether the displacement amplitude and the rotation angle of the candidate action are within the limit range and whether the action rate is below the threshold; a state-layer check, namely predicting the position of the spray gun after the action is executed and judging whether the distance between the spray gun and the furnace wall, the molten pool liquid level and the temperature of the target area satisfy the state-layer hard constraints; and emergency handling, namely guiding the agent to readjust its output on a minor violation, and, on a serious violation, triggering emergency braking, switching to the preset safe pose and recording the tabu state.
- 6. The method for controlling the position of the top-blown furnace spray gun based on safety reinforcement learning according to claim 1, wherein the agent update in S7 is an offline update: the stored interaction data are periodically imported into the simulation environment, the parameters of the agent are fine-tuned based on the new data, and the agent is deployed to the site after its safety has been verified.
- 7. The method for controlling the position of the top-blown furnace spray gun based on safety reinforcement learning according to claim 1, wherein the reward function in S3 is designed such that the positive reward is positively correlated with the degree to which the process target is achieved and is tied to the positioning accuracy of the spray gun, a higher positioning accuracy yielding a larger positive reward; and the negative penalty is a superposition of risk levels, including collision risk, risk of overburden, equipment overload risk and the degree of deviation from the safe operating region, a higher risk level or degree of deviation yielding a larger negative penalty.
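The reward structure of claim 7 can be illustrated with a minimal Python sketch. It is only one possible reading: the claims give no formula, so the normalized risk scores, the weights `w_pos` and `w_risk`, and the accuracy term `1 / (1 + error)` are all assumptions introduced here for illustration. The field name `overburden` simply follows the wording of the claim.

```python
from dataclasses import dataclass


@dataclass
class RiskLevels:
    """Hypothetical risk scores, each assumed to be normalized to [0, 1]."""
    collision: float
    overburden: float   # term kept as worded in claim 7
    overload: float
    deviation: float    # degree of deviation from the safe operating region


def reward(target_achievement: float, positioning_error: float,
           risks: RiskLevels, w_pos: float = 1.0, w_risk: float = 2.0) -> float:
    """Positive reward minus superposed risk penalty (illustrative only)."""
    # Positive part: grows with process-target achievement and with
    # positioning accuracy (a smaller error gives a larger accuracy term).
    accuracy = 1.0 / (1.0 + positioning_error)
    positive = target_achievement + w_pos * accuracy
    # Negative part: superposition (here a plain weighted sum) of risk levels.
    penalty = w_risk * (risks.collision + risks.overburden
                        + risks.overload + risks.deviation)
    return positive - penalty
```

With these assumed weights, a risk-free step with perfect positioning scores higher than the same step with a positioning error or with any non-zero risk, which matches the monotonic behaviour the claim describes.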
Description
Top-blown furnace spray gun position control method based on safety reinforcement learning
Technical Field
The invention relates to the technical field of top-blown furnace spray gun control, and in particular to a top-blown furnace spray gun position control method based on safety reinforcement learning.
Background
Top-blown furnaces are widely used in the metallurgical industry, and the position control of the spray gun is critical to production efficiency and safety. Traditional control methods rely on fixed rules or simple feedback mechanisms, cope poorly with complex and changeable working conditions, and are prone to misoperation or safety accidents. In recent years, reinforcement learning has begun to be applied to automated control systems as an intelligent decision-making method. However, applying reinforcement learning directly can increase risk, because safety constraints may be violated during exploration. How to achieve efficient automatic control of the spray gun position while ensuring safety is therefore a problem to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the above technical defects and to provide a top-blown furnace spray gun position control method based on safety reinforcement learning that builds on multi-source data acquisition and state space limitation.
To solve the above technical problem, the technical scheme provided by the invention is a top-blown furnace spray gun position control method based on safety reinforcement learning, comprising the following steps:
S1, constructing a state space based on multi-source data perception;
S2, constructing safety constraints based on the parameters in the state space, the safety constraints comprising action output constraints and state-layer hard constraints;
S3, designing a reward function that incorporates the safety constraints: a positive reward is given when the process objective is met, and a negative penalty is applied when a safety constraint is violated or the operation deviates from the safe operating region;
S4, training a constrained reinforcement learning agent based on S1-S3;
S5, in the operation stage, collecting current working condition data in real time and feeding them into the state space, the trained agent outputting a candidate spray gun adjustment action that forms an adjustment instruction;
S6, obtaining the adjustment instruction and performing safety verification, and, if the action exceeds the action amplitude limit range or the rate limit threshold, returning to S5 to readjust and re-output the adjustment instruction;
S7, sending the verified safe action to the actuator, driving the spray gun to complete the position adjustment, and storing the interaction data from this step to update the agent.
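The run-time loop of steps S5 to S7 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the callables `policy`, `is_safe` and `actuate`, the retry cap `max_retries`, and the idea of passing the attempt index to the policy are all hypothetical names and choices introduced here.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[float, ...]
Action = Tuple[float, float, float]  # (horizontal, vertical, rotation)


def run_step(state: State,
             policy: Callable[[State, int], Action],
             is_safe: Callable[[State, Action], bool],
             actuate: Callable[[Action], State],
             buffer: List[Tuple[State, Action, State]],
             max_retries: int = 5) -> Optional[Action]:
    """One pass through S5-S7; returns the executed action, or None."""
    for attempt in range(max_retries):
        # S5: the trained agent proposes a candidate adjustment action.
        action = policy(state, attempt)
        # S6: safety verification; on failure, go back to S5 and retry.
        if is_safe(state, action):
            # S7: send the verified action to the actuator and store the
            # interaction so it can be used later to update the agent.
            next_state = actuate(action)
            buffer.append((state, action, next_state))
            return action
    # Assumption: after too many rejected candidates nothing is executed.
    return None
```

The retry cap is a design choice of this sketch; the source only states that a rejected instruction returns to S5 for readjustment and does not say how many times this may repeat.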
Preferably, the action output constraints comprise an action amplitude limit range and a rate limit threshold; the parameters comprise the current position coordinates of the spray gun in the top-blown furnace, the temperature distribution in the furnace, the molten pool liquid level height, the furnace body structure parameters and historical safe operation data, which are fused to form geometric constraints; and the state-layer hard constraints comprise the minimum safe distance between the spray gun and the furnace wall, the maximum allowable height above the molten pool liquid level, the maximum tolerable temperature of the target area and the maximum load of the equipment drive motor.
Preferably, the action output in S2 is a position adjustment amount of the spray gun, comprising horizontal displacement, vertical displacement and rotation angle.
Preferably, the state-layer hard constraints obtain their constraint parameters from the data of deployed sensors, the sensors comprising a ranging unit, a thermal imaging unit, an angle detection unit and a pressure detection unit.
Preferably, the safety verification in S6 comprises: an action-layer check, namely checking whether the displacement amplitude and the rotation angle of the candidate action are within the limit range and whether the action rate is below the threshold; a state-layer check, namely predicting the position of the spray gun after the action is executed and judging whether the distance between the spray gun and the furnace wall, the molten pool liquid level and the temperature of the target area satisfy the state-layer hard constraints; and emergency handling, namely guiding the agent to readjust its output on a minor violation, and, on a serious violation, triggering emergency braking, switching to the preset safe pose and recording the tabu state.
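The two-level safety verification in S6 might look like the following sketch. All numeric limits, the field names, and the rule that separates a minor from a serious violation (a relative excess of `severe_excess` or more) are assumptions made for illustration; the source names the checks but does not give thresholds or a severity criterion. The predicted post-action state is taken as an input here rather than computed.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE = "safe"
    MINOR = "minor"    # guide the agent to readjust its output
    SEVERE = "severe"  # emergency brake, move to safe pose, record tabu state


@dataclass
class Limits:  # every value below is a hypothetical placeholder
    max_shift: float = 0.10          # m per adjustment
    max_rotation: float = 5.0        # degrees per adjustment
    max_rate: float = 0.05           # m/s
    min_wall_distance: float = 0.30  # m
    max_height: float = 1.50         # m above the molten pool level
    max_temperature: float = 1300.0  # deg C in the target area
    severe_excess: float = 0.2       # relative excess treated as serious


@dataclass
class Action:
    dx: float
    dz: float
    dtheta: float
    rate: float


@dataclass
class Predicted:  # predicted state after the action is executed
    wall_distance: float
    height: float
    temperature: float


def verify(a: Action, p: Predicted, lim: Limits) -> Verdict:
    # Action-layer check: amplitude and rotation within limits, rate below threshold.
    if (abs(a.dx) > lim.max_shift or abs(a.dz) > lim.max_shift
            or abs(a.dtheta) > lim.max_rotation or a.rate >= lim.max_rate):
        return Verdict.MINOR
    # State-layer check: relative excess over each hard constraint.
    worst = max((lim.min_wall_distance - p.wall_distance) / lim.min_wall_distance,
                (p.height - lim.max_height) / lim.max_height,
                (p.temperature - lim.max_temperature) / lim.max_temperature)
    if worst <= 0:
        return Verdict.SAFE
    return Verdict.MINOR if worst < lim.severe_excess else Verdict.SEVERE
```

A caller would map `MINOR` to a return to S5 and `SEVERE` to the emergency handling described above; treating an out-of-range action as a minor violation follows the readjustment rule stated for S6.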
Preferably, the agent update in S7 is an offline update: the stored interaction data are periodically imported into the simulation environment, the parameters of the agent are fine-tuned based on the new data, and dep