CN-121997494-A - Machine tool machining parameter online optimization method and equipment based on reinforcement learning
Abstract
The invention belongs to the technical field of intelligent manufacturing and numerical control machining, and discloses an online optimization method and equipment for machine tool machining parameters based on reinforcement learning. The method comprises: S1, calculating a stability discrimination label and a stability margin based on the stability lobe diagram database of the machine tool to be optimized, and generating a stability confidence from them; extracting features that characterize vibration and load changes from multiple process signals under different machining states to form feature vectors, and training a lightweight neural network on the feature vectors and raw signal window segments to obtain a machining state classifier; S2, training a reinforcement learning model offline based on real machining data, the machining state classifier, the stability discrimination label and the stability confidence; and S3, obtaining the spindle speed and the feed per tooth based on real-time machining data, the machining state classifier and the reinforcement learning model. The invention can adjust machining parameters in real time according to the machining state.
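Step S1 can be illustrated with a minimal sketch. The source does not give the formulas for the stability margin or the stability confidence, so everything below is an assumption made for illustration: the lobe diagram is taken to be stored as sampled pairs of spindle speed and critical axial depth of cut, the margin is taken to be the relative distance to the interpolated boundary, and the confidence is taken to be a logistic function of that margin. The function name `stability_from_lobes` and the parameter `sharpness` are hypothetical.

```python
import numpy as np

def stability_from_lobes(speed_rpm, depth_mm, lobe_speeds, lobe_limits, sharpness=10.0):
    """Return (label, margin, confidence) for one parameter combination.

    lobe_speeds / lobe_limits: sampled stability boundary (assumed storage format),
    with lobe_speeds in increasing order.
    """
    # Critical axial depth at this spindle speed, by linear interpolation.
    a_lim = float(np.interp(speed_rpm, lobe_speeds, lobe_limits))
    # Assumed margin: relative distance below (+) or above (-) the boundary.
    margin = (a_lim - depth_mm) / a_lim
    # Discrimination label: 1 = stable, 0 = unstable.
    label = 1 if margin > 0 else 0
    # Assumed confidence: logistic squashing of the margin into (0, 1).
    confidence = float(1.0 / (1.0 + np.exp(-sharpness * margin)))
    return label, margin, confidence

# Hypothetical boundary samples, not taken from the source.
speeds = [6000.0, 8000.0, 10000.0]
limits = [1.0, 2.0, 1.0]
print(stability_from_lobes(8000.0, 1.0, speeds, limits))
```

With this choice a point far below the boundary gets a confidence near 1, a point on the boundary gets 0.5, and a point above it gets a confidence below 0.5, which gives a reinforcement learning model a graded signal rather than a hard stable/unstable flag.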
Inventors
- LIU HONGQI
- SHI HAORAN
- ZHU JUANHUI
- MAO XINYONG
- HE SONGPING
- PENG FANGYU
Assignees
- Huazhong University of Science and Technology (华中科技大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260119
Claims (10)
- 1. A reinforcement-learning-based online optimization method for machine tool machining parameters, characterized by comprising an offline stage and an online stage. Offline stage: S1, calculating a stability discrimination label and a stability margin based on the position of any machining parameter combination, in the stability lobe diagram database of the machine tool to be optimized, relative to the stability boundary of the lobe diagram, and generating a stability confidence from the obtained stability margin; S2, training a reinforcement learning model offline based on real machining data, a machining state classifier, the stability discrimination label and the stability confidence. Online stage: S3, inputting real-time machining data into the machining state classifier, inputting the machining state category and the corresponding confidence output by the machining state classifier into the reinforcement learning model, and outputting the spindle speed and the feed per tooth from the reinforcement learning model.
- 2. The method for online optimization of machine tool machining parameters based on reinforcement learning of claim 1, wherein, within a control period, the reinforcement learning model is updated online by small-step incremental updates based on real machining data.
- 3. The method for online optimization of machine tool machining parameters based on reinforcement learning of claim 1, wherein the reward function of the reinforcement learning model is a normalized and smoothed composite reward function constructed from multidimensional indicators.
- 4. The method for online optimization of machine tool machining parameters based on reinforcement learning of claim 3, wherein the expression of the composite reward function is: [formula not reproduced in the source text], where the terms are, respectively, an efficiency reward; a smoothing penalty; and a stability term that is either a negative reward or a positive incentive in exponentially decaying form.
- 5. The method for online optimization of machine tool machining parameters based on reinforcement learning of claim 4, wherein the negative reward is: [formula not reproduced in the source text]; the positive incentive in exponentially decaying form is: [formula not reproduced in the source text]; where the symbols denote, respectively, a weight coefficient; the axial depth of cut; and the critical stable axial depth of cut.
- 6. The method for online optimization of machine tool machining parameters based on reinforcement learning according to any one of claims 1 to 5, wherein the cutting force is characterized by a discrete differential-element force model that identifies the instantaneous cutting thickness, with the following expression: [formula not reproduced in the source text], where the three force terms are, respectively, the tangential, radial and axial milling forces on a cutting edge element; h is the instantaneous cutting thickness of the cutting edge element in the feed direction; dz is the axial length of the cutting edge element; the three cutting force coefficients are those of the milling cutter in the tangential, radial and axial directions, respectively; and the three edge force coefficients are those of the milling cutter in the tangential, radial and axial directions, respectively.
- 7. The method for online optimization of machine tool machining parameters based on reinforcement learning of claim 6, wherein the cutting force coefficients are calibrated by the average cutting force method: the cutting force signals of each group of experiments are averaged over a period to obtain the average cutting forces in the x and y directions; a linear relationship between the average cutting force and the feed per tooth is established; and a least-squares regression is solved on the obtained linear relationship, wherein the slope term of the linear regression is used to determine the cutting force coefficients and the intercept term is used to determine the edge force coefficients.
- 8. The method for online optimization of machine tool machining parameters based on reinforcement learning of any one of claims 1-5, wherein real machining data is input into the machining state classifier, the machining state classifier outputs a machining state category and a corresponding confidence, and the reinforcement learning model is then trained offline based on the output machining state category and corresponding confidence together with the stability discrimination label and the stability confidence.
- 9. A system for online optimization of machine tool machining parameters based on reinforcement learning, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, performs the method for online optimization of machine tool machining parameters based on reinforcement learning according to any one of claims 1-8.
- 10. A computer readable storage medium storing machine executable instructions which, when invoked and executed by a processor, cause the processor to implement the method for online optimization of machine tool machining parameters based on reinforcement learning of any one of claims 1-8.
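The average-force calibration described in claim 7 can be sketched as follows. The source does not reproduce its formulas, so this sketch assumes full-immersion slot milling and uses the standard textbook relations for that case: the period-averaged force in each direction is linear in the feed per tooth c, with F̄y = (N·a/4)·K_tc·c + (N·a/π)·K_te and F̄x = −(N·a/4)·K_rc·c − (N·a/π)·K_re, where N is the number of teeth and a the axial depth of cut. The coefficient names `K_tc`, `K_te`, `K_rc`, `K_re` and the function name `calibrate_slot_milling` are conventional or hypothetical labels, not symbols taken from the source.

```python
import numpy as np

def calibrate_slot_milling(feed_per_tooth, mean_fx, mean_fy, n_teeth, depth):
    """Estimate cutting and edge force coefficients from averaged forces.

    Assumes full-immersion slot milling (an assumption, not stated in the source).
    feed_per_tooth in mm/tooth, forces in N, depth in mm.
    """
    c = np.asarray(feed_per_tooth, dtype=float)
    # Least-squares line  F_mean = slope * c + intercept  for each direction.
    slope_x, icpt_x = np.polyfit(c, np.asarray(mean_fx, dtype=float), 1)
    slope_y, icpt_y = np.polyfit(c, np.asarray(mean_fy, dtype=float), 1)
    na = n_teeth * depth
    return {
        # Slopes give the cutting force coefficients (N/mm^2).
        "K_tc": float(4.0 * slope_y / na),
        "K_rc": float(-4.0 * slope_x / na),
        # Intercepts give the edge force coefficients (N/mm).
        "K_te": float(np.pi * icpt_y / na),
        "K_re": float(-np.pi * icpt_x / na),
    }

# Synthetic check data generated from made-up coefficients (illustration only).
feeds = np.array([0.05, 0.10, 0.15, 0.20])
n_teeth, depth = 2, 2.0
na = n_teeth * depth
fy = (na / 4.0) * 800.0 * feeds + (na / np.pi) * 20.0
fx = -(na / 4.0) * 300.0 * feeds - (na / np.pi) * 15.0
print(calibrate_slot_milling(feeds, fx, fy, n_teeth, depth))
```

For other radial immersions the constants multiplying the slope and intercept change with the entry and exit angles, so the mapping from regression terms to coefficients would need to be adapted.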
Description
Machine tool machining parameter online optimization method and equipment based on reinforcement learning
Technical Field
The invention belongs to the technical field of intelligent manufacturing and numerical control machining, and particularly relates to an online optimization method and equipment for machine tool machining parameters based on reinforcement learning.
Background
In modern high-end equipment manufacturing, numerical control milling is a key machining process whose efficiency and quality directly affect product performance. However, during actual machining, factors such as tool wear, material inhomogeneity, environmental disturbance, and changes in the stiffness, natural frequency and damping ratio of the part as material is removed cause the dynamic characteristics of the system to change continuously, so traditional cutting parameter selection based on a static chatter stability lobe diagram or an empirical formula struggles to adapt to dynamic working conditions. On the one hand, overly conservative reduction of cutting parameters to avoid chatter severely limits machining efficiency; on the other hand, blindly increasing the feed or spindle speed easily triggers severe chatter, leading to tool damage, degraded surface quality and even equipment damage. Existing adaptive machining methods mostly rely on online chatter detection (such as spectrum analysis and wavelet transform) combined with a rule base for parameter adjustment, but suffer from delayed response, poor rule generalization, and an inability to handle multi-objective optimization. In recent years, reinforcement learning has been introduced into machining parameter optimization because of its strengths in sequential decision-making and multi-objective trade-offs.
However, online exploratory training performed directly on a real machine tool suffers from low sample efficiency, high safety risk and difficult convergence. In addition, there is a significant domain gap between the simulation environment (Sim) and the real machining process (Real), which causes policy transfer to fail. Therefore, an online optimization method that integrates prior knowledge, signal-driven perception and intelligent edge execution is needed, so as to achieve closed-loop intelligent control spanning offline modeling, online fine-tuning and real-time control, and to solve the problem of machining parameter adaptation in highly dynamic, strongly disturbed scenarios.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides an online optimization method and equipment for machine tool machining parameters based on reinforcement learning, which aim to solve the problem that existing parameter optimization methods cannot adjust parameters in real time according to the machining state.
To achieve the above object, according to one aspect of the invention, there is provided a reinforcement-learning-based online optimization method for machine tool machining parameters, comprising an offline stage and an online stage.
Offline stage: S1, calculating a stability discrimination label and a stability margin based on the position of any machining parameter combination, in the stability lobe diagram database of the machine tool to be optimized, relative to the stability boundary of the lobe diagram, and generating a stability confidence from the obtained stability margin; S2, training a reinforcement learning model offline based on real machining data, a machining state classifier, the stability discrimination label and the stability confidence.
Online stage: S3, inputting real-time machining data into the machining state classifier, inputting the machining state category and the corresponding confidence output by the machining state classifier into the reinforcement learning model, and outputting the spindle speed and the feed per tooth from the reinforcement learning model.
Further, within a control period, the reinforcement learning model is updated online by small-step incremental updates based on real machining data.
Further, the reward function of the reinforcement learning model is a normalized and smoothed composite reward function constructed from multidimensional indicators.
Further, the expression of the composite reward function is: [formula not reproduced in the source text], where the terms are, respectively, an efficiency reward; a smoothing penalty; and a stability term that is either a negative reward or a positive incentive in exponentially decaying form.
Further, the negative reward is: [formula not reproduced in the source text]; the positive incentive in exponentially decaying form is: [formula not reproduced in the source text]; where the symbols denote, respectively, a weight coefficient; the axial depth of cut; and the critical stable axial depth of cut.
Further, the cutting force is ch