CN-122018299-A - Bearing lubrication parameter self-adaptive regulation and control system based on reinforcement learning
Abstract
The invention discloses a self-adaptive regulation and control system for bearing lubrication parameters based on reinforcement learning, comprising a bearing running state acquisition module, a state vector construction module, a world model learning module, an optimal action module, a regulation execution and reward evaluation module, and a model self-optimization module. The bearing running state acquisition module synchronously acquires multi-source data; the state vector construction module preprocesses the multi-source data and combines it into a state vector; the world model learning module uses an improved stream model, with a physical consistency loss term introduced, to output the state and uncertainty at the next moment; the optimal action module performs multi-step look-ahead planning through a behavior network and outputs the optimal regulation action; the regulation execution and reward evaluation module executes the regulation and calculates a real reward value based on multi-index feedback; and the model self-optimization module continuously optimizes the model based on experience replay and the temporal-difference error. The system solves the problems of poor dynamic adaptability, unreliable decisions and low safety in traditional methods, and realizes safe, intelligent, self-adaptive regulation and control of bearing lubrication.
Inventors
- LIN WEIDI
Assignees
- 上海祎榕实业有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260318
Claims (10)
- 1. A self-adaptive regulation and control system for bearing lubrication parameters based on reinforcement learning, characterized by comprising the following modules: the bearing running state acquisition module is used for synchronously acquiring multi-source data; the state vector construction module is used for preprocessing the multi-source data and combining it in sequence into a state vector; the world model learning module is used for inputting the state vector into the encoder network of the improved stream model to obtain a low-dimensional latent state vector, inputting it into the world model to predict the low-dimensional latent state vector at the next moment, outputting a prediction uncertainty measure, and, after decoding, comparing the predicted low-dimensional latent state vector with the theoretical physical state vector at the next moment to calculate a physical consistency loss term; the optimal action module is used for carrying out multi-step look-ahead planning in the world model through the behavior network, generating a plurality of candidate future action trajectories, summing the physical consistency loss term and the risk penalty term to obtain a final risk penalty term, subtracting the final risk penalty term from the base cumulative reward value to obtain a final cumulative reward value, and outputting the two-dimensional action vector that maximizes it; the regulation execution and reward evaluation module is used for converting the two-dimensional action vector into control signals, collecting the instantaneous variation of the bearing seat temperature and the dominant-frequency amplitude of the vibration frequency-domain features, multiplying the instantaneous temperature variation, the dominant-frequency amplitude and the real-time power consumption of the lubrication system by the corresponding preset weight coefficients, summing the products, and negating the sum to obtain a real reward value; and the model self-optimization module is used for storing experience tuples consisting of the state vector, the two-dimensional action vector and the real reward value into an experience replay pool, and updating the network weights of the improved stream model by minimizing the temporal-difference error, so as to continuously self-optimize the improved stream model.
- 2. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 1, wherein the modules are realized by the following method: S1, synchronously acquiring, through multi-source sensors, the vibration time-domain signal, the bearing seat temperature, the lubricating oil film pressure value and the real-time spindle rotation speed during bearing operation; S2, performing a fast Fourier transform on the vibration time-domain signal to extract vibration frequency-domain features, normalizing the bearing seat temperature, the lubricating oil film pressure value and the real-time spindle rotation speed, and combining the processed frequency-domain features, temperature, pressure and rotation speed data in sequence into a state vector; S3, inputting the state vector into the encoder network of the improved stream model to obtain a low-dimensional latent state vector, inputting it into the world model to predict the low-dimensional latent state vector at the next moment, outputting a prediction uncertainty measure, and comparing the predicted state with the theoretical physical state vector at the next moment to calculate a physical consistency loss term; S4, performing multi-step look-ahead planning in the world model through the behavior network, generating a plurality of candidate future action trajectories, summing the physical consistency loss term and the risk penalty term to obtain a final risk penalty term, subtracting it from the base cumulative reward value to obtain a final cumulative reward value, and outputting the two-dimensional action vector, containing a target oil supply pressure value and a target oil supply frequency value, that maximizes the final cumulative reward value; S5, converting the two-dimensional action vector into control signals, collecting the instantaneous variation of the bearing seat temperature and the dominant-frequency amplitude of the vibration frequency-domain features, multiplying the instantaneous temperature variation, the dominant-frequency amplitude and the real-time power consumption of the lubrication system by the corresponding preset weight coefficients, summing the products, and negating the sum to obtain a real reward value; S6, storing experience tuples formed by the state vector, the two-dimensional action vector and the real reward value into an experience replay pool, randomly sampling a batch of experience tuples from the pool, and updating the network weights of the improved stream model by minimizing the temporal-difference error, so as to continuously self-optimize the improved stream model.
- 3. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 2, wherein S1 specifically comprises: S11, physically installing a vibration sensor, a temperature sensor, an oil film pressure sensor and a rotation speed sensor at preset measuring points on the bearing seat or spindle, and starting data acquisition on all sensors simultaneously through a synchronous trigger signal; S12, synchronously reading the vibration time-domain signal output by the vibration sensor, the bearing seat temperature output by the temperature sensor, the lubricating oil film pressure value output by the oil film pressure sensor, and the real-time spindle rotation speed output by the rotation speed sensor.
- 4. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 2, wherein S2 specifically comprises: S21, performing a fast Fourier transform on the vibration time-domain signal to convert it from the time domain to the frequency domain, extracting the amplitude spectrum in the frequency domain as the vibration frequency-domain features, reading the bearing seat temperature, the lubricating oil film pressure value and the real-time spindle rotation speed respectively, and performing linear normalization on each item of data using preset maximum and minimum values so that each value is mapped into the range 0 to 1; and S22, concatenating the normalized bearing seat temperature, lubricating oil film pressure value and real-time spindle rotation speed with the vibration frequency-domain features in a preset order to generate a state vector of preset dimension.
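The preprocessing in claim 4 (S21-S22) can be sketched as follows. This is a minimal illustration, not the patented implementation: the normalization ranges, the number of retained spectrum bins, and the use of a direct DFT in place of a library FFT routine are all assumptions for the sake of a self-contained example.

```python
import math

def normalize(x, lo, hi):
    # Linear min-max normalization to [0, 1], clipped at the preset bounds.
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def build_state_vector(vib_signal, temp_c, oil_pressure, rpm,
                       temp_range=(20.0, 120.0),
                       pressure_range=(0.0, 5.0),
                       rpm_range=(0.0, 6000.0),
                       n_bins=8):
    """Sketch of S21-S22: amplitude spectrum + normalized scalars,
    concatenated in a fixed order. All ranges are illustrative."""
    n = len(vib_signal)
    # Amplitude spectrum via a direct DFT (numpy.fft.rfft would normally be
    # used); keep only the first n_bins magnitudes as the frequency feature.
    spectrum = []
    for k in range(n_bins):
        re = sum(vib_signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(vib_signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spectrum.append(math.hypot(re, im) / n)
    scalars = [normalize(temp_c, *temp_range),
               normalize(oil_pressure, *pressure_range),
               normalize(rpm, *rpm_range)]
    # Fixed concatenation order: spectrum first, then temperature, pressure, speed.
    return spectrum + scalars
```

A pure sine test signal at 2 cycles per window shows up as a single dominant bin, which is the frequency-domain feature the claim relies on.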
- 5. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 2, wherein step S3 specifically comprises: S31, inputting the state vector into an encoder network formed by stacking a preset number of fully connected layers of the improved stream model, wherein the first fully connected layer multiplies the state vector by a preset first weight matrix, adds a first bias vector and applies a ReLU activation function to obtain first-layer coding features, and the second fully connected layer receives the first-layer coding features and continues the computation with a preset second weight matrix and second bias vector; S32, after the preset number of fully connected layers has been stacked in this way, outputting a low-dimensional latent state vector; the policy network receives the low-dimensional latent state vector at the current moment, multiplies it by a preset weight matrix in a forward pass, adds a preset bias vector, and applies a hyperbolic tangent activation function to obtain a base action vector with two values between minus one and one; S33, reading the preset minimum and maximum values of the target oil supply pressure and the target oil supply frequency, linearly mapping the first value of the base action vector from the range minus one to one onto the range from the minimum to the maximum target oil supply pressure to obtain a base oil supply pressure value, mapping the second value onto the target oil supply frequency range in the same way to obtain a base oil supply frequency value, and thereby generating a two-dimensional action vector with practical physical meaning; S34, inputting the low-dimensional latent state vector into the preset world model of the improved stream model, wherein the world model consists of a state transition network, a reward network and an uncertainty network; the state transition network receives the low-dimensional latent state vector at the current moment and the two-dimensional action vector executed at the previous moment, processes them through a gated recurrent unit network, and outputs the predicted low-dimensional latent state vector at the next moment; S35, the reward network receives the low-dimensional latent state vector and the two-dimensional action vector at the current moment, concatenates them into a long vector, and computes and outputs a scalar instant reward value through fully connected layers with preset weights and biases; the uncertainty network receives the same two vectors, concatenates them into a long vector, and computes and outputs a prediction uncertainty measure through fully connected layers with another set of preset weights and biases; S36, introducing a physical consistency loss term during world model training, wherein the predicted low-dimensional latent state vector at the next moment is input into a decoder network composed of preset fully connected layers and restored to a predicted physical state vector; S37, calculating the theoretical physical state vector at the next moment by numerical solution of the axis trajectory equation and the Reynolds equation of the bearing, and computing the squared Euclidean distance between the predicted physical state vector and the theoretical physical state vector at the next moment as the physical consistency loss term.
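The encoder and policy head described in S31-S33 of claim 5 can be sketched as a tiny fully connected network. This is an illustrative stand-in: the layer widths, latent dimension, action ranges and the randomly initialized weights are all assumptions, not trained or claimed values.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(x, W, b):
    # y = W x + b, with the weight matrix stored as a list of rows.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def to_physical(a, lo, hi):
    # S33: map a policy output in [-1, 1] linearly onto [lo, hi].
    return lo + (a + 1.0) * 0.5 * (hi - lo)

class TinyEncoderPolicy:
    """Illustrative two-layer encoder (ReLU) plus tanh policy head, per
    S31-S33. Weights are random placeholders, not trained parameters."""
    def __init__(self, in_dim, latent_dim=4, seed=0):
        rng = random.Random(seed)
        def mat(r, c):
            return [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
        self.W1, self.b1 = mat(8, in_dim), [0.0] * 8          # first FC layer
        self.W2, self.b2 = mat(latent_dim, 8), [0.0] * latent_dim
        self.Wp, self.bp = mat(2, latent_dim), [0.0] * 2      # policy head

    def encode(self, s):
        h = relu(linear(s, self.W1, self.b1))   # FC layer + ReLU (S31)
        return linear(h, self.W2, self.b2)      # low-dimensional latent state

    def act(self, s, p_range=(0.5, 4.0), f_range=(1.0, 10.0)):
        z = self.encode(s)
        a = [math.tanh(x) for x in linear(z, self.Wp, self.bp)]  # in [-1, 1]^2
        # Map both components to physically meaningful pressure/frequency values.
        return [to_physical(a[0], *p_range), to_physical(a[1], *f_range)]
```

Because tanh is bounded, the mapped action always lies inside the preset pressure and frequency ranges, which is the point of the S33 mapping.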
- 6. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 5, wherein step S37 specifically comprises: S371, creating a two-dimensional array storing the pressure at each point of the oil film grid, initializing all positions to zero, and entering a loop that repeats until the preset stopping condition is met, where each iteration of the loop traverses every position of the two-dimensional array and reads the pressure values of the four adjacent positions above, below, left and right of the grid position currently being computed; S372, obtaining the preset bearing radial clearance value and the offset of the shaft centre in the vertical direction, recording the circumferential angle corresponding to the current grid point, and subtracting the product of the offset and the cosine of the angle from the radial clearance to obtain the oil film thickness at the current grid point; S373, reading the current spindle rotation speed value, dividing it by sixty to convert the unit from revolutions per minute to revolutions per second, and multiplying by two times the circumference ratio times the preset journal radius to obtain the linear speed of the journal surface; S374, subtracting the left pressure value from the right pressure value of the current grid point to obtain the circumferential pressure difference, subtracting the lower pressure value from the upper pressure value to obtain the axial pressure difference, multiplying the circumferential pressure difference by a preset circumferential grid-spacing coefficient to obtain a first term, multiplying the axial pressure difference by a preset axial grid-spacing coefficient to obtain a second term, multiplying the linear speed by the preset lubricating oil viscosity and dividing by the square of the oil film thickness to obtain a third term, and summing the three terms to obtain the pressure update term; S375, multiplying the pressure update term by a preset relaxation coefficient greater than 0 and less than 1 to obtain an adjustment, adding the adjustment to the original pressure value at the current grid position, and overwriting the old value with the new, updated pressure; S376, after updating the pressure values at all positions of the two-dimensional array, completing one global iteration and checking the difference between the new and old pressure values at all positions: if the maximum difference is smaller than a preset threshold, terminating the loop, otherwise performing the next global iteration; after the loop terminates, summing the pressure values at all positions of the two-dimensional array and multiplying by the area represented by a single grid cell to obtain the total oil film supporting force; S377, subtracting the known shaft weight from the total oil film supporting force to obtain a net force, dividing the net force by a preset bearing stiffness coefficient to obtain a displacement, adding the displacement to the original coordinate of the shaft centre to obtain the theoretical new axis coordinate at the next moment, and concatenating the new coordinate with the two-dimensional array of final pressure values to form the theoretical physical state vector at the next moment.
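The grid relaxation in claim 6 can be sketched as a toy Gauss-Seidel solver. This is emphatically not a faithful Reynolds-equation discretization: the stencil below is a generic Poisson-style neighbour average with a shear source term, and every coefficient (grid size, clearance, eccentricity, viscosity, `src_scale`, cell area, stiffness) is an illustrative assumption chosen only so the iteration converges and the downstream force/displacement steps (S376-S377) can be shown.

```python
import math

def solve_oil_film(n_circ=16, n_ax=8, clearance=1e-4, ecc=3e-5,
                   rpm=3000.0, radius=0.02, viscosity=0.03,
                   cell_area=1e-6, src_scale=1e-9,
                   relax=0.8, tol=1e-9, max_iter=2000):
    """Toy relaxation loop in the spirit of S371-S376 (simplified stencil)."""
    # S373: rev/min -> rev/s, times circumference, gives journal surface speed.
    u = rpm / 60.0 * 2.0 * math.pi * radius
    p = [[0.0] * n_ax for _ in range(n_circ)]   # S371: zero-initialized grid
    for _ in range(max_iter):
        max_delta = 0.0
        for i in range(1, n_circ - 1):
            theta = 2.0 * math.pi * i / n_circ
            h = clearance - ecc * math.cos(theta)   # S372: local film thickness
            src = src_scale * u * viscosity / (h * h)
            for j in range(1, n_ax - 1):
                # Neighbour average plus shear source (simplified update term).
                new = 0.25 * (p[i + 1][j] + p[i - 1][j]
                              + p[i][j + 1] + p[i][j - 1]) + src
                delta = relax * (new - p[i][j])     # S375: under-relaxation
                p[i][j] += delta
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:                         # S376: stopping criterion
            break
    # S376: integrate pressure over the grid to get the total supporting force.
    force = sum(sum(row) for row in p) * cell_area
    return p, force

def new_shaft_center(force, shaft_weight, stiffness, y0):
    # S377: net force over stiffness gives the displacement of the axis centre.
    return y0 + (force - shaft_weight) / stiffness
```

The boundary rows stay at zero (a Dirichlet-style assumption), while interior pressures build up from the source term until the update magnitude falls below the threshold.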
- 7. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 2, wherein step S4 specifically comprises: S41, inputting the low-dimensional latent state vector at the current moment into the behavior network of the improved stream model, which performs multi-step look-ahead planning in the world model to generate candidate future action trajectories whose number equals the preset number of planning steps multiplied by the number of candidate actions per step; S42, introducing a risk penalty term related to the prediction uncertainty measure into the planning optimization of the behavior network, and, for each candidate future action trajectory, accumulating the prediction uncertainty measures output by the uncertainty network at each of the preset planning steps to obtain the total uncertainty of that trajectory; S43, multiplying the total uncertainty of the current trajectory by a preset risk coefficient greater than zero to obtain the risk penalty term of that trajectory, and adding the physical consistency loss term of the current candidate trajectory to the risk penalty term to obtain the final risk penalty term; S44, multiplying the scalar instant reward value obtained by each candidate trajectory at each of the preset planning steps by a preset discount factor that decreases as the step number increases, accumulating the products to obtain a base cumulative reward value, and subtracting the final risk penalty term to obtain the final cumulative reward value; and S45, the behavior network searching all candidate future action trajectories for the one with the largest final cumulative reward value and taking its first candidate action vector as the final output two-dimensional action vector containing the target oil supply pressure value and the target oil supply frequency value.
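The scoring and selection logic of claim 7 (S42-S45) reduces to a discounted reward sum minus a combined physics-plus-uncertainty penalty. A minimal sketch, assuming illustrative values for the discount factor and risk coefficient and a simple tuple layout for each trajectory:

```python
def score_trajectory(rewards, uncertainties, phys_loss,
                     discount=0.95, risk_coef=0.5):
    """Final cumulative reward for one candidate trajectory (S42-S44):
    discounted reward sum minus (physics loss + risk penalty)."""
    base = sum(r * discount ** t for t, r in enumerate(rewards))  # S44
    risk_penalty = risk_coef * sum(uncertainties)                 # S42-S43
    final_penalty = phys_loss + risk_penalty                      # S43
    return base - final_penalty

def select_action(trajectories, **kw):
    """S45: pick the trajectory with the largest final cumulative reward and
    return its first action. Each trajectory is a tuple
    (actions, rewards, uncertainties, phys_loss) -- an assumed layout."""
    best = max(trajectories,
               key=lambda tr: score_trajectory(tr[1], tr[2], tr[3], **kw))
    return best[0][0]
```

With these example coefficients, a trajectory with slightly lower raw reward but much lower predicted uncertainty beats a riskier one, which is exactly the risk-avoidance behaviour the claim describes.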
- 8. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 7, wherein the multi-step look-ahead planning specifically comprises: presetting the number of planning steps and the number of candidate actions per step; starting from the current moment, the behavior network calls a random number generator at each step to generate two independent standard-normal random numbers, multiplies them by preset exploration intensity coefficients that determine the size of the exploration range, and adds the two scaled random numbers to the base oil supply pressure value and the base oil supply frequency value respectively; the perturbed base oil supply pressure and frequency values are then range-limited: any value exceeding the preset maximum is set to the maximum, and any value below the minimum is set to the minimum, producing a candidate two-dimensional action vector; this process is repeated with fresh random numbers each time to generate the preset number of candidate action vectors; and each candidate action vector together with the current low-dimensional latent state vector is input into the world model to obtain the corresponding predicted low-dimensional latent state vectors and scalar instant reward values, generating candidate future action trajectories whose number equals the preset number of planning steps multiplied by the number of candidate actions per step.
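The candidate-generation step of claim 8 is Gaussian perturbation around the base action followed by clipping. A short sketch, where the bounds, exploration standard deviation and candidate count are assumed values:

```python
import random

def sample_candidate_actions(base_pressure, base_freq, n_candidates=5,
                             explore_std=0.3,
                             p_bounds=(0.5, 4.0), f_bounds=(1.0, 10.0),
                             seed=42):
    """Claim 8 sketch: perturb the base action with scaled standard-normal
    noise, then clip each component to its preset range."""
    rng = random.Random(seed)
    clip = lambda x, lo, hi: min(max(x, lo), hi)
    out = []
    for _ in range(n_candidates):
        # Two independent standard-normal draws, scaled by the exploration
        # intensity coefficient, added to the base pressure and frequency.
        p = base_pressure + explore_std * rng.gauss(0.0, 1.0)
        f = base_freq + explore_std * rng.gauss(0.0, 1.0)
        out.append([clip(p, *p_bounds), clip(f, *f_bounds)])
    return out
```

Clipping guarantees every candidate is executable hardware-side, so the planner never evaluates an action outside the actuator limits.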
- 9. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 2, wherein step S5 specifically comprises: S51, multiplying the target oil supply pressure value in the two-dimensional action vector by a preset pressure conversion coefficient to obtain a first control signal driving the proportional pressure valve, and multiplying the target oil supply frequency value by a preset frequency conversion coefficient to obtain a second control signal driving the variable-frequency pump; S52, after the first and second control signals have been executed, waiting a preset time interval, re-acquiring the bearing seat temperature, subtracting the pre-regulation bearing seat temperature from the newly acquired temperature to obtain the instantaneous temperature variation, performing a fast Fourier transform on the re-acquired vibration time-domain signal, finding the frequency bin with the largest amplitude in the resulting frequency-domain features, and reading that amplitude as the dominant-frequency amplitude of the vibration frequency-domain features; S53, reading the real-time operating voltage and current of the proportional pressure valve and the variable-frequency pump in the lubrication system, and multiplying voltage by current to obtain the real-time power consumption of the lubrication system; S54, multiplying the instantaneous temperature variation by a preset temperature weight coefficient to obtain a first term, multiplying the dominant-frequency amplitude by a preset vibration weight coefficient to obtain a second term, multiplying the real-time power consumption by a preset power-consumption weight coefficient to obtain a third term, and negating the sum of the first, second and third terms to generate the real reward value.
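The reward of claim 9 (S53-S54) is the negated weighted sum of three cost signals. A one-function sketch, with the weight coefficients as illustrative assumptions:

```python
def lubrication_power(voltage_v, current_a):
    # S53: instantaneous electrical power draw of the lubrication actuators.
    return voltage_v * current_a

def real_reward(delta_temp, main_freq_amp, power_w,
                w_temp=1.0, w_vib=0.5, w_power=0.01):
    """S54: negative weighted sum of temperature rise, dominant vibration
    amplitude and lubrication-system power consumption. Weights are
    illustrative placeholders for the preset coefficients."""
    return -(w_temp * delta_temp + w_vib * main_freq_amp + w_power * power_w)
```

Because all three indicators are costs, the reward is always non-positive for non-negative inputs, and lowering any of temperature rise, vibration or power strictly increases it.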
- 10. The self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning according to claim 2, wherein step S6 specifically comprises: S61, storing experience tuples consisting of the state vector, the two-dimensional action vector and the real reward value into an experience replay pool, randomly sampling a batch of experience tuples from the pool, and inputting the state vector and two-dimensional action vector of each tuple into the world model of the improved stream model to obtain the predicted low-dimensional latent state vector at the next moment and the predicted scalar instant reward value; S63, inputting the predicted low-dimensional latent state vector at the next moment into the behavior network of the improved stream model, calculating the predicted final cumulative reward value, and adding the predicted scalar instant reward value to it to obtain the predicted total return value; S64, inputting the next-moment state vector of each experience tuple into the encoder network to obtain the real low-dimensional latent state vector at the next moment, inputting it into the improved stream model, calculating the real final cumulative reward value, and adding the real reward value from the experience tuple to it to obtain the real total return value; S65, calculating the squared Euclidean distance between the predicted total return value and the real total return value to obtain the temporal-difference error, updating the network weights of the improved stream model from this error using backpropagation and gradient descent, and thereby continuously self-optimizing the improved stream model.
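The replay-and-update machinery of claim 10 can be sketched as a bounded experience pool plus a squared-error objective. The capacity, tuple layout and eviction policy below are assumptions; the actual network update (backpropagation over the stream model) is out of scope for a short example.

```python
import random

class ReplayBuffer:
    """S61 sketch: bounded pool of (state, action, reward, next_state) tuples
    with uniform random sampling; oldest experience is evicted first."""
    def __init__(self, capacity=1000, seed=0):
        self.items, self.capacity = [], capacity
        self.rng = random.Random(seed)

    def push(self, experience):
        if len(self.items) >= self.capacity:
            self.items.pop(0)          # drop the oldest tuple
        self.items.append(experience)

    def sample(self, batch_size):
        return self.rng.sample(self.items, min(batch_size, len(self.items)))

def td_error(predicted_return, real_return):
    # S65: squared distance between predicted and observed total return;
    # this scalar would drive the gradient-descent weight update.
    return (predicted_return - real_return) ** 2
```

In a full training loop, each sampled batch would yield predicted and real total returns (S63-S64), and the mean of `td_error` over the batch would be minimized.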
Description
Bearing lubrication parameter self-adaptive regulation and control system based on reinforcement learning

Technical Field

The invention relates to the field of intelligent manufacturing and reinforcement learning, in particular to a self-adaptive regulation and control system for bearing lubrication parameters based on reinforcement learning.

Background

Reinforcement learning algorithms can make and optimize decisions autonomously through trial and error in complex environments; in recent years they have shown great potential in fields such as robot control, resource scheduling and industrial process optimization, and are regarded as a key technical path toward intelligent operation and maintenance of equipment. However, in the key industrial scenario of bearing lubrication, practical application faces many challenges, such as the large amount of training data required, the lack of physical interpretability of the decision process, and high requirements on safety and stability, which severely restrict the deployment of traditional reinforcement learning.
Most current bearing lubrication regulation methods depend on fixed parameter thresholds or simple PID control logic. They are difficult to adapt to the dynamic lubrication requirements of variable working conditions and variable loads, so insufficient or excessive lubrication occurs frequently and the optimal balance of energy efficiency and service life cannot be achieved. Some systems introduce intelligent algorithms, but their decision models behave like black boxes: outputs are based only on data correlations, ignoring the underlying physical laws followed by bearing operation, such as axis dynamics and fluid lubrication. The decision results therefore lack physical interpretability, and under unseen working conditions such systems can easily produce dangerous actions that violate physical common sense, which severely restricts their application in scenarios with high reliability requirements. In addition, most existing reinforcement-learning-based regulation methods adopt a fixed reward function design and cannot effectively quantify and avoid the decision risk of the model in high-uncertainty states, so the system may adopt risky strategies that damage the bearing during exploration. Meanwhile, these methods lack an effective mechanism for integrating domain knowledge into model training, so learning efficiency is low and massive interaction data are required for convergence, which is ill-suited to the data-sparse and data-expensive reality of industrial environments and seriously affects the practical value and deployment feasibility of such models in real production.
Therefore, how to provide a self-adaptive bearing lubrication parameter regulation and control system based on reinforcement learning is a problem that needs to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a self-adaptive regulation and control system for bearing lubrication parameters based on reinforcement learning, which fully fuses the key steps of bearing running state sensing, state vector construction, improved stream model decision making, physical consistency constraints and risk avoidance, and constructs an intelligent lubrication control flow with standardized state vectors, physical laws embedded in the world model, risk quantification in look-ahead planning, and self-optimization of the regulation strategy. By introducing a physical consistency loss term, the invention fuses prior knowledge such as the axis trajectory equation and the Reynolds equation into world model training, solving the problem that the decision process of traditional models is physically unexplainable and may violate physical mechanisms; by introducing a risk penalty term into behavior network planning, risky decisions in high-uncertainty states are quantified and avoided, improving the safety and robustness of the system. The method offers an interpretable decision mechanism, accurate regulation strategies, safe self-adaptation and efficient model training, and can remarkably improve the energy efficiency and reliability of bearing lubrication under variable working conditions, thereby effectively solving the problems of poor dynamic adaptability, black-box decisions and high safety risk in existing methods.
According to the embodiment of the invention, the self-adaptive regulation and control system for the bearing lubrication parameters based on reinforcement learning comprises the following modules: The bearing running state acquisition module is used for synchronously acquiring multi-source data; the state vector constructio