CN-116416509-B - Multi-degree-of-freedom vibration screening device control method and system based on reinforcement learning
Abstract
The invention provides a reinforcement-learning-based control method and control system for a multi-degree-of-freedom vibration screening device. A screen surface material distribution monitoring device collects images of the material distribution on the screen surface of a multi-degree-of-freedom hybrid vibrating screen and transmits them to a controller, which determines the uniformity and dispersion degree of the screen surface material according to a screen surface material distribution monitoring model. A grain loss monitoring device monitors the grain loss of the multi-degree-of-freedom hybrid vibrating screen and transmits it to the controller. The controller obtains the current screening state of the multi-degree-of-freedom vibrating screen from the screen surface material distribution monitoring device, the grain loss monitoring device and the actuator, and controls the actuator according to the reinforcement learning model to drive the screening state toward the optimum, which improves the convergence speed and stability of the model and achieves adaptive control of the multi-degree-of-freedom vibrating screen. The control system does not rely on conventional control rules but learns by itself, improving the applicability and operating stability of the harvester in different environments.
Inventors
- JIN MINGZHI
- ZHAO ZHAN
- XU YUNFENG
- LIANG ZHENWEI
- WANG ZIQIAN
- ZHANG YANAN
- MO WUYI
Assignees
- Jiangsu University (江苏大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20230322
Claims (16)
- 1. A control method for a multi-degree-of-freedom vibration screening device based on reinforcement learning, characterized by comprising the following steps: a screen surface material distribution monitoring device collects images of the material distribution state on the screen surface of a multi-degree-of-freedom hybrid (series-parallel) vibrating screen and transmits the images to a controller, and the controller determines the uniformity K and the dispersion degree P of the screen surface material according to a screen surface material distribution monitoring model; a grain loss monitoring device monitors the grain loss of the multi-degree-of-freedom hybrid vibrating screen and transmits it to the controller; the controller obtains the current screening state s = [α, β, f, μ, K, P] of the multi-degree-of-freedom vibrating screen from the screen surface material distribution monitoring device, the grain loss monitoring device and the actuator of the multi-degree-of-freedom hybrid vibrating screen, the state comprising the grain loss rate μ, the uniformity K, the dispersion degree P, the screen surface inclination angle α, the screen surface horizontal attitude angle β and the vibration frequency f, and controls the actuator to adjust the screening state according to a reinforcement-learning-based control model of the multi-degree-of-freedom vibrating screen, so that the loss rate μ, the absolute value of uniformity |K| and the dispersion degree P are optimal and the optimal screening state is reached; the screen surface material distribution monitoring model is established through the following steps: collecting an image of the material distribution state on the screen surface of the multi-degree-of-freedom vibrating screen, binarizing the collected image to extract particle distribution information, applying dilation and erosion to the binarized image, establishing a screen surface material distribution model, selecting on the image a rectangular plane C of length h and width l located at a distance $d_1$ from the front of the screen and $d_2$ from the inner side of the screen wall, establishing a planar rectangular coordinate system, sampling uniformly and sequentially on the plane C to obtain samples, building a linear binary classification model with a two-input, one-output neural network that contains only an input layer and an output layer, and training the model by gradient ascent on the maximum likelihood estimate until convergence to obtain an optimal linear equation; the control model of the multi-degree-of-freedom vibration screening device is established by a deep reinforcement learning method and comprises an agent, an environment, agent states, agent actions and a reward function R, wherein the environment is defined as the working environment of the control system of the multi-degree-of-freedom vibration screening device, namely the interior of the cleaning chamber, the agent is defined as the entire control system of the multi-degree-of-freedom vibration screening device, the agent state is the current working state of the multi-degree-of-freedom vibrating screen, and the agent action is defined as the change in the working parameters of the multi-degree-of-freedom vibrating screen; the control system obtains the current working state of the multi-degree-of-freedom vibrating screen through the screen surface material distribution monitoring device, the grain loss monitoring device and the actuator, computes a quantifiable reward signal through the reward function R from the feedback of the working environment, and outputs the change in the working parameters of the multi-degree-of-freedom vibrating screen; the action affects the working environment of the multi-degree-of-freedom vibrating screen, changes the screening state and produces a new reward; and the optimal screening state is reached by iterating this cycle of screening state and feedback.
- 2. The method for controlling a multi-degree-of-freedom vibration screening device based on reinforcement learning according to claim 1, wherein the optimal linear equation is: $W_1 \cdot x + W_2 \cdot y + B = 0$, where $W_1$ is the optimal connection weight of the input-layer neuron, $W_2$ is the optimal connection weight of the output-layer neuron, B is the bias term, and x and y are respectively the abscissa and the ordinate of a pixel in the image coordinate system.
- 3. The method for controlling a multi-degree-of-freedom vibration screening device based on reinforcement learning according to claim 2, wherein the uniformity K of the screen surface is defined as the uniformity of the distribution of the material along the Y-axis direction of the screen surface, and the dispersion degree P ∈ (0, h) is defined as the dispersion of the material over the screen surface.
- 4. The method for controlling a multi-degree-of-freedom vibration screening device based on reinforcement learning according to claim 3, wherein in the optimal screening state the loss rate μ tends to its minimum value, the absolute value of uniformity |K| tends to π/2, and the dispersion degree P tends to h.
- 5. The method for controlling a multi-degree-of-freedom vibration screening device based on reinforcement learning according to claim 1, wherein in the screening state s = [α, β, f, μ, K, P], α ∈ (-10°, 10°), β ∈ (-10°, 10°), f ∈ (10 Hz, 15 Hz), μ ∈ (0, 0.1), P ∈ (0, h).
- 6. The method for controlling a multi-degree-of-freedom vibration screening device based on reinforcement learning according to claim 5, wherein the changes in the screen surface inclination angle α, the screen surface horizontal attitude angle β and the vibration frequency f of the actuator of the multi-degree-of-freedom vibrating screen are Δα, Δβ and Δf respectively, the action space of the multi-degree-of-freedom vibrating screen is defined as a = [Δα, Δβ, Δf], and the action space is discretized as follows: Δα = [0, ±0.5°, ±1°, ±1.5°, ±2°], Δβ = [0, ±0.5°, ±1°, ±1.5°, ±2°], Δf = [0, ±0.5 Hz, ±1 Hz, ±1.5 Hz, ±2 Hz].
- 7. The method according to claim 4, wherein the reward function R is a function of the current grain loss rate μ, uniformity K and dispersion degree P and of the screening state change rates Δμ, ΔK and ΔP, and comprises a state reward and punishment function $R_s$ and an action reward and punishment function $R_a$: $R = R_s + R_a$; the state reward and punishment function $R_s$ evaluates the value of the current screening state s and is expressed as: $R_s = \rho_1 \cdot F_1(\mu) + \rho_2 \cdot F_2(|K|) + \rho_3 \cdot F_3(P)$, where $\rho_1, \rho_2, \rho_3$ are positive constants, $F_1(\mu)$ is a decreasing function of the loss rate μ, $F_2(|K|)$ is an increasing function of the absolute value of uniformity |K|, and $F_3(P)$ is an increasing function of the dispersion degree P; the action reward and punishment function $R_a$ evaluates the value brought by action a and is expressed as: $R_a = \sigma_1 \cdot G_1(\Delta\mu) + \sigma_2 \cdot G_2(\Delta|K|) + \sigma_3 \cdot G_3(\Delta P)$, where $\sigma_1, \sigma_2, \sigma_3$ are positive constants, $G_1(\Delta\mu)$ is an odd function of the loss change rate Δμ that is continuous and monotonically decreasing, $G_2(\Delta|K|)$ is an odd function of the change rate Δ|K| of the absolute value of uniformity, $G_3(\Delta P)$ is an odd function of the dispersion change rate ΔP, and $G_2(\Delta|K|)$ and $G_3(\Delta P)$ are continuous and monotonically increasing.
- 8. The method for controlling a multiple degree of freedom vibratory screening device based on reinforcement learning according to claim 1, wherein the control model of the multiple degree of freedom vibratory screening device is a reinforcement learning model.
- 9. The method for controlling a multi-degree-of-freedom vibration screening device based on reinforcement learning according to claim 8, wherein the reinforcement learning model is a DQN model and the controller is a DQN controller (28); the DQN model is built by the following steps: the DQN controller (28) reads the current screening state s of the multi-degree-of-freedom vibrating screen from the screen surface material distribution monitoring device and the actuator, predicts the value of each action in the current state through the eval_net network, selects the next action a with an ε-greedy strategy, sends the action a to the vibrating screen actuator for execution, samples the screening state again after a time t to obtain the next screening state s', calculates the reward according to the reward function R, stores the acquired experience (s, a, s', R) in an experience library, trains the deep Q-network on experiences drawn at random from the experience library, the eval_net network being trained according to a loss function L(W) with W' = W set after every N training steps, and sets s = s' to continue the iterative training until the reinforcement learning model converges; after the reinforcement learning model has converged, the actuator is controlled to adjust the screening state according to the current screening state s, so that the loss rate μ, the absolute value of uniformity |K| and the dispersion degree P are optimal and the optimal screening state is reached.
- 10. A system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to any one of claims 1-9, comprising a multi-degree-of-freedom vibrating screen, a screen surface material distribution monitoring device, a grain loss monitoring device and a controller; the multi-degree-of-freedom vibrating screen comprises a vibrating screen (1), a parallel driving mechanism, a serial driving mechanism and a constraint connecting rod (4), and can realize two translations and two rotations, wherein the parallel driving mechanism provides three degrees of freedom, namely rotation of the screen surface about the X axis and the Y axis and translation along the Z axis, the serial driving mechanism provides a one-degree-of-freedom reciprocating motion of the screen surface, one end of the constraint connecting rod (4) is connected to the frame, and the other end is connected to the side of the vibrating screen (1); the screen surface material distribution monitoring device collects images of the material distribution state on the screen surface of the multi-degree-of-freedom vibrating screen and transmits the images to the controller, and the controller determines the uniformity K and the dispersion degree P of the screen surface material according to the screen surface material distribution monitoring model; the grain loss monitoring device monitors the grain loss of the multi-degree-of-freedom vibrating screen and transmits it to the controller; the controller obtains the current screening state s = [α, β, f, μ, K, P] of the multi-degree-of-freedom vibrating screen from the screen surface material distribution monitoring device and the actuator of the multi-degree-of-freedom vibrating screen, the state comprising the grain loss rate μ, the uniformity K and the dispersion degree P monitored by the vibrating screen monitoring system, and the screen surface inclination angle α, the screen surface horizontal attitude angle β and the vibration frequency f of the actuator, and controls the actuator to adjust the screening state according to the reinforcement learning model, so that the loss rate μ, the absolute value of uniformity |K| and the dispersion degree P are optimal and the optimal screening state is reached.
- 11. The system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to claim 10, wherein the parallel driving mechanism comprises four sets of parallel driving components, namely a first parallel driving component (2), a second parallel driving component (12), a third parallel driving component (13) and a fourth parallel driving component (18); each set of driving components comprises a stepping driving motor (22), a screw rod (23), a sliding block (25), a sliding table base (24) and a laser displacement sensor (26); the sliding table base (24) of each set is mounted on the frame, the sliding block (25) is connected to one end of a suspension rod (20) through a sixth fisheye bearing (21), the other end of the suspension rod (20) is connected to the vibrating screen (1) through a fifth fisheye bearing (19), the laser displacement sensor (26) is mounted on the sliding block (25) with its emitting end pointing vertically downward, and a displacement ranging plate (27) is mounted on the lower end face of the sliding table base (24).
- 12. The system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to claim 10, wherein the serial driving mechanism comprises a driving connecting rod (7), an eccentric rotating disc (10) and a DC driving motor (9); one end of the driving connecting rod (7) is connected to the vibrating screen (1) through a third fisheye bearing (6), the other end of the driving connecting rod (7) is connected to the eccentric rotating disc (10) through a fourth fisheye bearing (8), the eccentric rotating disc (10) is mounted on the output shaft of the DC driving motor (9), and the DC driving motor (9) is mounted on the frame.
- 13. The system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to claim 10, wherein there are two constraint connecting rods (4); one end of each of the two identical constraint connecting rods (4) is connected to the frame through a second fisheye bearing (5), and the other end of each is connected to the side of the vibrating screen (1) through a first fisheye bearing (3).
- 14. The system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to claim 10, wherein the controller calculates the screen surface inclination angle α and the screen surface horizontal attitude angle β according to the following formulas; wherein $H_1$, $H_2$, $H_3$ and $H_4$ are the distances from the emitting end to the displacement ranging plate (27) detected by the four laser displacement sensors (26), and $L_X$ and $L_Y$ are the center distances of the parallel driving components along the X-axis and Y-axis directions.
- 15. The system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to claim 10, wherein the controller calculates the vibration frequency of the vibrating screen (1) according to the formula: where ω is the rotational speed of the DC driving motor (9).
- 16. The system for implementing the reinforcement-learning-based multi-degree-of-freedom vibration screening device control method according to claim 10, wherein the screen surface material distribution monitoring device comprises a screen surface camera (15), a plurality of screen surface light sensors (14) and a screen surface RGB fill light (16); the RGB fill light (16) is mounted above the screen surface and illuminates the screen surface in real time, the screen surface light sensors (14) detect the luminosity of the screen surface and transmit it to the controller, the controller adjusts the brightness of the RGB fill light (16) according to the luminosity, and the camera (15) captures images of the screen surface material distribution state and transmits them to the controller.
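The linear two-class model of claims 1 and 2 can be sketched as a logistic regression trained by gradient ascent on the log-likelihood. This is a minimal sketch under assumptions not stated in the claims: label 1 marks a sampled pixel of plane C that is covered by material in the binarized image, the output neuron is a sigmoid, and the function name `fit_linear_boundary`, the learning rate and the step count are illustrative choices.

```python
import numpy as np

def fit_linear_boundary(points, labels, lr=0.5, steps=3000):
    """Fit W1*x + W2*y + B = 0 by gradient ascent on the Bernoulli log-likelihood.

    points: (n, 2) pixel coordinates sampled from plane C.
    labels: (n,) values in {0, 1}; their meaning is an assumption (1 = material).
    """
    X = np.asarray(points, dtype=float)
    t = np.asarray(labels, dtype=float)
    # Standardize the coordinates so a single learning rate suits both axes.
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    w = np.zeros(2)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # sigmoid output of the single neuron
        w += lr * (Z.T @ (t - p)) / len(t)      # ascent step on the mean log-likelihood
        b += lr * np.mean(t - p)
    # Map the weights back to the original pixel coordinates.
    W = w / std
    B = b - np.sum(w * mean / std)
    return W[0], W[1], B
```

A pixel (x, y) is then assigned to class 1 when `W1 * x + W2 * y + B > 0`; the line where this expression equals zero is the boundary described in claim 2.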
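The discretized action space of claim 6 and the parameter ranges of claim 5 can be enumerated as follows. This is a sketch, assuming that the joint action set is the Cartesian product of the three per-parameter step lists and that an adjusted parameter is clamped to the bounds of its claimed range; neither choice is stated in the claims, and the names `ACTIONS`, `LIMITS` and `apply_action` are hypothetical.

```python
from itertools import product

# Step sizes from claim 6, shared by Δα (degrees), Δβ (degrees) and Δf (Hz).
STEPS = [0.0, 0.5, -0.5, 1.0, -1.0, 1.5, -1.5, 2.0, -2.0]

# Assumed joint action set: every combination (Δα, Δβ, Δf), 9**3 = 729 actions.
ACTIONS = list(product(STEPS, STEPS, STEPS))

# Parameter ranges from claim 5 (open intervals there; clamping is an assumption).
LIMITS = {"alpha": (-10.0, 10.0), "beta": (-10.0, 10.0), "f": (10.0, 15.0)}

def clamp(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))

def apply_action(alpha, beta, f, action_index):
    """Return the actuator set-points after applying the indexed action."""
    d_alpha, d_beta, d_f = ACTIONS[action_index]
    return (
        clamp(alpha + d_alpha, *LIMITS["alpha"]),
        clamp(beta + d_beta, *LIMITS["beta"]),
        clamp(f + d_f, *LIMITS["f"]),
    )
```

With this enumeration a Q-network would need 729 outputs, one per action index.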
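The reward structure of claim 7 constrains only the shape of the functions F and G, so any concrete form is a design choice. The sketch below picks one admissible set purely for illustration: linear F functions normalized by the ranges in claims 4 and 5, and tanh-based G functions, which are odd, continuous and monotonic. All weights default to 1, and the change rates are taken as differences between consecutive samples. These choices, and all function names, are assumptions rather than the patent's own definitions.

```python
import math

MU_MAX = 0.1            # upper bound of the loss rate range in claim 5
K_MAX = math.pi / 2     # target value of |K| in claim 4

def state_reward(mu, K, P, h, rho=(1.0, 1.0, 1.0)):
    """R_s: F1 decreases with mu, F2 increases with |K|, F3 increases with P."""
    F1 = 1.0 - mu / MU_MAX
    F2 = abs(K) / K_MAX
    F3 = P / h
    return rho[0] * F1 + rho[1] * F2 + rho[2] * F3

def action_reward(d_mu, d_abs_K, d_P, h, sigma=(1.0, 1.0, 1.0)):
    """R_a: odd, continuous G functions; G1 decreasing, G2 and G3 increasing."""
    G1 = -math.tanh(d_mu / MU_MAX)
    G2 = math.tanh(d_abs_K / K_MAX)
    G3 = math.tanh(d_P / h)
    return sigma[0] * G1 + sigma[1] * G2 + sigma[2] * G3

def reward(prev, curr, h):
    """R = R_s + R_a for a transition between two (mu, K, P) samples."""
    d_mu = curr[0] - prev[0]
    d_abs_K = abs(curr[1]) - abs(prev[1])
    d_P = curr[2] - prev[2]
    return state_reward(*curr, h) + action_reward(d_mu, d_abs_K, d_P, h)
```

Under these assumptions a transition that lowers μ while raising |K| and P scores higher than the reverse transition, which is the ordering the claim requires.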
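The training loop of claim 9 (ε-greedy selection, experience library, random sampling, and the periodic copy W' = W) can be sketched as below. To stay short and dependency-free, a linear Q-function stands in for the deep Q-network, W' is read as the weights of a separate target network, and L(W) is taken to be the mean squared temporal-difference error. These three points, the discount factor `gamma`, and the class names `LinearQ` and `DQNAgent` are assumptions; the claim does not define them.

```python
from collections import deque
import numpy as np

class LinearQ:
    """Stand-in for eval_net and the target network: Q(s, .) = W s + b."""
    def __init__(self, n_state, n_action):
        self.W = np.zeros((n_action, n_state))
        self.b = np.zeros(n_action)

    def q(self, states):
        return np.asarray(states, dtype=float) @ self.W.T + self.b

    def copy_from(self, other):
        self.W, self.b = other.W.copy(), other.b.copy()

class DQNAgent:
    def __init__(self, n_state, n_action, gamma=0.9, lr=0.01, epsilon=0.1,
                 sync_every=50, capacity=10_000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.eval_net = LinearQ(n_state, n_action)
        self.target_net = LinearQ(n_state, n_action)
        self.memory = deque(maxlen=capacity)   # experience library
        self.n_action = n_action
        self.gamma, self.lr, self.epsilon = gamma, lr, epsilon
        self.sync_every = sync_every           # the N of claim 9
        self.train_steps = 0

    def select_action(self, s):
        """Epsilon-greedy choice over the values predicted by eval_net."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_action))
        return int(np.argmax(self.eval_net.q(s)))

    def store(self, s, a, s_next, r):
        self.memory.append((np.asarray(s, float), int(a),
                            np.asarray(s_next, float), float(r)))

    def train(self, batch_size=32):
        """One gradient step on the mean of 0.5 * TD-error**2 over a random batch."""
        if len(self.memory) < batch_size:
            return None
        idx = self.rng.choice(len(self.memory), size=batch_size, replace=False)
        batch = [self.memory[int(i)] for i in idx]
        S = np.stack([e[0] for e in batch])
        A = np.array([e[1] for e in batch])
        S2 = np.stack([e[2] for e in batch])
        R = np.array([e[3] for e in batch])
        target = R + self.gamma * self.target_net.q(S2).max(axis=1)
        td = target - self.eval_net.q(S)[np.arange(batch_size), A]
        # Only the rows belonging to the actions actually taken are updated.
        np.add.at(self.eval_net.W, A, self.lr * td[:, None] * S / batch_size)
        np.add.at(self.eval_net.b, A, self.lr * td / batch_size)
        self.train_steps += 1
        if self.train_steps % self.sync_every == 0:
            self.target_net.copy_from(self.eval_net)   # W' = W
        return float(np.mean(0.5 * td ** 2))
```

For the screen described in the claims, `n_state` would be 6 (α, β, f, μ, K, P) and `n_action` would match the size of the discretized action set; the control loop would call `select_action`, apply the action, wait the time t, then call `store` and `train` before setting s = s'.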
Description
Multi-degree-of-freedom vibration screening device control method and system based on reinforcement learning
Technical Field
The invention belongs to the technical field of screening mechanism control, and in particular relates to a reinforcement-learning-based control method and system for a multi-degree-of-freedom vibration screening device.
Background
Cleaning is an important stage of grain combine harvesting and directly affects the performance of the whole machine. The cleaning device of a combine harvester mainly consists of a fan and a vibrating screen. During harvesting, threshing turns the crop into a mixture of grains and short straw. This threshed mixture falls onto the surface of the vibrating screen under the action of the shaking plate and the return plate and moves toward the rear of the screen under the combined action of the fan and the vibrating screen. The grains pass through the screen holes into the grain collecting device, while impurities such as straw are discharged from the machine at the tail of the screen. A small amount of grain that fails to pass through the screen is also discharged from the tail of the vibrating screen, which constitutes the cleaning loss. Because of the working principle of the threshing unit, the threshed mixture is distributed non-uniformly below the threshing concave, so it is also non-uniform when it reaches the screen surface. In addition, when the combine harvester works in hilly and mountainous areas or in deep-mud, waterlogged fields, its horizontal stability is poor and its horizontal attitude fluctuates strongly, which aggravates the uneven distribution of the threshed mixture fed onto the screen surface.
At present, most combine harvesters use a single-degree-of-freedom reciprocating vibrating screen, whose single motion trajectory cannot effectively resolve the uneven distribution of the threshed mixture on the screen surface. Multi-degree-of-freedom motion of the screen surface can promote rapid and uniform dispersion of the material and is an effective way to screen non-uniformly fed material efficiently, but a multi-degree-of-freedom vibration screening control system suited to grain combine harvesting is currently lacking. China grows many kinds of grain over widely distributed regions and in complex operating environments; even for the same crop harvested at different times, the screening and cleaning performance of a combine harvester varies considerably. No widely applicable multi-degree-of-freedom vibration screening control method and system has yet been established to improve the applicability and operating stability of the harvester in different environments.
Disclosure of Invention
In view of the above technical problems, one object of an embodiment of the present invention is to provide a reinforcement-learning-based control method and system for a multi-degree-of-freedom vibration screening device, which control the actuator to reach the optimal screening state according to the distribution state of the material on the screen surface of the multi-degree-of-freedom vibrating screen, so as to improve the applicability and operating stability of the harvester in different environments. Note that the description of these objects does not preclude the existence of other objects. Not all of the above objects need to be achieved by one embodiment of the present invention.
Objects other than those above can be derived from the description, drawings and claims. The present invention achieves the above technical object by the following means. A control method for a multi-degree-of-freedom vibration screening device based on reinforcement learning comprises the following steps: a screen surface material distribution monitoring device collects images of the material distribution state on the screen surface of the multi-degree-of-freedom hybrid (series-parallel) vibrating screen and transmits the images to the controller, and the controller determines the uniformity K and the dispersion degree P of the screen surface material according to the screen surface material distribution monitoring model; a grain loss monitoring device monitors the grain loss of the multi-degree-of-freedom hybrid vibrating screen and transmits it to the controller; the controller obtains the current screening state s = [α, β, f, μ, K, P] of the multi-degree-of-freedom vibrating screen from the screen surface material distribution