CN122008262A - Industrial robot adaptive trajectory planning and control system based on reinforcement learning

CN122008262A

Abstract

The invention relates to the field of intelligent control of industrial robots, and in particular to an adaptive trajectory planning and control system for industrial robots based on reinforcement learning. The system comprises a data acquisition module, a state and reward construction module, a decision model training module, an online decision control module, and an instruction execution module. The data acquisition module acquires multi-modal sensing data from the industrial robot, the rotary positioner, and the processing-area sensor network; the state and reward construction module builds the model input state and generates reward signals according to the requirements of the composite manufacturing process; and the decision model training module defines a continuous action space comprising joint micro-position increments, laser power adjustments, and feed speed, and completes model training in a digital twin environment. The online decision control module outputs an action sequence from the real-time state, and the instruction execution module converts that sequence into the corresponding drive instructions. The system realizes multi-parameter cooperative adaptive control and improves the synchronization and real-time adaptability of trajectory planning and process adjustment.

Inventors

  • GU TIANYU
  • LI YANLE
  • HAN FUZHEN
  • YANG HAINING

Assignees

  • Shandong University

Dates

Publication Date
2026-05-12
Application Date
2026-04-15

Claims (10)

  1. An industrial robot adaptive trajectory planning and control system based on reinforcement learning, characterized by comprising: a data acquisition module for acquiring multi-modal sensing data streams from the industrial robot, a rotary positioner, and a processing-area sensor network; a state and reward construction module for processing the multi-modal sensing data streams, constructing the input state of a deep reinforcement learning decision model, and constructing reward signals according to the quality requirements of the composite manufacturing process; a decision model training module for defining the output action of the deep reinforcement learning decision model, wherein the output action is a continuous action comprising robot joint micro-position increments, an instantaneous laser power adjustment, and an end-effector feed speed, the decision model being trained under reward-signal guidance in a digital twin simulation environment so that it learns the mapping between input states and output actions; an online decision control module that deploys the trained decision model in the physical environment, where the model generates a corresponding output action sequence from input states acquired in real time; and an instruction execution module for converting the output action sequence into drive instructions for each controlled joint of the robot, power control instructions for the laser, and speed control instructions for the end effector.
  2. The adaptive trajectory planning and control system of claim 1, wherein constructing the input state of the deep reinforcement learning decision model comprises: the input state comprises the real-time pose of the robot end effector, robot joint angle feedback, the real-time contact force during processing, the real-time morphology deviation of the processing area, and the real-time pose of the positioner; reading the real-time coordinates and orientation angles of the robot end effector in the base coordinate system from the industrial robot controller to form the real-time end-effector pose; reading the real-time angle measurements of all joint encoders of the industrial robot to form the joint angle feedback; collecting the real-time contact force between tool and workpiece through a force sensor mounted on the robot end effector; capturing a surface image of the current processing area with the processing-area vision sensor, registering and comparing it against a reference morphology model, and computing the real-time morphology deviation; reading the real-time rotary-axis angle from the rotary positioner controller to form the real-time positioner pose; and arranging the end-effector pose, joint angle feedback, contact force, morphology deviation, and positioner pose in a preset order to generate a fixed-dimension vector, which is the input state of the decision model (a minimal state-assembly sketch follows the claims).
  3. The adaptive trajectory planning and control system of claim 2, wherein constructing the reward signal comprises: the reward signal is associated with at least the profile precision deviation, process constraint violations, and joint motion smoothness; defining the profile precision deviation as the norm of the deviation between the real-time morphology of the current processing area and the target reference morphology; defining the process constraint violation test, whereby a violation occurs when the real-time contact force exceeds a preset force threshold or the heat input power of the processing area exceeds a preset power threshold; defining joint motion smoothness as the norm of the joint angular acceleration computed from the robot joint angle feedback; forming a weighted sum of the negative profile precision deviation, the process-constraint penalty, and the negative joint motion smoothness, with the profile precision weight, process constraint weight, and motion smoothness weight as coefficients; and taking the weighted sum as the instant reward received by the decision model after observing the input state and executing the corresponding output action in the current decision period (see the reward sketch following the claims).
  4. The adaptive trajectory planning and control system of claim 3, wherein training the deep reinforcement learning decision model under reward-signal guidance in the digital twin simulation environment comprises: establishing virtual models of the industrial robot, the workpiece, the rotary positioner, and the processing environment in the digital twin; configuring the virtual models with the same dynamic and kinematic parameters as the physical environment; initializing the decision model in the simulation environment, the model comprising an actor network and a critic network; at each simulation step, acquiring the multi-modal sensing data stream in simulation, processing it into an input state, and providing it to the decision model; the actor network generating an output action from the input state, the action being executed in the simulation; computing the reward signal from the post-execution simulation data and storing the input state, output action, reward signal, and new input state as an experience tuple in a simulation experience replay buffer; periodically sampling batches of experience tuples from the replay buffer to update the parameters of the actor and critic networks; and through iterative training over a large number of simulation steps, the actor network learning an output action policy that maximizes the accumulated reward (see the training-loop sketch following the claims).
  5. The adaptive trajectory planning and control system of claim 4, wherein the trained deep reinforcement learning decision model is deployed in the physical environment and generates a corresponding output action sequence from input states acquired in real time by: synchronously acquiring, in each control period of the physical process, the real-time end-effector pose, joint angle feedback, contact force, morphology deviation, and positioner pose; combining them into the input state of the current control period; feeding that input state into the actor network of the trained model; the actor network processing the input state and outputting a multi-dimensional continuous vector whose dimensions correspond to the micro-position increment of each robot joint, the instantaneous laser power adjustment, and the end-effector feed speed; and taking the multi-dimensional continuous vector as the output action of the current control period and repeating the process in the next period, thereby generating a time-varying output action sequence (see the control-cycle sketch following the claims).
  6. The adaptive trajectory planning and control system of claim 5, wherein converting the output action sequence into drive instructions for each controlled joint, power control instructions for the laser, and speed control instructions for the end effector comprises: parsing the output action of each control period and extracting the joint micro-position increments, the instantaneous laser power adjustment, and the end-effector feed speed it contains; for each robot joint, adding the micro-position increment specified in the output action to that joint's angle feedback in the current control period to obtain the desired joint angle for the next period; feeding the difference between the desired angle and the current angle feedback into each joint's servo controller, which converts it into the corresponding drive current or torque command; for the laser, adding the instantaneous power adjustment specified in the output action to the current power set point to obtain the set point for the next control period and generating the corresponding power control instruction; and for the end effector, converting the specified feed speed directly into a speed control command in the robot's tool coordinate system (this step is also covered by the control-cycle sketch following the claims).
  7. The adaptive trajectory planning and control system of claim 6, further comprising: an execution and learning module for issuing the corresponding drive, power control, and speed control instructions to the industrial robot and the laser, driving the robot along the processing trajectory while controlling the laser process parameters, synchronously collecting actual execution result data during execution, associating the actual results with the input state and output action sequence to form training experience data, and feeding the training experience data back to the deep reinforcement learning decision model for online fine-tuning; wherein synchronously collecting the actual execution result data comprises: the actual execution result data comprise the actual end-effector pose, actual contact force, actual morphology deviation, and actual joint acceleration; after each control period, reading the actual position feedback from the industrial robot controller to obtain the actual end-effector pose; reading the force sensor mounted on the robot end effector to obtain the actual contact force; acquiring the latest processing-surface image through the processing-area vision sensor, comparing it with the target model, and computing the actual morphology deviation; reading the feedback data of each joint servo drive, computing the actual joint angular velocity, and estimating the actual joint acceleration by differencing the angular velocities; and timestamp-aligning and binding the actual pose, contact force, morphology deviation, and joint acceleration collected in each control period to the input state and output action of that period.
  8. The adaptive trajectory planning and control system of claim 7, wherein associating the actual execution result data with the input state and output action sequence to form training experience data, feeding it back to the decision model, and fine-tuning the model online comprises: establishing a physical experience replay buffer for storing training experience data collected from the physical environment, each datum being a tuple of input state, output action, actual execution result data, and the input state at the next instant; setting a trigger threshold and starting the online fine-tuning procedure when the amount of stored training experience data reaches that threshold; during fine-tuning, randomly sampling a batch of training experience data from the physical replay buffer and computing reward signals from it; performing one round of parameter updates on the critic and actor networks of the decision model using the computed rewards and the sampled data; and after the update, optionally clearing the physical replay buffer in whole or in part, continuing to collect new experience data, and awaiting the next fine-tuning trigger (see the fine-tuning sketch following the claims).
  9. The adaptive trajectory planning and control system of claim 8, wherein adaptively adjusting the weight coefficients used in constructing the reward signal comprises: recording the actual values of the profile precision deviation, process constraint violations, and joint motion smoothness in each control period while executing processing tasks in the physical environment; computing, within a preset statistical window, the average profile precision deviation, the frequency of process constraint violations, and the average joint motion smoothness; dynamically adjusting the profile precision weight according to how far the average profile precision deviation departs from its preset target; dynamically adjusting the process constraint weight according to the frequency of constraint violations; and dynamically adjusting the motion smoothness weight according to how far the average joint motion smoothness departs from the desired level (see the weight-adaptation sketch following the claims).
  10. The adaptive trajectory planning and control system of claim 9, wherein the system establishes a model parameter synchronization mechanism between the digital twin simulation environment and the physical environment, the mechanism operating as follows: in the digital twin, continuously performing reinforcement learning training on simulation data and periodically producing updated decision-model parameters; in the physical environment, producing updated decision-model parameters through online fine-tuning; setting a model parameter fusion period and, in each fusion period, extracting the critic and actor network parameters from the simulation-trained model and from the physically fine-tuned model; computing a weighted average of the critic network parameters from the two environments and a weighted average of the actor network parameters from the two environments to obtain fused network parameters; and synchronizing the fused critic and actor parameters to the decision models deployed in both the digital twin and the physical environment (see the parameter-fusion sketch following the claims).
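The following sketch makes the state construction of claim 2 concrete. It is a minimal illustration, not the patent's implementation: the reader methods (read_tcp_pose, read_joint_angles, read, morphology_deviation, read_axis_angle) are hypothetical interface names for the data sources the claim specifies, and only the fixed concatenation order comes from the claim itself.

```python
import numpy as np

def build_input_state(robot, positioner, force_sensor, vision):
    """Assemble the fixed-dimension input state of claim 2.

    All reader calls below are hypothetical placeholders for the
    controller and sensor interfaces named in the claim.
    """
    tcp_pose = robot.read_tcp_pose()          # [x, y, z, rx, ry, rz] in the base frame
    joint_angles = robot.read_joint_angles()  # one entry per joint encoder
    contact_force = force_sensor.read()       # [fx, fy, fz] at the end effector
    morph_dev = vision.morphology_deviation() # image registered against reference model
    positioner_angle = positioner.read_axis_angle()

    # Concatenate in a preset order so the vector dimension stays constant.
    return np.concatenate([tcp_pose, joint_angles, contact_force,
                           np.atleast_1d(morph_dev),
                           np.atleast_1d(positioner_angle)])
```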
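The reward of claim 3 is a weighted sum of three terms. A minimal sketch follows; all thresholds, weights, and the penalty value are illustrative defaults, since the patent specifies only the structure (negative profile deviation, a constraint penalty, and negative smoothness, each with its own weight).

```python
import numpy as np

def reward(morph_dev, contact_force, heat_power, joint_acc,
           w_profile=1.0, w_constraint=10.0, w_smooth=0.1,
           force_limit=50.0, power_limit=3000.0, violation_penalty=1.0):
    """Instant reward of claim 3 (all numeric values are illustrative)."""
    profile_term = -np.linalg.norm(morph_dev)           # profile precision deviation
    violated = (np.linalg.norm(contact_force) > force_limit
                or heat_power > power_limit)            # process constraint check
    constraint_term = -violation_penalty if violated else 0.0
    smooth_term = -np.linalg.norm(joint_acc)            # joint motion smoothness
    return (w_profile * profile_term
            + w_constraint * constraint_term
            + w_smooth * smooth_term)
```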
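Claim 4 describes a standard off-policy actor-critic training loop in the digital twin. The sketch below assumes a gym-style environment interface (reset/step returning state, reward, done) and an agent object bundling the actor and critic with act() and update() methods; the patent names no specific algorithm, so this is a generic skeleton rather than its method.

```python
from collections import deque
import random

def train_in_twin(env, agent, steps=1_000_000, batch_size=256,
                  buffer_size=1_000_000):
    """Generic off-policy actor-critic loop mirroring claim 4.

    `env` is the digital-twin simulation (gym-style API assumed);
    `agent` wraps the actor and critic networks. Both interfaces
    are assumptions, not part of the patent text.
    """
    replay = deque(maxlen=buffer_size)  # simulation experience replay buffer
    state = env.reset()
    for _ in range(steps):
        action = agent.act(state)       # actor proposes a continuous action
        next_state, rew, done, _ = env.step(action)
        replay.append((state, action, rew, next_state, done))
        if len(replay) >= batch_size:
            batch = random.sample(replay, batch_size)
            agent.update(batch)         # one gradient step on critic and actor
        state = env.reset() if done else next_state
```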
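Claims 5 and 6 together define one physical control period: assemble the state, query the actor, then apply the action incrementally. A sketch under the same assumed interfaces; the action layout (per-joint increments, then a laser power delta, then a feed speed) follows the claims, while the attribute and method names are hypothetical.

```python
def control_cycle(robot, laser, effector, actor, build_state):
    """One physical control period covering claims 5 and 6."""
    state = build_state()                 # claim 5: assemble the current input state
    action = actor.predict(state)         # multi-dimensional continuous vector
    n = robot.num_joints
    joint_deltas = action[:n]             # per-joint micro-position increments
    power_delta = action[n]               # instantaneous laser power adjustment
    feed_speed = action[n + 1]            # end-effector feed speed

    # Claim 6: incremental targets from current feedback and set points.
    targets = robot.read_joint_angles() + joint_deltas
    robot.command_joint_positions(targets)  # servo loops track the angle difference
    laser.set_power(laser.power_setpoint + power_delta)
    effector.set_feed_speed(feed_speed)     # speed command in the tool frame
```

Repeating this function once per control period produces the time-varying output action sequence of claim 5.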
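Claims 7 and 8 add a physical-experience buffer with a threshold-triggered fine-tuning round. In the sketch below, agent.update() matches the training-loop sketch above, reward_fn stands for the claim-3 reward applied to recorded execution results, and the trigger threshold, batch size, and full-buffer clearing are illustrative choices (claim 8 allows partial clearing as well).

```python
import random

class OnlineFineTuner:
    """Physical experience replay with threshold-triggered updates (claims 7-8)."""

    def __init__(self, agent, reward_fn, trigger=512, batch_size=128):
        self.agent = agent            # wraps the actor and critic networks
        self.reward_fn = reward_fn    # claim-3 reward applied to recorded results
        self.trigger = trigger        # claim-8 trigger threshold (illustrative)
        self.batch_size = batch_size
        self.buffer = []              # physical experience replay buffer

    def record(self, state, action, result, next_state):
        """Store one timestamp-aligned experience tuple (claim 7)."""
        self.buffer.append((state, action, result, next_state))
        if len(self.buffer) >= self.trigger:
            self.fine_tune()

    def fine_tune(self):
        """One round of online fine-tuning (claim 8)."""
        batch = random.sample(self.buffer, self.batch_size)
        # Rewards are recomputed from the recorded execution results.
        self.agent.update([(s, a, self.reward_fn(r), s2)
                           for s, a, r, s2 in batch])
        self.buffer.clear()           # claim 8 permits full or partial clearing
```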
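Claim 9 leaves the adjustment rule open, saying only that each weight tracks its windowed statistic. One plausible realization is a multiplicative nudge toward the target, sketched below; the update rule, learning rate, and the stats/targets layout are all assumptions.

```python
def adapt_reward_weights(stats, weights, targets, lr=0.1):
    """Claim 9: nudge each reward weight from windowed statistics (illustrative)."""
    # Raise the profile weight when average deviation overshoots its target.
    weights["profile"] *= 1 + lr * (stats["profile_dev"] / targets["profile_dev"] - 1)
    # Raise the constraint weight when violations occur more often than tolerated.
    weights["constraint"] *= 1 + lr * (stats["violation_rate"] - targets["violation_rate"])
    # Raise the smoothness weight when motion is rougher than the desired level.
    weights["smooth"] *= 1 + lr * (stats["smoothness"] / targets["smoothness"] - 1)
    return weights
```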
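Finally, the parameter fusion of claim 10 is a per-parameter weighted average of the simulation-trained and physically fine-tuned networks. A minimal sketch, assuming parameters are exposed as name-to-array dictionaries and taking the fusion weight alpha as an illustrative choice:

```python
def fuse_parameters(sim_params, phys_params, alpha=0.5):
    """Claim 10: weighted average of actor/critic parameters from the
    digital twin and the physical deployment (alpha is illustrative)."""
    return {name: alpha * sim_params[name] + (1 - alpha) * phys_params[name]
            for name in sim_params}
```

Applied once per fusion period to both the critic and the actor parameter sets, with the fused result synchronized back to both environments.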

Description

Industrial robot adaptive trajectory planning and control system based on reinforcement learning

Technical Field

The invention relates to the field of intelligent control of industrial robots, and in particular to an adaptive trajectory planning and control system for industrial robots based on reinforcement learning.

Background

When an industrial robot works with a rotary positioner in composite manufacturing, existing control schemes mostly rely on manual teaching and offline programming to fix the motion trajectory and process parameters, achieving only simple feedback adjustment from a single type of sensing data. Where deep reinforcement learning has been applied to robot control, it has mostly used discrete action decisions or a single control output, model training has depended on trial and error on physical prototypes, multi-modal sensor-network data from the processing area have not been integrated, and model iteration has not been combined with a digital twin simulation environment. Fixed trajectories and parameters cannot adapt to real-time changes in working conditions during processing; robot motion, laser power, and feed speed are mostly regulated independently of one another, which easily causes machining precision deviations and unstable process quality. Discrete actions or single-parameter outputs cannot meet the regulation demands of continuous fine machining; trial-and-error training on physical prototypes is costly and slow; composite manufacturing quality requirements have not been converted into precise reward signals for model training; and multi-modal sensing data have not been effectively converted into valid input states for a decision model. What is needed is integrated continuous action output over robot joint micro-positions, laser power, and end-effector feed speed, with reward signals constructed from the quality requirements of the composite manufacturing process to train a deep reinforcement learning model, so as to meet the demand for adaptive trajectory planning and control of industrial robots driven by multi-modal sensing data.

Disclosure of Invention

The invention aims to remedy the above defects in the prior art by providing an industrial robot adaptive trajectory planning and control system based on reinforcement learning.
To achieve this aim, the invention adopts the following technical scheme. The reinforcement-learning-based adaptive trajectory planning and control system for an industrial robot comprises: a data acquisition module for acquiring multi-modal sensing data streams from the industrial robot, the rotary positioner, and the processing-area sensor network; a state and reward construction module for processing the multi-modal sensing data streams, constructing the input state of a deep reinforcement learning decision model, and constructing reward signals according to the quality requirements of the composite manufacturing process; a decision model training module for defining the output action of the decision model, wherein the output action is a continuous action comprising robot joint micro-position increments, an instantaneous laser power adjustment, and an end-effector feed speed, the decision model being trained under reward-signal guidance in a digital twin simulation environment so that it learns the mapping between input states and output actions; an online decision control module that deploys the trained decision model in the physical environment, where it generates a corresponding output action sequence from input states acquired in real time; and an instruction execution module for converting the output action sequence into drive instructions for each controlled joint of the robot, power control instructions for the laser, and speed control instructions for the end effector. As a further scheme of the invention, the process of constructing the input state of the decision model comprises the following steps: the input state comprises the real-time pose of the robot end effector, robot joint angle feedback, the real-time contact force during processing, the real-time morphology deviation of the processing area, and the real-time pose of the positioner; reading the real-time coordinates and orientation angles of the robot end effector in the base coordinate system from the industrial robot controller to form the real-time end-effector pose; reading the real-time angle measurements of all joint encoders of the industrial robot to form robot join