CN-121973191-A - Self-adaptive control optimization method for manipulator

CN121973191ACN 121973191 ACN121973191 ACN 121973191ACN-121973191-A

Abstract

The invention discloses a self-adaptive control optimization method of a manipulator, which relates to the technical field of intelligent control, and comprises the steps of acquiring a real-time sequence signal flow of driving pressure and current by using a manipulator body sensing network, converting the real-time sequence signal flow into a continuous time sequence feature matrix, constructing an equivalent rigid body dynamics model by using physical structure data, defining a state variable and an error variable, inputting the continuous time sequence feature matrix into an echo state network, mapping and outputting a state vector of the storage pool by using the structure of the storage pool, defining the state vector of the storage pool as a feature basis function vector, calculating a Belman error based on the error variable, constructing a reinforcement learning evaluation network by using the feature basis function vector, calculating a weight estimation value and a disturbance estimation parameter to obtain a lumped uncertainty function, calculating an output control moment based on the lumped uncertainty function, calculating an actual output moment of an actuator by the output control moment, and completing the control of the manipulator.

Inventors

XIAO QIAN
Huai chuangfeng
YANG WENBIN

Assignees

华东交通大学

Dates

Publication Date: 20260505
Application Date: 20260120

Claims (10)

1. The self-adaptive control optimization method of the manipulator is characterized by comprising the following steps of: step S1, acquiring a real-time sequence signal flow of driving pressure and current by using a manipulator body sensing network, converting the real-time sequence signal flow into a continuous time sequence feature matrix, constructing an equivalent rigid body dynamics model by physical structure data, and defining a state variable and an error variable; s2, inputting the continuous time sequence feature matrix into an echo state network, mapping and outputting a reserve pool state vector through a reserve pool structure, and defining the reserve pool state vector as a feature basis function vector; step S3, calculating Belman errors based on the error variables, constructing a reinforcement learning evaluation network by utilizing the characteristic basis function vector, calculating weight estimation values and disturbance estimation parameters of the reinforcement learning evaluation network, and obtaining a lumped uncertainty function based on the weight estimation values and the disturbance estimation parameters; and S4, calculating an output control moment based on the lumped uncertainty function and the sliding mode function value, calculating the actual output moment of the actuator through the output control moment, and controlling the manipulator through the actual output moment of the actuator.
2. The method for optimizing adaptive control of a manipulator according to claim 1, wherein the step S1 comprises the sub-steps of: step S101, arranging a sensor group in a parallel integrated over-constraint structure of each joint of the manipulator, constructing a body sensing network by utilizing a physical coupling relation between software drivers, and acquiring multidimensional physical quantities generated by operation and interaction of the manipulator and the outside through the body sensing network to generate a real-time sequence signal stream, wherein the multidimensional physical quantities comprise a liquid pressure signal inside the software drivers and a driving current signal of a driving motor.
3. The method for optimizing adaptive control of a manipulator according to claim 2, wherein the step S1 further comprises the sub-steps of: step S102, presetting the length as Sliding along a time axis on the real-time sequence signal flow with a preset sampling step length, intercepting multi-dimensional signal data in the range of the current sliding time window at each sampling moment, wherein the multi-dimensional signal data comprises multi-channel hydraulic pressure waveform signals and a driving current sequence, extracting time domain waveform feature vectors of the multi-dimensional signal data by utilizing a wavelet scattering network, and fusing all channel features through matrix dimension transformation to construct a continuous time sequence feature matrix.
4. A method of optimizing adaptive control of a robot as claimed in claim 3, wherein said step S1 further comprises the sub-steps of: step S103, acquiring physical structure data of the manipulator, wherein the physical structure data comprise joint geometric dimensions, component mass, rotational inertia, mass center position, joint velocity vector and joint position vector; Calculating to obtain an inertia matrix based on the mass, the moment of inertia and the mass center position of the component; calculating a gravity vector based on the component mass, the joint position vector and the gravity acceleration constant; Based on the partial derivative of the inertia matrix to the joint position vector, calculating to obtain a Coriolis force matrix by combining the joint velocity vector; based on physical structure data, an equivalent rigid body dynamics model of the manipulator is constructed by adopting a rigid body equivalent hypothesis, and the mathematical expression of the equivalent rigid body dynamics model is as follows: ; Wherein, the As a vector of the position of the joint, As a vector of the velocity of the joint, As the joint acceleration vector, the motion vector, Is a matrix of inertia which is a matrix of inertia, In the form of a coriolis force matrix, The force vector of the gravity is used to determine, In order to integrate the disturbance moment(s), The torque is actually output for the actuator; the mathematical expression of the actual output torque of the actuator is: ; Wherein, the In order to control the moment of force ideally, For an unknown time-varying fault matrix, Biasing a fault vector for the actuator; defining state variables and error variables, specifically including: the state variables include a first state variable and a second state variable, and the error variables include a first error variable and a second error variable; defining a joint position vector as a first state variable and a joint velocity vector as a second state variable; the method comprises the steps of obtaining an expected track vector and a first derivative of the expected track vector set by a manipulator operation task, subtracting the expected track vector from a first state variable to obtain a first error variable, and subtracting the first derivative from a second state variable to obtain a second error variable.
5. The method for optimizing adaptive control of a manipulator according to claim 4, wherein the step S2 comprises the sub-steps of: Step S201, inputting the continuous time sequence feature matrix into an echo state network, and performing high-dimensional mapping on the input continuous time sequence feature matrix by utilizing a reserve pool structure of the echo state network to output a reserve pool state vector; Step S202, defining the reserve pool state vector as a radial basis function vector, wherein the characteristic basis function vector comprises an Actor basis function vector and a Critic basis function vector.
6. The method for optimizing adaptive control of a manipulator according to claim 5, wherein the step S3 comprises the sub-steps of: step S301, calculating a bellman error based on the first error variable, the second error variable and the ideal control moment, where a mathematical expression of the bellman error is: ; Wherein, the In the event of a bellman error, Is a transpose of the first error variable, As a first variable of the error, Is the transposition of the ideal control moment, For the first positive constant matrix, A second positive constant matrix; Constructing a reinforcement learning evaluation network based on the characteristic basis function vector, and calculating a weight estimation value of the reinforcement learning evaluation network; establishing a reinforcement learning evaluation network based on the Critic basis function vector, and calculating a weight estimation value of the reinforcement learning evaluation network; Constructing a gradient term based on the Critic basis function vector and a second error variable; Iterating the reinforcement learning evaluation network based on the Critic basis function vector, the Belman error and the weight estimation value to obtain a weight change rate of the reinforcement learning evaluation network, wherein the mathematical expression of the weight change rate is as follows: ; Wherein, the In order to be a rate of change of the weight, As a gradient term, As the weight estimate value, there is provided, In the event of a bellman error, For the preset learning rate, the learning rate is set, Is a transposed version of the gradient term.
7. The method for optimizing adaptive control of a robot according to claim 6, wherein the step S3 further comprises the sub-steps of: step S302, constructing a sliding mode function based on the first error variable and the second error variable, where a mathematical expression of the sliding mode function is: ; Wherein, the For the value of the sliding mode function, Is a parameter of a preset convergence index, For the preset gain adjustment parameter(s), As a first variable of the error, Is the second error variable.
8. The method for optimizing adaptive control of a manipulator according to claim 7, wherein the step S3 further comprises the sub-steps of: Step S303, aiming at the dynamic disturbance caused by the abrupt fault of the actuator, designing an online update law of disturbance estimation parameters through a nonlinear damping term generated by the state vector of the reserve tank, and calculating the change rate of the disturbance estimation parameters, wherein the mathematical expression of the change rate of the disturbance estimation parameters is as follows: ; Wherein, the The rate of change of the parameter is estimated for the disturbance, Is a first positive constant which is a first positive constant, Is a second positive constant which is a function of the first constant, For the purpose of estimating the parameters for the disturbance, As a second variable of the error, For a point-to-point multiplication, As a function of the switch, Is the value of the sliding mode function.
9. The method for optimizing adaptive control of a manipulator according to claim 8, wherein the step S3 further comprises the sub-steps of: Step S304, mapping the weight estimation value to an execution network to obtain an execution network weight, and carrying out vectorization recombination based on the execution network weight and disturbance estimation parameters to construct a set total uncertainty function, wherein the mathematical expression of the set total uncertainty function is as follows: ; Wherein, the In order to aggregate the uncertainty function, As a first state variable, a second state variable, As a second state variable, the first state variable, In order to perform the transposition of the network weights, Is a vector of the basis functions of the Actor, For the purpose of estimating the parameters for the disturbance, For a point-to-point multiplication, As a function of the switch, Is the value of the sliding mode function.
10. The method of optimizing adaptive control of a robot as set forth in claim 9, wherein the step S4 further comprises the sub-steps of: Step S401, calculating an output control moment based on the lumped uncertainty function and the sliding mode function value, wherein the mathematical expression of the output control moment is as follows: ; Wherein, the In order to output the control moment, In order to aggregate the uncertainty function, To expect the first derivative of the trajectory vector, Is a preset positive fixed approach law gain parameter, The sliding mode approach law item; And assigning the output control moment to the ideal control moment, calculating to obtain the actual output moment of the actuator, and controlling the manipulator through the actual output moment of the actuator.

Description

Self-adaptive control optimization method for manipulator Technical Field The invention relates to the technical field of intelligent control, in particular to a self-adaptive control optimization method of a manipulator. Background In the field of industrial automation, a manipulator is used as core equipment for executing complex operation tasks, and is widely applied to chip assembly, precision machining and carrying scenes. With the development of the operation environment towards unstructured and strong interaction, the performance optimization of the manipulator control system becomes a key for improving the production efficiency and the product quality. Most of the existing manipulator control systems adopt a preset control strategy to drive and operate, however, in the actual operation process, the existing manipulator control systems are limited by the physical structural characteristics and changeable external environment factors of the manipulator, and the preset static control strategy is difficult to adapt to complex dynamic working conditions, so that the operation effect cannot reach ideal expectations. At present, china patent with the application number 202511073167.5 discloses an optimization method and a system of a manipulator control system, wherein a historical operation data set is acquired, relevant motion characteristics are extracted, a pre-training model is utilized to generate strategy adjustment parameters to update a control instruction sequence, however, the prior art has the defects that the prior art is difficult to accurately represent a complex physical coupling relation inside a flexible driver, due to the lack of a high-dimensional characteristic mapping mechanism with time delay memory characteristics, the system is difficult to fully mine deep dynamics characteristics in a primary sensing signal when facing strong interaction interference, the prior art belongs to offline or quasi-online strategy adjustment, when an actuator has abrupt faults such as efficiency loss or bias, the prior art lacks an instantaneous and continuous online compensation mechanism, so that control moment generates large buffeting at the moment when the faults occur, the operation precision and the equipment safety are seriously affected, the prior art lacks real-time approximation capability of total uncertainty of the system set, an execution mechanism is difficult to correct control gain in real time according to environment change under the condition of lacking continuous guidance of an online evaluation network, the prior art control instruction is difficult to fully mine deep strategy self-evolution under complex interaction interference, tracking error is difficult to be ensured to reach steady state in a limited time, and the condition that a conventional control frame is easy to be controlled to achieve under the condition of high-precision convergence control precision, the mechanical requirements are limited when the precision is required to be applied to a mechanical frame. Disclosure of Invention The invention solves the technical problems that in the prior art, when unstructured dynamic environment and actuator abrupt fault coexist, the response is lagged due to insufficient extraction of sensing signal characteristics, a fault compensation mechanism depends on discrete classification, and the traditional neural network lacks time sequence memory capability, so that real-time and accurate correction of control gain is difficult to realize under strong interaction interference, and the problems that the system generates severe buffeting of control moment, tracking error cannot be converged in limited time and control singularity are solved, and the severe requirements of complex flexible operation on system robustness and steady-state precision cannot be met. In order to solve the technical problems, the invention provides a self-adaptive control optimization method of a manipulator, which comprises the following steps: step S1, acquiring a real-time sequence signal flow of driving pressure and current by using a manipulator body sensing network, converting the real-time sequence signal flow into a continuous time sequence feature matrix, constructing an equivalent rigid body dynamics model by physical structure data, and defining a state variable and an error variable; s2, inputting the continuous time sequence feature matrix into an echo state network, mapping and outputting a reserve pool state vector through a reserve pool structure, and defining the reserve pool state vector as a feature basis function vector; step S3, calculating Belman errors based on the error variables, constructing a reinforcement learning evaluation network by utilizing the characteristic basis function vector, calculating weight estimation values and disturbance estimation parameters of the reinforcement learning evaluation network, and obtaining a lumped uncertainty function based on the weight estimati