CN-122018293-A - Reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot
Abstract
The invention relates to the field of process control, and in particular to a reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot. The method comprises: collecting geometric information of the fresh produce with a three-dimensional sensor and extracting three-dimensional spatial coordinate points; constructing a produce form function through an implicit neural representation network; extracting morphological parameters through an autoencoder; generating candidate cutting sequences through an adaptive policy network; simulating the candidate cutting sequences through a forward dynamics model to obtain trajectory evaluation values; optimizing the adaptive policy network through a soft actor-critic algorithm to obtain a trained policy network; obtaining motion numerical control instructions through the policy network; obtaining tracking error signals through a servo control system; generating joint drive signals through a PID control algorithm; and driving an end effector to complete the cutting motion. By combining forward-dynamics simulation with the soft actor-critic algorithm, the invention achieves more efficient automatic cutting of non-standard fresh produce.
Inventors
- XUE GUANGMING
- XU BIN
- Wei Zhue
- LU ZHIXIANG
- LI ZHUZHU
- TAN CHUNPING
- LIANG GUAN
- WEI CHUANGXIN
- Lin Funing
Assignees
- 南宁学院 (Nanning University)
- 广西邕之泰实业有限公司 (Guangxi Yongzhitai Industrial Co., Ltd.)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-27
Claims (9)
- 1. A reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot, characterized by comprising the following steps: collecting geometric information of the fresh produce in the coordinate system of the end effector of the fresh-produce cutting robot by using a three-dimensional sensor, extracting three-dimensional spatial coordinate points based on the geometric information, constructing a produce form function through an implicit neural representation network, performing dimension-reduction compression on the three-dimensional spatial coordinate points through an autoencoder, and extracting morphological parameters; generating a candidate cutting sequence through an adaptive policy network based on the morphological parameters, simulating the candidate cutting sequence through a forward dynamics model based on the produce form function to obtain an expected cutting result, outputting a trajectory evaluation value through a preset performance index based on the expected cutting result, storing the morphological parameters, the candidate cutting sequence and the trajectory evaluation value in an experience replay pool, sampling experience samples from the experience replay pool through a soft actor-critic algorithm, and iteratively optimizing the adaptive policy network to obtain a trained policy network; and obtaining a motion numerical control instruction through the policy network, comparing the motion numerical control instruction with an actual measured value through a servo control system to obtain a tracking error signal, generating a joint drive signal through a PID control algorithm, and driving the end effector to complete the cutting motion.
- 2. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific generation process of the produce form function comprises: scanning the fresh produce in the end effector coordinate system by using the three-dimensional sensor to obtain geometric information; performing a digital filtering operation on the geometric information to remove noise points and background points and extract three-dimensional spatial coordinate points; inputting the three-dimensional spatial coordinate points into the encoder part of the implicit neural representation network and generating morphological feature vectors through a feature extraction operation; splicing the three-dimensional spatial coordinate points with the morphological feature vectors and inputting the spliced result into the decoder part of the implicit neural representation network; and outputting scalar signed distance values to form the produce form function, the zero level set of which constitutes the continuous three-dimensional surface of the fresh produce.
- 3. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the implicit neural representation network comprises an encoder part and a decoder part, wherein the encoder part adopts a multi-layer perceptron structure, performs point-by-point feature extraction on the three-dimensional spatial coordinate points through fully connected layers and nonlinear activation functions, and maps them into high-dimensional morphological feature vectors; the high-dimensional morphological feature vectors are spliced with the three-dimensional spatial coordinate points and input into the decoder part, which performs information fusion and spatial mapping through fully connected layers and nonlinear activation functions, regresses scalar signed distance values through a linear output layer, and thereby constructs the produce form function.
- 4. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific generation process of the morphological parameters comprises: inputting the three-dimensional spatial coordinate points into the autoencoder; extracting point-by-point features of the coordinate points through a shared multi-layer perceptron based on a point cloud processing architecture; aggregating them into global morphological features through a symmetric aggregation function; outputting a mean vector and a standard deviation vector based on the global morphological features; and sampling, via the reparameterization method, from the Gaussian distribution parameterized by the mean vector and the standard deviation vector to generate low-dimensional latent vectors as the morphological parameters.
- 5. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific generation process of the candidate cutting sequence comprises: inputting the morphological parameters into the adaptive policy network to generate a mean vector and a logarithmic standard deviation vector; generating an initial action vector through a reparameterization calculation based on the mean vector, the logarithmic standard deviation vector and a noise vector sampled from a standard normal distribution; generating an action from the initial action vector through compression by a hyperbolic tangent function; and iteratively aggregating the actions to form the candidate cutting sequence.
- 6. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific generation process of the trajectory evaluation value comprises: inputting the candidate cutting sequence and the produce form function together into the forward dynamics model for forward prediction, and outputting a three-dimensional form as the expected cutting result; inputting the expected cutting result into a preset performance index function and computing the Chamfer distance between the expected cutting result and a target template form; and outputting the trajectory evaluation value in combination with a collision penalty incurred when the candidate cutting sequence collides with a preset obstacle.
- 7. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific process of the forward dynamics model comprises: performing offline simulation of candidate cutting sequences and produce form functions by adopting an element deletion technique based on the finite element method and a damage mechanics model, thereby generating a training data set containing the simulation results; and training a deep neural network on the training data set to rapidly predict the corresponding new three-dimensional form, wherein the trained deep neural network serves as the forward dynamics model.
- 8. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific generation process of the motion numerical control instruction comprises: taking the morphological parameters and the candidate cutting sequence as input and performing a forward propagation calculation through the critic network of the soft actor-critic algorithm to generate a soft Q value; taking the morphological parameters as input and performing a forward propagation calculation through the adaptive policy network to obtain the policy entropy; combining the soft Q value and the policy entropy to generate a parameter update gradient; updating the network parameters of the adaptive policy network based on the parameter update gradient and iterating the update process until the performance index of the adaptive policy network converges, thereby obtaining the trained policy network; inputting the morphological parameters into the policy network, iteratively generating action vectors in an autoregressive manner, and aggregating them into a desired trajectory sequence; and generating the motion numerical control instruction through inverse kinematics calculation and interpolation operations based on the desired trajectory sequence.
- 9. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot according to claim 1, characterized in that the specific process of generating the joint drive signal through the PID control algorithm comprises: taking the motion numerical control instruction as the desired trajectory; obtaining an actual measured value of the end effector; generating a tracking error signal by comparing the desired trajectory with the actual measured value; generating a proportional control term by proportionally amplifying the tracking error signal through the proportional element; generating an integral control term by integrating the tracking error signal through the integral element to eliminate steady-state error; generating a differential control term by differentiating the tracking error signal through the differential element to predict the error trend; linearly superposing the proportional, integral and differential control terms to generate a servo drive control quantity as the joint drive signal; and driving the end effector through the joint drive signal to complete the cutting motion.
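As an illustrative sketch only (not part of the claimed method), the reparameterized sampling used to turn pooled point-cloud features into low-dimensional morphological parameters can be written as follows; all dimensions, weight matrices and the untrained NumPy stand-in for the autoencoder head are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def reparameterize(mu, log_std, rng):
    # z = mu + sigma * eps with eps ~ N(0, I); keeps the sample
    # differentiable with respect to mu and log_std during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_std) * eps

# Hypothetical 64-dim global morphological feature (as would come from a
# symmetric aggregation over per-point features), projected to a 16-dim
# mean vector and log-standard-deviation vector.
global_feat = rng.normal(size=64)
W_mu = rng.normal(0.0, 0.1, (64, 16))
W_ls = rng.normal(0.0, 0.1, (64, 16))
mu, log_std = global_feat @ W_mu, global_feat @ W_ls

z = reparameterize(mu, log_std, rng)  # low-dimensional morphological parameters
print(z.shape)  # (16,)
```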
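The tanh-squashed Gaussian policy of claim 5 (mean and log-standard-deviation heads, reparameterized noise, hyperbolic tangent compression) can be sketched as below. This is a hedged illustration: the 16-dim input, 6-dim action, linear "network" and 5-step sequence length are all assumptions, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_action(mu, log_std, rng):
    # Reparameterized Gaussian sample squashed into (-1, 1) by tanh,
    # as in a tanh-Gaussian policy head.
    eps = rng.standard_normal(mu.shape)
    return np.tanh(mu + np.exp(log_std) * eps)

z = rng.normal(size=16)                 # morphological parameters (assumed dim)
W_mu = rng.normal(0.0, 0.1, (16, 6))    # stand-in for the policy mean head
W_ls = rng.normal(0.0, 0.1, (16, 6))    # stand-in for the log-std head

# Aggregate several sampled actions into a candidate cutting sequence.
waypoints = [sample_action(z @ W_mu, z @ W_ls, rng) for _ in range(5)]
sequence = np.stack(waypoints)
print(sequence.shape)  # (5, 6)
```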
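The trajectory evaluation of claim 6 combines a Chamfer distance to a target template with a collision penalty. A minimal sketch follows; the spherical obstacle model, penalty weight and random point sets are assumptions introduced only for illustration:

```python
import numpy as np

def chamfer(a, b):
    # Symmetric Chamfer distance between two point sets a, b of shape (N, 3).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def evaluate(result_pts, template_pts, waypoints, obstacle_c, obstacle_r, w=10.0):
    shape_err = chamfer(result_pts, template_pts)
    # Penalize waypoints entering a (hypothetical) spherical obstacle.
    dist = np.linalg.norm(waypoints - obstacle_c, axis=1)
    collisions = np.sum(dist < obstacle_r)
    return -(shape_err + w * collisions)  # higher value = better trajectory

rng = np.random.default_rng(3)
pred = rng.normal(size=(100, 3))          # predicted cutting result
target = pred + 0.01                      # near-identical target template
waypoints = rng.normal(size=(5, 3)) + 5.0 # trajectory far from the obstacle
score = evaluate(pred, target, waypoints, np.zeros(3), 0.5)
print(-1.0 < score < 0.0)  # True: small shape error, no collision penalty
```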
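Claim 1 stores (morphological parameters, candidate cutting sequence, trajectory evaluation value) triples in an experience replay pool from which the soft actor-critic update samples. A minimal, assumption-laden sketch of such a pool (fixed capacity, uniform sampling; the string placeholders stand in for real tensors):

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool of (morphological parameters,
    candidate cutting sequence, trajectory evaluation value) triples."""

    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)  # oldest samples evicted first

    def store(self, shape_params, sequence, score):
        self.buf.append((shape_params, sequence, score))

    def sample(self, batch_size):
        # Uniform sampling, as used by off-policy soft actor-critic updates;
        # the sampled batch feeds the gradient that combines the soft Q value
        # with the policy entropy (schematically, J = Q - alpha * log_pi).
        return random.sample(list(self.buf), batch_size)

pool = ReplayPool()
for i in range(100):
    pool.store(f"z{i}", f"seq{i}", float(i))  # placeholder experience triples
batch = pool.sample(8)
print(len(batch))  # 8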
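The PID loop of claim 9 (proportional, integral and differential terms linearly superposed into a joint drive signal) can be sketched as a discrete controller driving a toy first-order joint model. The gains, sample time and plant model below are arbitrary assumptions for illustration only:

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measured):
        err = setpoint - measured                 # tracking error signal
        self.integral += err * self.dt            # integral term: steady-state error
        deriv = (err - self.prev_err) / self.dt   # differential term: error trend
        self.prev_err = err
        # Linear superposition of the three control terms.
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Drive a simplistic integrator joint model toward a commanded 1.0 rad.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
pos = 0.0
for _ in range(2000):
    u = pid.step(1.0, pos)
    pos += u * 0.01  # toy joint response to the drive signal
print(abs(pos - 1.0) < 0.05)  # True: tracking error has been driven out
```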
Description
Fresh-produce cutting robot motion trajectory optimization method based on reinforcement learning
Technical Field
The invention relates to the field of process control, and in particular to a reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot.
Background
In automatic control systems, trajectory control and optimization of the position, posture and speed of a robot end effector is a core technology for achieving high-precision, high-efficiency operation. Existing control methods commonly employ preprogrammed position control modes, such as teach-and-playback or offline programming based on CAD/CAM models. These methods preset a static geometric path that carries no time-domain information or dynamic constraints for the control system, and the servo system performs tracking control only according to the position commands of that path; this is the mainstream position control and trajectory generation scheme in the current industrial robot field. However, such position control schemes have significant technical drawbacks when applied to the complex task of cutting flexible objects: the control strategy is generated solely from a kinematic model, ignoring the dynamic characteristics of the controlled system and the contact mechanics that arise during interaction with the external environment.
Because fresh produce has nonlinear physical characteristics such as flexibility and deformability, an open-loop control system based on a static path lacks the ability to adapt to contact force and deformation. The control system therefore struggles to balance operation quality against running speed, and cannot plan, under the dynamic constraints of the cutting robot, an optimal motion trajectory adapted to the physical characteristics of flexible, non-standard cutting objects. As a result, cutting quality cannot be guaranteed, and an optimal balance among cutting efficiency, motion stability and raw material yield cannot be achieved. A reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot is therefore provided.
Disclosure of Invention
The invention aims to provide a reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot so as to solve the problems described in the background above.
In order to achieve the above purpose, the invention provides the following technical scheme. The reinforcement-learning-driven motion trajectory optimization method for a fresh-produce cutting robot comprises the following steps: collecting geometric information of the fresh produce in the coordinate system of the end effector of the fresh-produce cutting robot by using a three-dimensional sensor, extracting three-dimensional spatial coordinate points based on the geometric information, constructing a produce form function through an implicit neural representation network, performing dimension-reduction compression on the three-dimensional spatial coordinate points through an autoencoder, and extracting morphological parameters; generating a candidate cutting sequence through an adaptive policy network based on the morphological parameters, simulating the candidate cutting sequence through a forward dynamics model based on the produce form function to obtain an expected cutting result, outputting a trajectory evaluation value through a preset performance index based on the expected cutting result, storing the morphological parameters, the candidate cutting sequence and the trajectory evaluation value in an experience replay pool, sampling experience samples from the experience replay pool through a soft actor-critic algorithm, and iteratively optimizing the adaptive policy network to obtain a trained policy network; and obtaining a motion numerical control instruction through the policy network, comparing the motion numerical control instruction with an actual measured value through a servo control system to obtain a tracking error signal, generating a joint drive signal through a PID control algorithm, and driving the end effector to complete the cutting motion.
Preferably, the specific generation process of the produce form function comprises the following steps. In the end effector coordinate system, the three-dimensional sensor scans the fresh produce to obtain geometric information; a digital filtering operation is performed on the geometric information to remove noise points and background points and extract three-dimensional spatial coordinate points; the three-dimensional spatial coordinate points are input into the encoder part of the implicit neural representation network, and morphological feature vectors are generated through a feature extraction operation; the three-dimensional spatial coordinate points and the morphological feature vectors are spliced and input into the decoder part of the implicit neural representation network, which outputs scalar signed distance values to form the produce form function, whose zero level set constitutes the continuous three-dimensional surface of the fresh produce.
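The encoder-decoder structure described above can be illustrated with a minimal NumPy sketch. This is not the claimed network: the layer widths, random (untrained) weights, ReLU activations and max-pooling aggregation are all assumptions standing in for the trained implicit neural representation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    # Fully connected layers with ReLU; the last layer is linear.
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def init(sizes):
    # Random stand-in weights; a real system would use trained parameters.
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

enc = init([3, 64, 32])          # encoder: per-point feature extraction
dec = init([3 + 32, 64, 64, 1])  # decoder: point + shape code -> SDF value

points = rng.normal(size=(500, 3))        # scanned 3-D coordinate points
feat = mlp(points, enc).max(axis=0)       # symmetric max-pool -> shape code
query = np.concatenate([points, np.tile(feat, (500, 1))], axis=1)  # splice
sdf = mlp(query, dec)  # scalar signed distance per query point
print(sdf.shape)  # (500, 1)
```

The zero level set of the predicted signed distance field would then be extracted (e.g. by marching cubes) to recover the continuous surface.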