CN-122018294-A - Collaborative reinforcement learning control method and system for automobile spraying task
Abstract
The invention discloses a collaborative reinforcement learning control method and system for an automobile spraying task, relating to the technical field of automatic spraying. The method comprises: acquiring image data of a vehicle to be sprayed from multiple viewing angles and preprocessing it to obtain enhanced image data; performing three-dimensional reconstruction on the enhanced image data to obtain a three-dimensional mesh model; collecting paint film thickness data in real time during spraying to generate a dynamic paint film thickness map; converting the multi-arm collaborative spraying task into a multi-agent Markov decision model based on the three-dimensional mesh model and the dynamic paint film thickness map, and outputting control instructions; having each agent's policy network execute the control instructions, acquire local observation information, and generate initial actions; inputting all the local observation information and initial actions into a centralized critic network to obtain a global value; and updating the policy network parameters with the objective of maximizing the global value, guiding each agent to generate the current optimal action and execute the corresponding spraying task. The method improves coating quality and prolongs equipment service life.
Inventors
- TENG SHAOHUA
- LIU YAN
- ZHENG ZEFENG
- TENG LUYAO
- ZHANG WEI
Assignees
- 广东工业大学 (Guangdong University of Technology)
Dates
- Publication Date
- 20260512
- Application Date
- 20260203
Claims (10)
- 1. A collaborative reinforcement learning control method for an automobile spraying task, characterized by comprising the following steps: acquiring and preprocessing image data of a vehicle to be sprayed from a plurality of viewing angles to obtain enhanced image data; performing three-dimensional reconstruction based on the enhanced image data to obtain a three-dimensional mesh model of the vehicle to be sprayed; collecting paint film thickness data in real time during the spraying process and combining it with the three-dimensional mesh model to generate a dynamic paint film thickness map; converting the multi-arm collaborative spraying task into a multi-agent Markov decision model based on the three-dimensional mesh model and the dynamic paint film thickness map, and outputting control instructions; executing the control instructions via the policy network of each agent, acquiring local observation information, and generating initial actions; inputting all the local observation information and the initial actions into a centralized critic network to obtain a global value; and updating the parameters of the policy networks with the objective of maximizing the global value, guiding each agent to generate the current optimal action and execute the corresponding spraying task.
- 2. The collaborative reinforcement learning control method for an automobile spraying task of claim 1, wherein the enhanced image data is obtained as follows: collecting a plurality of groups of overlapping high-resolution images from a plurality of preset or dynamically planned viewing angles around the vehicle to be sprayed; performing image scaling, normalization and channel standardization on the image data to obtain preprocessed image data; and performing data augmentation on the preprocessed image data to obtain the enhanced image data.
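The preprocessing chain of claim 2 (scaling, normalization, channel standardization) can be sketched as follows. This is an illustrative sketch only: the nearest-neighbour resampling, the 0–255 input range and the epsilon value are assumptions, not details taken from the patent.

```python
# Illustrative sketch of the claim-2 preprocessing chain on a grayscale
# image stored as a nested list of 0-255 pixel values. The target size,
# nearest-neighbour resampling and epsilon are assumptions.
from statistics import pstdev, mean

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D pixel grid."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def normalize(img):
    """Scale 0-255 pixel values into [0, 1]."""
    return [[p / 255.0 for p in row] for row in img]

def standardize(img, eps=1e-8):
    """Zero-mean, unit-variance channel standardization."""
    flat = [p for row in img for p in row]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(p - mu) / (sigma + eps) for p in row] for row in img]

image = [[0, 255], [255, 0]]                # toy 2x2 "capture"
small = resize_nearest(image, 2, 2)         # identity resize here
pre = standardize(normalize(small))         # preprocessed image
```

The data-augmentation step (flips, crops, photometric jitter) would follow the same pattern; it is omitted here to keep the sketch short.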
- 3. The collaborative reinforcement learning control method for an automobile spraying task of claim 1, wherein the three-dimensional mesh model is constructed as follows: identifying and matching key feature points across different views using a structure-from-motion (SfM) algorithm on the enhanced image data; performing triangulation based on the geometric relationships of the matched key feature points to obtain the six-degree-of-freedom pose of the camera at the moment each image was captured, the pose serving as the camera calibration parameters; performing dense, pixel-level matching between overlapping images using a multi-view stereo (MVS) algorithm based on the camera calibration parameters to generate dense point clouds; and fusing the dense point clouds generated from each viewing angle into a continuous integral model, then fitting a continuous smooth surface using the Poisson surface reconstruction algorithm to obtain the three-dimensional mesh model.
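The triangulation step in claim 3 recovers a 3-D point from a matched feature seen along two camera rays. A minimal midpoint-triangulation sketch follows; the ray origins and directions are assumed inputs, and a full SfM/MVS pipeline (e.g. COLMAP) would additionally estimate poses and refine with bundle adjustment.

```python
# Midpoint triangulation of one matched feature point from two camera
# rays (origin + direction). A minimal stand-in for the triangulation
# step of claim 3; real SfM pipelines solve this for thousands of
# matches and refine the result with bundle adjustment.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(o1, d1, o2, d2):
    """Closest point between rays p1(s)=o1+s*d1 and p2(t)=o2+t*d2."""
    w = [b - a for a, b in zip(o1, o2)]          # o2 - o1
    a11, a12, a22 = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    b1, b2 = dot(w, d1), dot(w, d2)
    det = a11 * a22 - a12 * a12                  # 0 => parallel rays
    s = (a22 * b1 - a12 * b2) / det              # parameter on ray 1
    t = (a12 * b1 - a11 * b2) / det              # parameter on ray 2
    p1 = [a + s * d for a, d in zip(o1, d1)]
    p2 = [a + t * d for a, d in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]  # midpoint

# Two rays that intersect exactly at (1, 1, 0):
point = triangulate_midpoint([0, 0, 0], [1, 1, 0], [2, 0, 0], [-1, 1, 0])
```

For noisy matches the two rays do not intersect, and the midpoint of their common perpendicular is the standard least-squares compromise.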
- 4. The collaborative reinforcement learning control method for an automobile spraying task of claim 1, wherein the multi-agent Markov decision model is obtained as follows: obtaining a joint state space S based on the three-dimensional mesh model, the dynamic paint film thickness map and the motion states of the robotic arm agents; taking the set of joint actions adopted by all the robotic arm agents as the joint action space A; taking the probability distribution according to which the system transitions to the next state after all the robotic arm agents execute a joint action in the current state as the state transition function P; setting the set R of reward functions of the individual robotic arm agents; the joint state space S, the joint action space A, the state transition function P and the reward function set R constitute the multi-agent Markov decision model: M = ⟨S, A, P, R, γ, n⟩; where γ represents the discount factor and n represents the total number of robotic arm agents in the system that make independent decisions.
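The decision model of claim 4 is simply the tuple ⟨S, A, P, R, γ, n⟩. A minimal container for it can be sketched as below; the concrete field types and the stub transition/reward functions are illustrative assumptions, since the patent leaves the encodings abstract at this level.

```python
# Minimal container for the multi-agent Markov decision model of
# claim 4. Concrete state/action encodings and the transition function
# are placeholders, not taken from the patent.
from typing import NamedTuple, Callable, Any

class MultiAgentMDP(NamedTuple):
    S: Any            # joint state space (mesh, thickness map, arm states)
    A: Any            # joint action space of all arms
    P: Callable       # state transition function P(s, a) -> next state
    R: Any            # set of per-agent reward functions
    gamma: float      # discount factor
    n: int            # number of independently deciding arm agents

mdp = MultiAgentMDP(
    S="mesh + thickness map + arm states",
    A="joint nozzle motions",
    P=lambda s, a: s,            # stub deterministic transition
    R=[lambda s, a: 0.0],        # stub reward for one agent
    gamma=0.95,
    n=4,
)
```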
- 5. The collaborative reinforcement learning control method for an automobile spraying task of claim 4, wherein the reward functions of the individual robotic arm agents are summed into a team reward function, and all of the robotic arm agents share the same team reward function R_total: R_total = −ω_1·σ²(M_thick) + ω_2·ΔA_cover − ω_3·Σ_i |W_i − W̄| + ω_4·r_safe − ω_5·V_waste; wherein ω_1, ω_2, ω_3, ω_4 and ω_5 each represent a corresponding weight coefficient, σ²(M_thick) denotes the variance of the dynamic paint film thickness map, ΔA_cover denotes the newly covered surface area of the coating, n_s denotes the number of primary spray arms, n_f denotes the number of fine spray arms, W_i denotes the cumulative workload of an individual spray arm agent (the sum running over all n_s + n_f arms), W̄ denotes the average load of all spray arm agents, r_safe denotes the safety reward, and V_waste denotes the amount of paint wasted without adhering to the vehicle body.
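One way to compute a team reward with the components named in claim 5 (film uniformity, new coverage, workload balance, safety, waste) is sketched below. The sign conventions and the absolute-deviation load-balance term are assumptions: the claim only states that these five quantities enter with weights ω_1–ω_5.

```python
# Sketch of a claim-5-style team reward. The exact combination and
# signs are assumptions; the claim only names the variance, new
# coverage, per-arm workload vs. average load, a safety reward and
# wasted paint, each with a weight w1..w5.
from statistics import pvariance, mean

def team_reward(thickness, new_area, workloads, safety, waste,
                w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    w1, w2, w3, w4, w5 = w
    avg_load = mean(workloads)
    imbalance = sum(abs(wl - avg_load) for wl in workloads)
    return (-w1 * pvariance(thickness)   # penalize uneven film
            + w2 * new_area              # reward newly covered area
            - w3 * imbalance             # penalize uneven arm workload
            + w4 * safety                # reward safe behaviour
            - w5 * waste)                # penalize overspray waste

r = team_reward(thickness=[1.0, 1.0, 1.0],   # perfectly uniform film
                new_area=2.0, workloads=[3.0, 3.0],
                safety=1.0, waste=0.5)
```

Because every arm shares the same R_total, each policy gradient pushes toward team-level objectives rather than per-arm greed, which is what couples the primary and fine spray arms.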
- 6. The collaborative reinforcement learning control method for an automobile spraying task of claim 1, wherein the local observation information o_i is: o_i = (r_i, v_i, p_i, I_i, n_i, d_i, c_i, M_thick, Δp_i, τ_t, t); wherein r_i denotes the joint angle vector, v_i denotes the joint velocity vector, p_i denotes the end nozzle pose, I_i denotes the local image segment, n_i denotes the spray surface normal vector, d_i denotes the nozzle-to-target-point vector, c_i denotes the local spray coverage value, M_thick denotes the dynamic paint film thickness map, Δp_i denotes the relative position vectors with respect to the other robotic arm agents, τ_t denotes the current task phase, and t denotes the time stamp; the distributed agent network is implemented with a hybrid deep network architecture, in which a multi-layer perceptron extracts features from the vector and scalar data in the local observation information to obtain multi-source features, and a convolutional neural network extracts features from the image-type data in the local observation information (the local image segment I_i and the dynamic paint film thickness map M_thick) to obtain spatial features; and the concatenated multi-source features and spatial features are input into a multi-layer perceptron to obtain the initial action.
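The hybrid actor of claim 6 (MLP on vector/scalar inputs, CNN on image inputs, concatenation, then an MLP head) can be shown in skeletal form. The layer sizes, hand-set weights and the replacement of the CNN branch by a fixed mean-pool feature are illustrative simplifications; a real implementation would use a deep-learning framework such as PyTorch with learned weights.

```python
# Skeletal version of the claim-6 hybrid actor network. The CNN branch
# is replaced by global mean pooling as a stand-in, and the MLP layers
# use hand-set weights so the toy forward pass is easy to follow.

def mlp_layer(x, W, b):
    """One fully connected layer with ReLU activation."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def image_feature(img):
    """Stand-in for the CNN branch: global mean pooling."""
    flat = [p for row in img for p in row]
    return [sum(flat) / len(flat)]

def actor(vec_obs, img_obs, W, b):
    """Fuse multi-source and spatial features, then emit an action."""
    fused = mlp_layer(vec_obs, W[0], b[0]) + image_feature(img_obs)
    return mlp_layer(fused, W[1], b[1])      # initial action head

# Identity-ish weights for a 2-D vector observation and a 2x2 image:
W = [[[1.0, 0.0], [0.0, 1.0]],               # 2x2 identity layer
     [[1.0, 1.0, 1.0]]]                      # 3 -> 1 action head
b = [[0.0, 0.0], [0.0]]
action = actor([0.5, 0.25], [[0.0, 1.0], [1.0, 0.0]], W, b)
```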
- 7. The collaborative reinforcement learning control method for an automobile spraying task of claim 1, wherein the global value is obtained as follows: generating global state information from all the local observation information; generating a joint action from all the initial actions; and inputting the global state information and the joint action into the centralized critic network to obtain a global value for evaluating the quality of the current joint action.
- 8. The collaborative reinforcement learning control method for an automobile spraying task of claim 7, wherein the centralized critic network comprises a multi-layer perceptron and an action-value function network; the multi-layer perceptron extracts features from the global state information and the joint action, the extracted features are input into the action-value function network, and the global value is output; the centralized critic network updates its parameters by minimizing the temporal-difference (TD) error.
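The temporal-difference update by which the centralized critic of claim 8 learns can be illustrated in tabular form. The actual critic is a neural network trained by gradient descent on the squared TD error; the tabular Q, the learning rate and the toy transition below are illustrative assumptions.

```python
# Tabular illustration of the TD(0) update prescribed by claim 8 for
# the centralized critic. A real critic is a function approximator;
# the dictionary Q and learning rate alpha are illustrative.

def td_update(q, s, a, reward, q_next, gamma=0.9, alpha=0.1):
    """One TD(0) step: move Q(s,a) toward reward + gamma * Q(s',a')."""
    td_error = reward + gamma * q_next - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error

Q = {("s0", "joint_a"): 0.0}
delta = td_update(Q, "s0", "joint_a", reward=1.0, q_next=0.0)
```

In the patent's setting, s would be the global state, a the joint action of all arms, and the TD error would be backpropagated through the critic's weights rather than added to a table entry.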
- 9. The collaborative reinforcement learning control method for an automobile spraying task of claim 1, further comprising: automatically stopping the spraying process when the policy network determines that the entire body surface of the vehicle to be sprayed has reached the target thickness and uniformity standards; and performing a full-surface quality scan of the sprayed vehicle and generating a quality report.
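The stopping test of claim 9 (target thickness reached everywhere, uniformity within standard) reduces to a simple predicate over the thickness map. The tolerance and the standard-deviation bound below are assumed example values, not figures from the patent.

```python
# Sketch of the claim-9 stopping criterion: stop spraying once every
# surface patch is within tolerance of the target thickness and the
# overall film is sufficiently uniform. Tolerance and max_std are
# assumed example values.
from statistics import pstdev

def spraying_done(thickness, target, tol=0.05, max_std=0.02):
    within_target = all(abs(t - target) <= tol for t in thickness)
    uniform = pstdev(thickness) <= max_std
    return within_target and uniform

done = spraying_done([1.00, 1.01, 0.99], target=1.0)      # uniform film
not_done = spraying_done([1.0, 0.7, 1.0], target=1.0)     # underspray
```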
- 10. A collaborative reinforcement learning control system for an automobile spraying task, configured to execute the collaborative reinforcement learning control method for an automobile spraying task according to any one of claims 1-9, comprising an image acquisition and processing module, a vehicle three-dimensional reconstruction module, a paint film data acquisition module, a control instruction output module, an initial action generation module, an initial action evaluation module, and an optimal action generation module; the image acquisition and processing module is used for acquiring and preprocessing image data of the vehicle to be sprayed from a plurality of viewing angles to obtain enhanced image data; the vehicle three-dimensional reconstruction module is used for performing three-dimensional reconstruction based on the enhanced image data to obtain a three-dimensional mesh model of the vehicle to be sprayed; the paint film data acquisition module is used for collecting paint film thickness data in real time during the spraying process and combining it with the three-dimensional mesh model to generate a dynamic paint film thickness map; the control instruction output module is used for converting the multi-arm collaborative spraying task into a multi-agent Markov decision model based on the three-dimensional mesh model and the dynamic paint film thickness map and outputting control instructions; the initial action generation module is used for executing the control instructions via the policy network of each agent, acquiring local observation information, and generating initial actions; the initial action evaluation module is used for inputting all the local observation information and the initial actions into a centralized critic network to obtain a global value; and the optimal action generation module is used for updating the parameters of the policy networks with the objective of maximizing the global value, guiding each agent to generate the current optimal action and execute the corresponding spraying task.
Description
Collaborative reinforcement learning control method and system for automobile spraying task

Technical Field

The invention relates to the technical field of automatic spraying, in particular to a collaborative reinforcement learning control method and system for an automobile spraying task.

Background

Automobile spraying is a key process in automobile manufacturing and directly determines the appearance quality and corrosion resistance of a vehicle. Techniques in this field have evolved from manual spraying to automated robotic spraying. Currently, mainstream automatic spraying production lines generally employ single-arm or double-arm six-axis industrial robots that execute pre-programmed spraying trajectories in a "teach-and-reproduce" manner. In this mode, engineers must manually guide the robot, recording the motion path and spray-gun pose as well as process parameters such as spray flow rate and atomization pressure point by point, forming a fixed program for a specific vehicle model. In production, the robot repeatedly executes this program with high precision in place of manual labor, improving production efficiency and freeing workers from harmful environments filled with volatile organic compounds (VOCs). However, despite the tremendous advances of existing automated spraying technology over manual spraying, the "open-loop" nature of its control exposes the following core drawbacks in the face of ever-increasing quality and flexible-production demands:

1. Strong programming dependence and poor production flexibility. The prior art relies heavily on time-consuming and labor-intensive manual teach programming; whenever a new vehicle model is introduced or an existing model is changed, experienced engineers must spend weeks or even months reprogramming and debugging. The production line responds slowly to model changeovers and struggles to meet the flexible production requirements of modern manufacturing for small-batch, multi-variety and personalized customization.

2. Poor coating quality on complex curved surfaces. The automobile body is covered with complex free-form surfaces and structural details, such as rearview mirrors, door handles, body gaps and waistlines. Under a fixed program, a single-arm or double-arm robot can hardly maintain the optimal distance and perpendicular posture between the spray gun and the target surface in these areas, so underspray and overspray frequently occur, the paint film thickness is uneven, and the appearance quality suffers severely. Especially at curved-surface boundaries, fixed path-planning algorithms often ignore geometric boundaries and thereby degrade coating uniformity.

3. Weak adaptability to dynamic disturbances and frequent defects. Spraying is a dynamic process influenced by multiple variables: the final paint film quality is affected by workshop temperature and humidity, airflow disturbances, batch-to-batch differences in paint viscosity, and so on. A preset fixed trajectory cannot perceive these real-time changes and adjust accordingly, so defects such as orange peel, sagging, shrinkage cavities and fish eyes remain common.

4. Rigid task allocation, limiting equipment life and efficiency. In a two-arm collaborative system, tasks are typically allocated statically, which easily leads to uneven workload between the two arms. Under long-term operation, some robots accelerate the wear of critical components (such as servo motors and harmonic reducers) through overuse, while others remain in a low-load state for long periods. This uneven wear results in large differences in mean time between failures (MTBF) between the arms, increases the risk of unplanned downtime and maintenance costs, and shortens the overall useful life of the system.

Therefore, how to realize intelligent control of the automobile spraying process, optimize collaborative efficiency, and thereby improve coating quality and equipment life is a problem to be solved by those skilled in the art.

Disclosure of the Invention

In view of the above problems, the present invention provides a collaborative reinforcement learning control method and system for an automobile spraying task that overcomes, or at least partially solves, the above problems. It combines multi-view three-dimensional sensing, collaborative reinforcement learning and real-time closed-loop feedback to achieve intelligent control of the automobile spraying process, optimize collaborative efficiency, and thereby improve coating quality and equipment life. In order to achieve the above purpose, the present invention adopts the following technical scheme: In a first aspect, an embodiment of the pr