CN-121979164-A - Group vision robot quality inspection method based on multi-agent reinforcement learning
Abstract
The invention relates to the technical field of cooperative control of industrial machine vision and group robots, and in particular to a group vision robot quality inspection method based on multi-agent reinforcement learning. The method comprises: collecting images of a workpiece to be inspected in a preset sampling period, and writing a robot pose identifier and a camera imaging parameter identifier into each image frame to form a quality inspection observation set; performing N forward-inference passes on the quality inspection observation set through a defect detection network containing a dropout (random inactivation) layer, and generating an uncertainty prediction value from the variance of the defect confidences; calculating a risk scoring matrix to determine a risk region set; inputting the risk scoring matrix, the risk region set and the robot pose identifiers into a multi-agent reinforcement learning model to output a candidate observation action set; collecting supplementary images according to the candidate observation action set; updating the risk scoring matrix by weighted voting with the uncertainty prediction values as weights; writing risk regions meeting the thresholds into a recheck task set; and rolling them into the next sampling period.
Inventors
- YANG FAN
- CAI XINHAO
- DUAN WEIZHI
- WANG ZHIQIANG
- YAN ZHONGWEN
- LIU MENGTING
Assignees
- 小视科技(江苏)股份有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-08
Claims (10)
- 1. A group vision robot quality inspection method based on multi-agent reinforcement learning, characterized by comprising the following steps: under a preset sampling period, a plurality of group vision robots acquire images of a workpiece to be inspected through onboard industrial cameras, and a robot pose identifier and a camera imaging parameter identifier are written into each image frame to obtain a quality inspection observation set; inputting the quality inspection observation set into a defect detection network containing a dropout (random inactivation) layer for N forward-inference passes, generating an uncertainty prediction value from the variance of the N defect confidence outputs, and calculating a risk scoring matrix based on the defect confidence and the uncertainty prediction value to determine a risk region set; inputting the risk scoring matrix, the risk region set and the robot pose identifiers into a centralized-training, distributed-execution multi-agent reinforcement learning model, and outputting a candidate observation action set under a risk constraint threshold and a high-confidence conflict penalty constraint; and controlling the group vision robots to acquire supplementary images according to the candidate observation action set, fusing and updating the risk scoring matrix through a weighted voting strategy that takes the uncertainty prediction values as weights, writing risk regions whose updated risk score is greater than or equal to a first risk threshold and whose uncertainty prediction value is greater than or equal to a first uncertainty threshold into a recheck task set, and incorporating them into the candidate observation action set of the next sampling period.
- 2. The multi-agent reinforcement learning-based group vision robot quality inspection method of claim 1, wherein obtaining the quality inspection observation set comprises: at the start of each preset sampling period, generating a sampling period index by each group vision robot, and writing the sampling period index into the record header information corresponding to the workpiece images to be inspected acquired in that period; triggering the onboard industrial camera of each group vision robot to expose and acquire within the time window corresponding to the sampling period index to obtain workpiece image frames, and reading current pose parameters from the robot motion controller under the same sampling period index to generate a robot pose identifier; reading the camera imaging parameters of the onboard industrial camera of each group vision robot under the same sampling period index to generate a camera imaging parameter identifier, and binding the camera imaging parameter identifier and the robot pose identifier to the workpiece image frame by frame number; and performing a consistency check on the quaternary records of workpiece image frame, robot pose identifier, camera imaging parameter identifier and sampling period index, and merging them by sampling period index to obtain the quality inspection observation set.
- 3. The multi-agent reinforcement learning-based group vision robot quality inspection method according to claim 2, wherein the consistency check comprises the following steps: reading the sampling period index of each quaternary record, checking whether the exposure timestamp of the workpiece image frame falls within the time window corresponding to the sampling period index, and marking records that do not as cross-period abnormal records; checking whether the time difference between the sampling timestamp of the robot pose identifier and the exposure timestamp of the workpiece image frame is smaller than a preset time difference threshold, and marking pose identifiers at or above the threshold as pose mismatch records; checking whether the validity interval of the camera imaging parameter identifier covers the exposure timestamp of the workpiece image frame, marking records not covered as parameter mismatch records, and outputting a consistency check result formed by the cross-period abnormal records, the pose mismatch records and the parameter mismatch records.
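The three checks of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation; the field names (`period_idx`, `exposure_ts`, `pose_ts`, `param_start`, `param_end`) and the dict-based record layout are assumptions introduced here for clarity.

```python
def check_record(rec, windows, max_pose_dt):
    """Classify one quaternary record (frame, pose, camera params, period index).

    windows maps a sampling period index to its (start, end) time window;
    max_pose_dt is the preset time-difference threshold of claim 3.
    Returns the list of mismatch flags (empty if the record is consistent).
    """
    flags = []
    w_start, w_end = windows[rec["period_idx"]]
    # 1. Exposure timestamp must fall inside the sampling-period window.
    if not (w_start <= rec["exposure_ts"] <= w_end):
        flags.append("cross_period")
    # 2. Pose sampling timestamp must be close in time to the exposure.
    if abs(rec["pose_ts"] - rec["exposure_ts"]) >= max_pose_dt:
        flags.append("pose_mismatch")
    # 3. Camera-parameter validity interval must cover the exposure.
    if not (rec["param_start"] <= rec["exposure_ts"] <= rec["param_end"]):
        flags.append("param_mismatch")
    return flags
```

A record passing all three checks yields an empty flag list and is merged into the quality inspection observation set; any flagged record is reported in the consistency check result.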
- 4. The multi-agent reinforcement learning-based group vision robot quality inspection method according to claim 1 or 2, wherein performing N forward-inference passes on the quality inspection observation set and generating uncertainty prediction values from the variance of the N defect confidence outputs comprises: normalizing and cropping the workpiece image frames in the quality inspection observation set to a preset input size to obtain an image input tensor set, and feeding the image input tensor set frame by frame into a defect detection network containing a dropout (random inactivation) layer; for each image input tensor in the image input tensor set, enabling the dropout layer while keeping the weight parameters of the defect detection network unchanged, and sequentially executing N forward computations indexed by inference pass to obtain N corresponding defect output tensor sets; for each candidate defect target in each defect output tensor set, taking the Softmax output of its classification branch as the defect confidence, and arranging the defect confidences of the same candidate defect target under the N forward computations into a defect confidence sequence by inference pass index; and calculating the sample variance of the defect confidence sequence to obtain the uncertainty prediction value of the corresponding candidate defect target, and aggregating the uncertainty prediction values of the candidate defect targets in the same workpiece image frame into the uncertainty prediction value set of that frame.
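The Monte-Carlo-dropout procedure of claim 4 (weights fixed, dropout active, N stochastic passes, sample variance as uncertainty) can be sketched with a toy one-layer "detector head" in place of the patent's defect detection network. The network, its shapes, and the parameter names here are illustrative assumptions.

```python
import numpy as np

def mc_dropout_confidences(weights, bias, x, n=20, p=0.5, rng=None):
    """N stochastic forward passes of a tiny linear+softmax head.

    Each pass randomly zeroes input features (dropout, i.e. random
    inactivation) while the weights stay unchanged; the mean of the
    per-pass defect-class probabilities is the confidence, and their
    sample variance is the uncertainty prediction value of claim 4.
    """
    rng = rng or np.random.default_rng(0)
    confs = []
    for _ in range(n):
        mask = rng.random(x.shape) >= p            # random inactivation
        h = (x * mask / (1.0 - p)) @ weights + bias
        e = np.exp(h - h.max())                    # stable softmax
        confs.append(e[1] / e.sum())               # prob of "defect" class
    confs = np.asarray(confs)
    return confs.mean(), confs.var(ddof=1)         # confidence, uncertainty
```

In a real detector the same idea applies per candidate defect target: the Softmax score of its classification branch is collected over the N passes and the sample variance of that sequence becomes the target's uncertainty prediction value.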
- 5. The multi-agent reinforcement learning-based group vision robot quality inspection method of claim 4, wherein calculating a risk scoring matrix based on the defect confidence and the uncertainty prediction value to determine a risk region set comprises: dividing each workpiece image frame into a set of region units according to a preset grid size, and associating the defect confidence and the uncertainty prediction value of each candidate defect target in the frame with the region unit indexes it covers; for each region unit index, calculating the risk score of the region unit according to preset weight coefficients, wherein the risk score consists of the weighted sum of the defect confidences and uncertainty prediction values of the corresponding candidate defect targets together with their product term, and arranging the risk scores of the region units by spatial index to obtain the risk scoring matrix; and selecting from the risk scoring matrix the region unit indexes whose risk score is greater than or equal to a first risk threshold, and merging adjacent region unit indexes according to a preset connectivity rule to obtain the risk region set.
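A minimal reading of the claim-5 risk score is a weighted sum of confidence and uncertainty plus their product term. The weight values `alpha`, `beta`, `gamma` below are illustrative placeholders for the "preset weight coefficients", not values given in the patent.

```python
import numpy as np

def risk_matrix(conf, unc, alpha=0.5, beta=0.3, gamma=0.2):
    """Per-cell risk score over the grid of region units:
    weighted sum of defect confidence and uncertainty plus their product."""
    return alpha * conf + beta * unc + gamma * conf * unc

def risky_cells(risk, thresh):
    """Region unit indexes whose risk score meets the first risk threshold."""
    return [tuple(ix) for ix in np.argwhere(risk >= thresh)]
```

`conf` and `unc` are per-region-unit arrays obtained by associating each candidate defect target with the grid cells it covers; the cells returned by `risky_cells` are then merged into risk regions by the connectivity rule of claim 6.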
- 6. The multi-agent reinforcement learning-based group vision robot quality inspection method according to claim 5, wherein the preset connectivity rule comprises: judging adjacency by four-neighborhood connectivity based on the two-dimensional grid coordinates of the region unit indexes; classifying two region unit indexes into the same connected component when they are adjacent and the risk scores of both are greater than or equal to the first risk threshold; calculating the number of region units of each connected component and comparing it with a preset minimum count threshold; and determining the connected components whose number of region units is greater than or equal to the preset minimum count threshold as risk regions in the risk region set.
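The connectivity rule of claim 6 is a standard four-neighborhood connected-component grouping with a minimum-size filter; a breadth-first-search sketch follows (the input is assumed to already contain only cells at or above the first risk threshold).

```python
from collections import deque

def risk_regions(risky, min_size):
    """Group risky grid cells into 4-neighborhood connected components,
    keeping only components with at least min_size region units."""
    risky = set(risky)
    seen, regions = set(), []
    for start in risky:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            comp.append((r, c))
            # Four-neighborhood: up, down, left, right.
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in risky and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(comp) >= min_size:       # preset minimum count threshold
            regions.append(comp)
    return regions
```

Components smaller than the minimum count threshold are discarded as isolated noise cells rather than promoted to risk regions.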
- 7. The multi-agent reinforcement learning-based group vision robot quality inspection method of claim 5, wherein outputting the candidate observation action set comprises: in each preset sampling period, encoding the risk scoring matrix and the risk region set into a global risk state vector by spatial index, and concatenating the robot pose identifiers of all group vision robots to the global risk state vector to obtain a joint state input vector; inputting the joint state input vector into the centralized-training, distributed-execution multi-agent reinforcement learning model to obtain the action score sequences corresponding to the group vision robots, and generating from the action score sequences an initial candidate observation action set containing observation pose increments and camera imaging parameter adjustments; calculating, item by item for the initial candidate observation action set, the coverage increment over the risk region set, and eliminating from the initial candidate observation action set the candidate observation actions whose coverage increment is smaller than the risk constraint threshold to obtain a constrained candidate observation action set; calculating, based on the defect confidences corresponding to the risk regions, a high-confidence conflict penalty for the candidate observation actions of multiple robots directed at the same risk region in the constrained candidate observation action set, wherein the high-confidence conflict penalty is triggered when the defect confidence is greater than or equal to a preset high-confidence threshold and the defect confidence difference between different robots is greater than or equal to a preset conflict difference threshold; and calculating an action priority score for each candidate observation action in the constrained candidate observation action set, wherein the action priority score is obtained by weighting the coverage increment and the high-confidence conflict penalty, selecting for each group vision robot the candidate observation action corresponding to the maximum action priority score, and aggregating them to obtain the candidate observation action set.
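The filtering and scoring steps of claim 7 can be sketched as below. The candidate fields (`robot`, `coverage`, `penalty`), the penalty magnitude, and the linear weighting `w_cov*coverage - w_pen*penalty` are illustrative assumptions about how "weighting the coverage increment and the high-confidence conflict penalty" might be realized.

```python
def conflict_penalty(confs, high_thr, diff_thr, penalty=1.0):
    """High-confidence conflict trigger of claim 7: fires when some robot's
    defect confidence reaches the high-confidence threshold while the robots'
    confidences for the same risk region disagree by at least diff_thr."""
    if max(confs) >= high_thr and max(confs) - min(confs) >= diff_thr:
        return penalty
    return 0.0

def select_actions(candidates, cov_thresh, w_cov, w_pen):
    """Drop candidates whose coverage increment is below the risk-constraint
    threshold, then pick, per robot, the action with the highest priority
    score (coverage reward minus conflict penalty)."""
    chosen = {}
    for a in candidates:
        if a["coverage"] < cov_thresh:      # risk constraint threshold
            continue
        score = w_cov * a["coverage"] - w_pen * a["penalty"]
        best = chosen.get(a["robot"])
        if best is None or score > best[0]:
            chosen[a["robot"]] = (score, a)
    return {robot: act for robot, (score, act) in chosen.items()}
```

Penalizing confident disagreement pushes the policy to re-observe regions where robots "confidently" contradict each other, which is exactly where high-confidence misjudgments hide.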
- 8. The multi-agent reinforcement learning-based group vision robot quality inspection method of claim 7, wherein controlling the plurality of group vision robots to collect supplementary images according to the candidate observation action set comprises: splitting the candidate observation action set by group vision robot identifier to obtain the target candidate observation action corresponding to each group vision robot, and parsing the target candidate observation action into an observation pose increment and a camera imaging parameter adjustment; generating a motion trajectory point sequence from the observation pose increment and issuing it to the motion controller, updating the camera imaging parameters of the onboard industrial camera according to the camera imaging parameter adjustment, and generating an updated camera imaging parameter identifier; and triggering the onboard industrial camera to acquire a supplementary image after each group vision robot reaches the end pose of the motion trajectory point sequence, and writing the robot pose identifier and the camera imaging parameter identifier into the supplementary image to obtain a supplementary quality inspection observation set.
- 9. The multi-agent reinforcement learning-based group vision robot quality inspection method of claim 8, wherein updating the risk scoring matrix through weighted voting fusion comprises: merging the supplementary quality inspection observation set into the quality inspection observation set by sampling period index, inputting each merged image frame into the defect detection network containing the dropout (random inactivation) layer for N forward-inference passes to generate the corresponding defect confidences and uncertainty prediction values, and associating the defect confidences and the uncertainty prediction values with the region unit indexes corresponding to the risk region set; collecting the multiple defect confidences falling within each risk region in the risk region set, and taking the reciprocal of the sum of the uncertainty prediction value corresponding to each defect confidence and a preset constant as its voting weight to obtain a weighted voting weight set; and performing weighted summation of the defect confidences based on the weighted voting weight set to obtain a fused confidence, combining the fused confidence with the uncertainty prediction value of the risk region according to preset weight coefficients to obtain an updated risk score, and replacing the risk scores of the corresponding region unit indexes in the risk scoring matrix to obtain an updated risk scoring matrix.
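The claim-9 weighted vote can be sketched as follows. The weight is the reciprocal of uncertainty plus a small constant, as stated in the claim; the normalization by the weight sum and the two-term recombination weights are assumptions introduced here, since the claim only says "weighted summation" and "preset weight coefficients".

```python
def fuse_confidence(confs, uncs, eps=1e-3):
    """Uncertainty-weighted vote over observations of one risk region:
    weight = 1 / (uncertainty + eps), so low-uncertainty views dominate.
    The result is normalized by the weight sum (an assumption)."""
    weights = [1.0 / (u + eps) for u in uncs]
    return sum(w * c for w, c in zip(weights, confs)) / sum(weights)

def updated_risk(fused_conf, region_unc, alpha=0.6, beta=0.4):
    """Recombine the fused confidence with the region's uncertainty
    using illustrative preset weight coefficients."""
    return alpha * fused_conf + beta * region_unc
```

With this weighting, one sharp low-uncertainty observation outvotes several noisy high-uncertainty ones, which is the intended behavior of the fusion step.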
- 10. The multi-agent reinforcement learning-based group vision robot quality inspection method of claim 9, further comprising: comparing the updated risk score of each risk region in the updated risk scoring matrix with the first risk threshold and comparing its uncertainty prediction value with the first uncertainty threshold, and outputting a judgment mark according to the comparison results, wherein a first judgment mark is output and the risk region is written into the recheck task set when the updated risk score is greater than or equal to the first risk threshold and the uncertainty prediction value is greater than or equal to the first uncertainty threshold; a second judgment mark is output and the risk region is removed from the recheck task set when the updated risk score is greater than or equal to the first risk threshold and the uncertainty prediction value is smaller than the first uncertainty threshold; a third judgment mark is output and the risk region is written into the candidate supplementary observation region set when the updated risk score is smaller than the first risk threshold and the uncertainty prediction value is greater than or equal to the first uncertainty threshold; and the risk region is removed from both the recheck task set and the candidate supplementary observation region set when the updated risk score is smaller than the first risk threshold and the uncertainty prediction value is smaller than the first uncertainty threshold; and merging the recheck task set and the candidate supplementary observation region set to obtain the target risk region set of the next sampling period, and inputting the target risk region set together with the robot pose identifiers into the centralized-training, distributed-execution multi-agent reinforcement learning model to output the candidate observation action set of the next sampling period under the constraint of the risk constraint threshold and the high-confidence conflict penalty.
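The four-way routing of claim 10 reduces to a pair of threshold comparisons; a sketch follows, where the returned mark names are illustrative labels for the first through fourth outcomes.

```python
def decision_mark(risk, unc, risk_thr, unc_thr):
    """Route a risk region by updated risk score and uncertainty (claim 10)."""
    if risk >= risk_thr and unc >= unc_thr:
        return "recheck"      # mark 1: write into the recheck task set
    if risk >= risk_thr:
        return "confirmed"    # mark 2: remove from the recheck task set
    if unc >= unc_thr:
        return "supplement"   # mark 3: candidate supplementary observation region
    return "clear"            # remove from both sets
```

The "recheck" and "supplement" outcomes are merged into the target risk region set that seeds the next sampling period's candidate observation actions, closing the rolling loop.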
Description
Group vision robot quality inspection method based on multi-agent reinforcement learning

Technical Field

The invention relates to the technical field of cooperative control of industrial machine vision and group robots, and in particular to a group vision robot quality inspection method based on multi-agent reinforcement learning.

Background

With the advancement of intelligent manufacturing and flexible production lines, machine vision quality inspection for appearance defects, assembly deviation and surface defects is gradually shifting from single-camera, fixed-station offline sampling inspection to multi-camera/multi-robot, online closed-loop process quality control. In particular, under scenarios such as many-variety small-batch production, complex reflective materials and occluded structural parts, the quality inspection system is required to achieve effective coverage of high-risk areas within a limited takt time, while maintaining judgment consistency and traceability across multiple viewing angles, exposures and motion poses. Existing methods mainly use the confidence output by a deep detection network as the basis for recheck, or compensate for missed-inspection risk by repeated shooting along a fixed inspection route. However, these strategies usually establish no computable coupling between imaging uncertainty and robot observation actions. On the one hand, confidence is not equivalent to reliability: high-confidence misjudgments are not rare under complex textures, strongly reflective materials and domain shift. On the other hand, full repeated inspection significantly increases acquisition and inference load, causing production-line fluctuation and resource congestion, and overlapping coverage and redundant inspection across multiple robots make it difficult to improve the critical-defect detection rate and reduce the missed-inspection rate.

US10713769B2 discloses an active learning framework around defect classifier training, which reduces training data acquisition cost and improves classifier iteration efficiency by computing uncertainty on sample data points generated by an imaging subsystem and, according to the uncertainty, selecting the more informative data points for labeling and training. However, it focuses on the data selection and labeling loop of the training stage, does not further map uncertainty quantification results to an executable distribution of observation actions, and does not address risk region organization or online recheck scheduling under multi-robot collaborative coverage. US20190294149A1 discloses a method and system for estimating the reliability/uncertainty of the output decision of a supervised learner, and associates that uncertainty with exploration behavior in an autonomous platform control task to improve the reliability of control output; however, it is not directed to multi-view evidence fusion and rolling recheck of risk regions in industrial defect quality inspection, and lacks the structuring of a risk scoring matrix, a risk region set and candidate observation action sets, as well as a constraint mechanism for high-confidence conflicts among multiple robots.
In view of the problems of the existing group vision quality inspection technology, namely that confidence alone is difficult to use as a characterization of reliability, that repeated acquisition increases takt time and computing cost, and that action scheduling and consistency suppression for high-risk regions are lacking in multi-robot cooperation, the invention provides a group vision robot quality inspection method based on multi-agent reinforcement learning. The method writes robot pose identifiers and camera imaging parameter identifiers into workpiece images within a preset sampling period to form a quality inspection observation set; performs N forward-inference passes with a defect detection network containing a dropout (random inactivation) layer and generates an uncertainty prediction value from the variance of the defect confidences; constructs a risk scoring matrix combining the defect confidence and the uncertainty prediction value and determines a risk region set; inputs the risk scoring matrix, the risk region set and the robot pose identifiers into a centralized-training, distributed-execution multi-agent reinforcement learning model, which outputs a candidate observation action set under the constraint of a risk constraint threshold and a high-confidence conflict penalty; fuses and updates the risk scoring matrix by weighted voting with the uncertainty prediction values as weights; writes risk regions meeting the threshold conditions into a recheck task set; and rolls them into the candidate observation action set of the next sampling period.