CN-122024010-A - Coaxial line joint detection method based on improved YOLO11 model


Abstract

The invention discloses a coaxial line joint detection method based on an improved YOLO11 model. The method comprises: building a coaxial line joint target detection data set; inputting the data set into the improved YOLO11 model for training, and taking the optimal improved YOLO11 model with stable prediction accuracy as the coaxial line joint detection model; inputting the data to be detected into that model to detect and locate coaxial line joints; screening valid detection boxes through non-maximum suppression and removing low-confidence predictions with a confidence threshold; and finally outputting a detection image annotated with joint positions and confidence scores, completing the localization and identification of the coaxial line joints. The invention addresses the problems of data set scarcity, difficult small-target recognition, strong reflection interference and constrained model deployment in coaxial line joint detection, balances detection accuracy with real-time performance, and is suitable for automated 3C product assembly lines.

Inventors

TIAN LIANFANG
HUANG JIEKAI
DU QILIANG

Assignees

South China University of Technology (华南理工大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-03

Claims (6)

1. A coaxial line joint detection method based on an improved YOLO11 model, characterized in that the improved YOLO11 model modifies the backbone network, the feature fusion network and the model structure of the original YOLO11 model to achieve accurate detection and localization of coaxial line joints, wherein: the backbone network is improved by replacing its down-sampling module C2PSA with an efficient channel attention module having dynamic channel adaptation and small-target feature enhancement capability, called EA, to strengthen the focus on small-target features; the feature fusion network is improved by embedding, in its up-sampling stage, a lightweight edge enhancement module built on an improved depthwise separable convolution module, called LEE, to enhance the edge feature response; and the model structure is improved by adopting a layer-by-layer channel optimization strategy differentiated by layer importance, in which the importance value of each layer is calculated and the number of channels of each layer is adjusted according to that value, so that the model is made lightweight while accuracy is preserved; the coaxial line joint detection method is implemented through the following steps: 1) acquiring coaxial line joint images with multiple devices in multiple scenes, and preprocessing them to obtain a coaxial line joint target detection data set with sufficient sample diversity; 2) inputting the data set into the improved YOLO11 model for training to obtain the coaxial line joint detection model, wherein during training the multi-scale features of the coaxial line joint are extracted through the backbone network embedded with EA, which first compresses and aggregates the channel information of the input feature map, then incorporates a dynamic receptive field module to adjust the feature capture range, and finally generates attention weights that strengthen the key features; the feature expression is then optimized through the feature fusion network embedded with LEE, which extracts edge information, combines a residual connection and an activation function to preserve feature integrity, and then matches the channel dimensions through point convolution; 3) inputting the data to be detected into the coaxial line joint detection model obtained in step 2) to detect and locate the coaxial line joints, screening valid detection boxes through non-maximum suppression, eliminating low-confidence predictions with a confidence threshold, and finally outputting a detection image annotated with joint positions and confidence scores, completing the localization and identification of the coaxial line joints.
2. The coaxial line joint detection method based on the improved YOLO11 model according to claim 1, wherein step 1) comprises the following steps: multi-device data acquisition, which involves a core device, auxiliary devices, an acquisition scene, an industrial assembly line, a display device and a control device, wherein the core device is a D435i depth camera, the camera height is fixed during acquisition, the shooting angle covers a range of 0 to 45 degrees, and RGB images and depth information are acquired synchronously; image preprocessing, in which image noise is removed by Gaussian filtering with a 3 × 3 convolution kernel, image contrast is adjusted by adaptive histogram equalization, and the influence of illumination differences is eliminated by brightness normalization, so that the preprocessed image is clear and its gray-level distribution is uniform; annotation, in which the bounding boxes of the coaxial line joints are labeled in YOLO format with the LabelImg annotation tool, the labeling rule being that each bounding box must completely enclose the joint area; and data augmentation, in which multiple augmentation strategies are applied to the labeled images to improve sample diversity and model generalization, finally yielding a coaxial line joint target detection data set with sufficient sample diversity.
3. The coaxial line joint detection method based on the improved YOLO11 model according to claim 1, wherein the backbone network embedded with EA performs the following operations: first, global average pooling is performed on the input feature map of dimension $C \times H \times W$, where $H$ and $W$ are the height and width of the feature map and $C$ is the total number of channels; by averaging the pixel values $x_c(i, j)$ over all spatial positions within each channel, where $i$ and $j$ are the coordinates along the x-axis and y-axis of the image, the feature information originally distributed over the $H \times W$ spatial dimensions is compressed and aggregated into a one-dimensional channel descriptor of length $C$; to avoid the rapid growth in computational complexity and the feature overfitting caused by an excessive number of channels, a 1 × 1 convolution compresses the number of channels of the one-dimensional channel descriptor to $C_r$, which is obtained from $C$ by integer division (denoted $//$) and bounded below by a max function; this compression design reduces the channel number reasonably through integer division, lowering the subsequent computational load, while the max function forces the number of intermediate-layer channels to be no lower than 8, so that key feature information is not lost through excessive dimension reduction and the integrity of the feature expression is preserved; next, a multi-stage progressive excitation mechanism is constructed, consisting of a bias-free linear layer, a memory-optimized ReLU activation function, a second bias-free linear layer and a sigmoid activation function, wherein the bias-free linear layers prevent bias parameters from interfering with the learning of channel dependencies, and the memory-optimized ReLU activation function, by setting the parameter inplace=True, computes on the feature map in place, retaining the nonlinear feature extraction capability while greatly reducing memory usage; the sigmoid activation function then maps the output of the excitation mechanism to the interval [0, 1], generating channel attention weights that reflect the importance of each channel; finally, the generated channel attention weights are multiplied with the original input feature map, pixel by pixel and channel by channel, so that channels containing key features of the coaxial line joint are enhanced while channels containing redundant information such as background noise and irrelevant textures are suppressed, improving the separation between target and background in the feature map; the EA introduces a delayed fully connected layer initialization mechanism, that is, the fully connected layer attribute self.fc is set to None in the module initialization stage, and a matching fully connected layer structure is created dynamically from the actual channel number of the input feature map only when a concrete input feature map is received during the first forward propagation, so that the module can be integrated into the n, s, m, l and x variants of the YOLO11 model, which differ in parameter scale, without adjusting the module structure for each variant; the mathematical formulas corresponding to the whole operation are as follows: $z_c = \frac{1}{H W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)$; $u = W_1 z$; $v = \delta(u)$; $s = \sigma(W_2 v)$; $y_c(i, j) = s_c \cdot x_c(i, j)$; in the formulas, $x_c(i, j)$ is the value of the input feature map at position $(i, j)$ in channel $c$; $z_c$ is the global average pooling result for channel $c$, and $z$ is the vector of global average pooling results; $u$ is the feature vector obtained from the global pooling result after dimension reduction by the bias-free linear layer; $W_1 \in \mathbb{R}^{C_r \times C}$ and $W_2 \in \mathbb{R}^{C \times C_r}$ are the weight matrices of the first and second bias-free linear layers; $C_r$ is the number of output channels of the dimension-reducing linear layer, a fixed value; the number of output channels of the second linear layer equals the total number of channels $C$ of the input feature map; $\delta$ is the memory-optimized ReLU activation function; $\sigma$ is the sigmoid activation function; $s$ is the channel attention weight vector and $s_c$ is the attention weight of the $c$-th channel; $y_c(i, j)$ is the weight-enhanced output feature map at position $(i, j)$ in channel $c$; through the backbone network, 4 down-sampling operations are completed in sequence during feature extraction, with down-sampling rates of 2, 4, 8 and 16 respectively; by gradually increasing the channel dimension of the feature map and reducing its spatial resolution, features are extracted progressively from shallow texture features to deep semantic features, and output feature maps of four levels from shallow to deep are obtained, denoted P3, P4, P5 and P6, wherein P3 has the highest spatial resolution and can accurately capture the fine textures and edge details of the coaxial line joint, P4 strengthens the semantic feature expression while retaining spatial resolution, and the two work together to serve the feature extraction requirements of small targets such as the coaxial line joint.
4. The coaxial line joint detection method based on the improved YOLO11 model according to claim 1, wherein the feature fusion network embedded with LEE retains the structural framework of the original YOLO11 model as a whole, to ensure the compatibility of the network structure and the basic detection performance; the core workflow of the LEE is divided into three closely connected stages, which together achieve efficient extraction and optimization of edge features, specifically as follows: the first stage is efficient edge feature extraction, which adopts a depthwise separable convolution architecture and thereby avoids the redundant computation of conventional standard convolution during feature extraction; the input feature map is first processed by a 3 × 3 depthwise convolution layer, whose convolution kernels correspond one-to-one to the channels of the feature map, that is, each convolution kernel performs a spatial convolution within a single channel only, and can therefore focus accurately on the spatial position information and texture features of the coaxial line joint edge; the second stage is dynamic adaptive channel adjustment, which introduces a double delayed initialization mechanism, that is, the 1 × 1 point convolution layer and the 3 × 3 depthwise convolution layer in the module do not predefine specific input and output channel numbers or convolution kernel parameters in the module initialization stage, but dynamically create a matching convolution layer structure and convolution kernel parameters from the channel and spatial dimensions of the input feature map actually received when the module performs forward propagation for the first time, which removes the fixed constraint that conventional edge enhancement modules place on the channel number of the input feature map, so that the module adapts to the n, s, m, l and x variants of the YOLO11 model, which differ in parameter scale, without any structural modification; the third stage is feature integration and performance optimization, in which feature integrity and model operating efficiency during edge enhancement are ensured by two means: a residual connection mechanism is introduced, in which the edge features extracted by the depthwise separable convolution are added directly to the original input features of the module, which avoids the loss of basic feature information during edge enhancement and ensures that the feature map contains accurate edge details while retaining the complete overall features of the target; and a SiLU activation function is adopted in place of the conventional ReLU activation function, since the smooth nonlinearity of SiLU improves the gradient flow of the model while strengthening the nonlinear expression capability of the features.
5. The coaxial line joint detection method based on the improved YOLO11 model according to claim 4, wherein the LEE performs the following operations: for a feature map $X$ of arbitrary dimension $C \times H \times W$ input to the feature fusion network, spatial edge feature extraction is first performed by a dynamically initialized 3 × 3 depthwise convolution layer, which dynamically generates a corresponding number of 3 × 3 convolution kernels based on the channel number $C$ and performs the convolution within each single channel, accurately capturing the port contour of the coaxial line joint and the key edge features of the metal joints while avoiding the cross-channel redundant computation of conventional convolution; the edge feature map extracted by the depthwise convolution is then fused across channels by a dynamically initialized 1 × 1 point convolution layer, which integrates the edge information of different channels and preliminarily adjusts the channel dimension of the feature map; the residual connection mechanism is then applied, adding the edge features produced by the point convolution pixel by pixel to the original input features $X$ of the module to obtain the residual-fused feature map $R$, ensuring that the basic features are not lost; $R$ is then input to the SiLU activation function, whose nonlinear transformation increases the discriminability of the features and further strengthens the difference between the edge features of the coaxial line joint and the background; finally, the channel number of the activated feature map is fine-tuned by another dynamically initialized 1 × 1 point convolution layer, so that the final output feature map $Y$ exactly matches the input dimension of the subsequent feature fusion layer of the feature fusion network, ensuring a smooth feature processing flow through the whole feature fusion network; the mathematical formulas corresponding to the whole operation are as follows: $E_c(i, j) = \sum_{m=-1}^{1} \sum_{n=-1}^{1} K_c(m, n) \, X_c(i + m, j + n)$; $P = W_{p1} * E$; $R = P + X$; $A = R \cdot \sigma(R)$; $Y = W_{p2} * A$; in the formulas, $K_c(m, n)$ is the 3 × 3 depthwise convolution kernel dynamically initialized for channel $c$, where $m$ is the row index of the convolution kernel, $n$ is the column index of the convolution kernel, and $c$ is the input channel index bound to the convolution kernel; $E_c(i, j)$ is the edge feature map extracted by the depthwise convolution at spatial position abscissa $i$, ordinate $j$ and channel $c$; $(i, j)$ is the spatial position in the input feature map; $P$ is the output feature map of the first 1 × 1 pointwise convolution; $W_{p1}$ and $W_{p2}$ are the dynamically initialized 1 × 1 point convolution weight matrices, the former used to fuse channel information and the latter used to match the channel dimension of the subsequent layer; $A$ is the edge-enhanced feature map after SiLU activation; $\sigma$ is the sigmoid function; $R$ is the feature map after residual fusion; and $Y$ is the final output edge-enhanced feature map.
6. The coaxial line joint detection method based on the improved YOLO11 model according to claim 1, wherein the layer-by-layer channel optimization strategy does not adopt conventional uniform pruning or random pruning, but follows a flow of layer importance quantification, differentiated allocation of the pruning rate of each layer, and pruning range constraint, so that the parameter count and computation of the model are reduced accurately while the detection accuracy of the model is preserved as far as possible; the layer-by-layer channel optimization strategy is implemented through the following steps: the first step is the quantitative calculation of layer importance; three optimization targets, namely accuracy, parameter count and latency, are defined first, covering the core evaluation dimensions of model performance and avoiding the performance imbalance caused by optimizing a single dimension, wherein mAP@0.5 on the data set, which denotes the mean average precision at an intersection-over-union threshold of 0.5, is used as the core proxy index of model accuracy, since mAP@0.5 directly reflects the detection precision and recall of the model on coaxial line joints and is a widely accepted accuracy criterion in target detection tasks; the parameter matrices of each layer of the model are traversed, and the parameter count of each layer, comprising the learnable convolution kernel parameters and bias parameters, is counted as the core index of the computational complexity of that layer; and actual inference tests are carried out on an NVIDIA Jetson AGX Xavier edge computing platform, with the independent inference time of each network layer recorded by a high-precision timer as the core index of the real-time performance of that layer; after the three target values of each layer are obtained, each type of index is normalized separately to eliminate the influence of differences in magnitude between indices, specifically: for the accuracy index, $A_l$ denotes the accuracy contribution of the $l$-th layer, obtained by testing with a layer-by-layer masking method, and is mapped to the interval [0, 1] by min-max normalization; for the parameter count index, the parameter count $P_l$ of the $l$-th layer is likewise mapped to the interval [0, 1] by min-max normalization; for the latency index, the inference time $T_l$ of the $l$-th layer is mapped to the interval [0, 1] by inverse min-max normalization, the purpose of the inversion being that a layer with shorter latency obtains a higher normalized score, consistent with the optimization direction of the accuracy and parameter count indices; then, to balance the importance of the three targets, weight coefficients $\alpha$, $\beta$ and $\gamma$ are introduced, which are non-negative weights preset according to actual requirements and satisfy $\alpha + \beta + \gamma = 1$, and the normalized target values are combined by linear weighted summation to obtain the layer importance $I_l$ of each layer, the calculation formula being $I_l = \alpha \hat{A}_l + \beta \hat{P}_l + \gamma \hat{T}_l$; the design logic of the formula is that the higher the accuracy contribution, the larger $\hat{A}_l$; the fewer the parameters of the layer, the larger $\hat{P}_l$ and correspondingly the larger $I_l$; and the shorter the inference time, the larger $\hat{T}_l$ and correspondingly the larger $I_l$; the weighted summation thus quantifies the overall importance of each layer; in the formula, the normalized scores $\hat{A}_l$, $\hat{P}_l$ and $\hat{T}_l$ are computed from $A_l$, $P_l$ and $T_l$ together with $A_{\max}$, $P_{\max}$ and $T_{\max}$, the maximum value of each index over all layers, where $A_l$ is the accuracy contribution, $P_l$ is the parameter count, and $T_l$ is the inference latency; the second step is the differentiated allocation of pruning rates: after the layer importance $I_l$ of all layers is obtained, all layers are sorted globally by $I_l$ from large to small to determine the importance rank of each layer within the whole model, and a differentiated pruning strategy is applied according to the sorting result, wherein shallow layers with low layer importance, which are responsible for extracting basic texture features, contain more redundant parameters and have little influence on the final detection accuracy, are allocated a high pruning rate, that is, their channel number is reduced substantially, to reduce the computation of the model as far as possible; the third step is pruning range constraint: to avoid a serious loss of feature extraction capability caused by an excessively high pruning rate, or an insignificant lightweighting effect caused by an excessively low pruning rate, a truncation function $\mathrm{clip}$ is introduced to limit the range of the pruning rate of each layer, with a minimum pruning rate $p_{\min}$ and a maximum pruning rate $p_{\max}$; the final pruning rate is calculated by the formula $p_l = \mathrm{clip}(\lambda \cdot b_l, p_{\min}, p_{\max})$, in which $p_l$ is the final pruning rate, $\lambda$ is the pruning intensity adjustment coefficient, which controls the overall aggressiveness of pruning, and $b_l$ is the pruning base coefficient, which is inversely related to the layer importance, that is, the lower the importance, the larger the pruning base coefficient and the higher the pruning rate; the $\mathrm{clip}$ function then limits the calculated pruning rate to the interval $[p_{\min}, p_{\max}]$, ensuring that the pruning operation effectively reduces model complexity without a noticeable impact on detection accuracy.
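
As an illustration of the post-processing in step 3) of claim 1, the following is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression. The function names, the box format (x1, y1, x2, y2) and the threshold values 0.25 and 0.45 are assumptions made for this example and are not specified by the claims; the sketch applies the confidence threshold before suppression, which is a common ordering and not necessarily the one used by the invention.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_thres: float = 0.25,  # assumed value, not from the patent
    iou_thres: float = 0.45,   # assumed value, not from the patent
) -> List[int]:
    """Return indices of the boxes kept after thresholding and greedy NMS."""
    # Drop low-confidence predictions, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep
```

The returned indices identify the boxes that would be drawn, together with their confidence scores, on the output detection image.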
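
The preprocessing recited in claim 2 can be sketched on a grayscale image stored as nested lists. The sketch below covers only the 3 × 3 Gaussian filtering and a simple brightness normalization; adaptive histogram equalization is omitted for brevity. The kernel weights (1, 2, 1; 2, 4, 2; 1, 2, 1)/16, the edge-replication padding, the mean-shift form of brightness normalization and the target mean of 128 are all assumptions for illustration, since the claim does not specify them.

```python
from typing import List

Image = List[List[float]]

# Assumed 3 x 3 Gaussian approximation; the weights sum to 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def gaussian_blur_3x3(img: Image) -> Image:
    """Smooth an image with a 3 x 3 Gaussian kernel, replicating edge pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for m in range(3):
                for n in range(3):
                    ii = min(max(i + m - 1, 0), h - 1)  # clamp row index
                    jj = min(max(j + n - 1, 0), w - 1)  # clamp column index
                    acc += KERNEL[m][n] * img[ii][jj]
            out[i][j] = acc / 16.0
    return out


def normalize_brightness(img: Image, target_mean: float = 128.0) -> Image:
    """Shift pixel values so the image mean matches target_mean (assumed rule)."""
    h, w = len(img), len(img[0])
    mean = sum(sum(row) for row in img) / (h * w)
    shift = target_mean - mean
    return [[min(255.0, max(0.0, v + shift)) for v in row] for row in img]
```

In practice an image library would normally perform these steps; the explicit loops here only make the arithmetic visible.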
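
The EA channel attention described in claim 3 can be illustrated with a small pure-Python sketch operating on a feature map stored as nested lists (channels, rows, columns). It shows global average pooling, two bias-free linear layers with ReLU and sigmoid, channel-wise reweighting, and the delayed creation of the linear layers on the first forward pass. The class name EASketch, the reduction ratio of 16, the random weight initialization and the seed are assumptions for this example; only the lower bound of 8 hidden channels comes from the claim.

```python
import math
import random
from typing import List, Optional, Tuple

FeatureMap = List[List[List[float]]]  # [channel][row][column]


class EASketch:
    """Illustrative channel attention with lazily created linear layers."""

    def __init__(self, reduction: int = 16, min_hidden: int = 8, seed: int = 0):
        self.reduction = reduction    # assumed ratio, not given in the claim
        self.min_hidden = min_hidden  # the claim requires at least 8 channels
        self.rng = random.Random(seed)
        self.fc: Optional[Tuple[list, list]] = None  # built on first call

    def _build(self, channels: int) -> None:
        hidden = max(channels // self.reduction, self.min_hidden)
        w1 = [[self.rng.uniform(-0.1, 0.1) for _ in range(channels)]
              for _ in range(hidden)]
        w2 = [[self.rng.uniform(-0.1, 0.1) for _ in range(hidden)]
              for _ in range(channels)]
        self.fc = (w1, w2)  # two bias-free linear layers

    def __call__(self, x: FeatureMap) -> Tuple[FeatureMap, List[float]]:
        if self.fc is None:
            self._build(len(x))  # adapt to the actual channel count
        w1, w2 = self.fc
        # Global average pooling: one scalar per channel.
        z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
        # Bias-free linear layer followed by ReLU.
        u = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
        # Bias-free linear layer followed by sigmoid, giving weights in (0, 1).
        s = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, u))))
             for row in w2]
        # Scale every pixel of channel c by its attention weight s[c].
        y = [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
        return y, s
```

Because the layers are created from the first input, the same object definition works for any channel count, which is the property the claim relies on for the different YOLO11 variants.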
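
The LEE operation order in claim 5 (depthwise 3 × 3 convolution, point convolution, residual addition, SiLU, point convolution) can be sketched in pure Python as follows. The class name LEESketch, zero padding at the borders, the random initialization and the requirement that the first point convolution keeps the channel count (so that the residual addition is well defined) are assumptions of this example and are not stated in the claim.

```python
import math
import random
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][column]


def depthwise3x3(x: FeatureMap, kernels: FeatureMap) -> FeatureMap:
    """Convolve each channel with its own 3 x 3 kernel (zero padding assumed)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c_in)]
    for c in range(c_in):
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for m in range(3):
                    for n in range(3):
                        ii, jj = i + m - 1, j + n - 1
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += kernels[c][m][n] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise(x: FeatureMap, weight: List[List[float]]) -> FeatureMap:
    """1 x 1 convolution: mix channels at every pixel; weight is C_out x C_in."""
    h, width = len(x[0]), len(x[0][0])
    return [[[sum(wt * x[c][i][j] for c, wt in enumerate(row))
              for j in range(width)] for i in range(h)] for row in weight]


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


class LEESketch:
    """Illustrative edge enhancement block with lazily created weights."""

    def __init__(self, out_channels: int, seed: int = 0):
        self.out_channels = out_channels
        self.rng = random.Random(seed)
        self.params = None  # created on the first forward pass

    def _build(self, c: int) -> None:
        r = self.rng
        dw = [[[r.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
              for _ in range(c)]
        pw1 = [[r.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(c)]
        pw2 = [[r.uniform(-0.1, 0.1) for _ in range(c)]
               for _ in range(self.out_channels)]
        self.params = (dw, pw1, pw2)

    def __call__(self, x: FeatureMap) -> FeatureMap:
        if self.params is None:
            self._build(len(x))
        dw, pw1, pw2 = self.params
        fused = pointwise(depthwise3x3(x, dw), pw1)  # edges, then channel fusion
        activated = [[[silu(p + v) for p, v in zip(prow, xrow)]  # residual + SiLU
                      for prow, xrow in zip(pch, xch)]
                     for pch, xch in zip(fused, x)]
        return pointwise(activated, pw2)  # match the next layer's channels
```

A 2-channel input passed through `LEESketch(out_channels=4)` yields a 4-channel map of the same spatial size, mirroring the channel-matching role of the final point convolution.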
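
The layer importance scoring and clipped pruning rate of claim 6 can be illustrated numerically. The sketch below is hypothetical in several respects: it normalizes each index by its maximum over all layers, inverts the parameter count and latency so that smaller values score higher, takes the pruning base coefficient as one minus the importance, and uses the weights 0.5, 0.25, 0.25, a strength of 0.8 and bounds of 0.1 and 0.7. None of these specific forms or values is given in the claim.

```python
from typing import List, Sequence


def layer_importance(
    acc: Sequence[float],      # accuracy contribution per layer
    params: Sequence[float],   # parameter count per layer
    latency: Sequence[float],  # inference time per layer
    alpha: float = 0.5,
    beta: float = 0.25,
    gamma: float = 0.25,
) -> List[float]:
    """Weighted sum of normalized scores; alpha + beta + gamma must equal 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    a_max, p_max, t_max = max(acc), max(params), max(latency)
    return [
        alpha * a / a_max                 # higher accuracy contribution scores higher
        + beta * (1.0 - p / p_max)        # fewer parameters scores higher (assumed form)
        + gamma * (1.0 - t / t_max)       # shorter latency scores higher (assumed form)
        for a, p, t in zip(acc, params, latency)
    ]


def pruning_rates(
    importance: Sequence[float],
    strength: float = 0.8,  # pruning intensity coefficient (assumed value)
    p_min: float = 0.1,     # lower bound on the pruning rate (assumed value)
    p_max: float = 0.7,     # upper bound on the pruning rate (assumed value)
) -> List[float]:
    """Clip strength * (1 - importance) into [p_min, p_max] for each layer."""
    return [min(max(strength * (1.0 - s), p_min), p_max) for s in importance]
```

With two layers whose accuracy contributions are 0.2 and 0.8, the less important first layer receives the higher pruning rate, and extreme importance values are held inside the configured bounds.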

Description

Coaxial line joint detection method based on improved YOLO11 model

Technical Field

The invention relates to the technical field of machine vision detection, and in particular to a coaxial line joint detection method based on an improved YOLO11 model, which is suitable for industrial scenes such as automated 3C product assembly lines and achieves high-precision, real-time detection and localization of small, low-texture, highly reflective coaxial line joints.

Background

With the rapid development of intelligent manufacturing and the 3C industry, 3C products are becoming markedly smaller, denser and structurally more complex, and the connection accuracy of coaxial line joints during assembly directly determines the overall performance and stability of the product. As a core component on electronic assembly lines, the coaxial line joint is small, lacks surface texture and reflects light easily, so its detection and localization face several technical bottlenecks: 1. The coaxial line joint is an industrial small-target class for which public data sets contain almost no samples, and self-collected data must cover multiple devices and multiple working conditions, which makes collection difficult. 2. Illumination in industrial scenes fluctuates, with assembly line lighting being switched and external natural light interfering; the metal surface of the coaxial line joint reflects light easily, so the gray values of target and background become confused, and conventional detection models are prone to background misjudgment or missed targets. 3. The coaxial line joint is very small, with the target occupying less than 0.5 percent of the pixels at a conventional image resolution such as 1280 × 720; existing models extract small-target features weakly, and key details such as the edge of the joint port are easily submerged by noise. 4. Real-time detection on the assembly line places strict requirements on model inference speed, while the computing power of industrial edge devices is limited; existing complex models have large parameter counts, slow inference and are difficult to deploy, while lightweight models struggle to maintain detection accuracy. YOLO11 is a mainstream real-time target detection framework; although its detection speed and accuracy are markedly better than those of the previous generation, applying it directly to coaxial line joint detection still leaves problems such as insufficient focus on small-target features, blurred edge features, and difficulty in balancing accuracy against model size.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art by providing a coaxial line joint detection method based on an improved YOLO11 model, which effectively addresses the problems of data set scarcity, difficult small-target recognition, blurred edge features and constrained model deployment in coaxial line joint detection, and meets the real-time requirement of industrial assembly lines while ensuring detection accuracy.
To achieve this aim, the technical scheme provided by the invention is a coaxial line joint detection method based on an improved YOLO11 model, in which the improved YOLO11 model modifies the backbone network, the feature fusion network and the model structure of the original YOLO11 model to achieve accurate detection and localization of coaxial line joints. The backbone network is improved by replacing its down-sampling module C2PSA with an efficient channel attention module having dynamic channel adaptation and small-target feature enhancement capability, called EA, which strengthens the focus on small-target features. The feature fusion network is improved by embedding, in its up-sampling stage, a lightweight edge enhancement module called LEE, which enhances the edge feature response. The model structure is improved by adopting a layer-by-layer channel optimization strategy differentiated by the importance of each layer, in which the importance value of each layer is calculated and the number of channels of each layer is adjusted accordingly, so that the model is made lightweight while accuracy is preserved. The coaxial line joint detection method is implemented through the following steps: 1) Acquiring coaxial line joint images with multiple devices in multiple scenes, and preprocessing them to obtain a coaxial line joint target detection data set with sufficient sample diversity; 2) In the training process, extracting multi-scale c