CN-121998912-A - Target detection method for detecting metal surface defects
Abstract
The invention relates to a target detection method for detecting metal surface defects, belongs to the technical field of metal surface defect detection, and solves the problems of insufficient detection precision and efficiency in traditional metal surface defect detection methods. The method comprises the following steps: obtaining metal surface image data to be detected; inputting the metal surface image data into a metal surface defect detection model, wherein the metal surface defect detection model is obtained by improving a YOLOv model, replacing the backbone network of the YOLOv model with a StarNet network and replacing the feature pyramid network of the YOLOv model with a bidirectional feature pyramid network; and outputting the type, position, and confidence of the metal surface defects through the metal surface defect detection model. The invention improves the accuracy and efficiency of metal surface defect detection.
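The architectural change summarized above (swapping the backbone for StarNet and the feature pyramid for a bidirectional feature pyramid network) amounts to composing three stages. The sketch below is an illustrative composition only, not the patented implementation; the `starnet_backbone`, `bifpn`, and `head` callables are hypothetical stand-ins for the trained components.

```python
def build_detector(starnet_backbone, bifpn, head):
    """Compose the improved detector: StarNet backbone -> BiFPN -> detection head."""
    def detector(image):
        feats = starnet_backbone(image)   # multi-scale feature maps from the backbone
        fused = bifpn(feats)              # bidirectional fusion, enhancement, weighting
        return head(fused)                # (category, position, confidence) per defect
    return detector
```

Any concrete backbone, pyramid, or head implementation with compatible inputs and outputs can be dropped into this composition.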
Inventors
- ZHANG YIXUAN
- ZHANG ZHONGWEN
- SHI WEINA
- CUI JINGXUN
- LIU CHANGKUN
- XU BAODE
- WEI YUAN
- DING GUOZHI
- FAN XIGANG
- SONG XIAOYU
- WEN YUWANG
- WEN LU
- LIU ZHIJIAN
Assignees
- 北京星航机电装备有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-30
Claims (10)
- 1. A target detection method for inspecting a metal surface for defects, comprising the steps of: acquiring image data of a metal surface to be detected; inputting the metal surface image data into a metal surface defect detection model, wherein the metal surface defect detection model is improved based on a YOLOv model, the improvement comprising replacing the backbone network of the YOLOv model with a StarNet network and replacing the feature pyramid network of the YOLOv model with a bidirectional feature pyramid network; and outputting the category, the position, and the confidence of the metal surface defect through the metal surface defect detection model.
- 2. The method of claim 1, wherein the metal surface defect detection model comprises a StarNet network, a bidirectional feature pyramid network, and a detection head connected in sequence; the StarNet network receives the metal surface image data and performs multi-scale feature extraction to obtain a multi-scale feature map, which is sent to the bidirectional feature pyramid network; the bidirectional feature pyramid network performs feature fusion and enhancement on the multi-scale feature map to obtain an enhanced multi-scale feature map, performs feature weighting on the enhanced multi-scale feature map to obtain weighted multi-scale features, and sends the weighted multi-scale features to the detection head; and the detection head performs identification processing on the weighted multi-scale features and outputs the category, the position, and the confidence of the metal surface defects.
- 3. The method of claim 2, wherein the StarNet network comprises a first feature extraction unit, a second feature extraction unit, a third feature extraction unit, and a fourth feature extraction unit connected in series in order; the first feature extraction unit receives the metal surface image data and outputs a first scale feature map to the second feature extraction unit; the second feature extraction unit receives the first scale feature map and outputs a second scale feature map to the third feature extraction unit and the bidirectional feature pyramid network; the third feature extraction unit receives the second scale feature map and outputs a third scale feature map to the fourth feature extraction unit and the bidirectional feature pyramid network; and the fourth feature extraction unit receives the third scale feature map and outputs a fourth scale feature map to the bidirectional feature pyramid network.
- 4. A method according to claim 3, wherein the first feature extraction unit comprises a StarNet module; the StarNet module of the first feature extraction unit receives the metal surface image data, processes it, and outputs the first scale feature map; the second feature extraction unit comprises a downsampling layer and a StarNet module; the downsampling layer of the second feature extraction unit receives and downsamples the first scale feature map, the downsampled feature map is input to the StarNet module of the second feature extraction unit, and the StarNet module of the second feature extraction unit processes the downsampled feature map and outputs the second scale feature map; the third feature extraction unit comprises a downsampling layer and a StarNet module; the downsampling layer of the third feature extraction unit receives and downsamples the second scale feature map, the downsampled feature map is input to the StarNet module of the third feature extraction unit, and the StarNet module of the third feature extraction unit processes the downsampled feature map and outputs the third scale feature map; and the fourth feature extraction unit comprises a downsampling layer and a StarNet module; the downsampling layer of the fourth feature extraction unit receives and downsamples the third scale feature map, the downsampled feature map is input to the StarNet module of the fourth feature extraction unit, and the StarNet module of the fourth feature extraction unit processes the downsampled feature map and outputs the fourth scale feature map.
- 5. The method of claim 4, wherein the StarNet modules each comprise a first depth-separable convolutional layer, a first fully-connected layer, a ReLU6 activation function layer, a second fully-connected layer, a batch normalization layer, and a second depth-separable convolutional layer, connected in sequence; the first depth-separable convolution layer receives an input feature map, performs feature extraction, and outputs a first feature map to the first fully-connected layer; the first fully-connected layer receives the first feature map, performs a nonlinear combination of the channels of the input feature map through a polynomial kernel function to realize channel dimension transformation, and outputs a second feature map to the ReLU6 activation function layer; the ReLU6 activation function layer performs nonlinear activation on the second feature map and outputs a third feature map to the second fully-connected layer; the second fully-connected layer receives the third feature map, performs channel dimension reduction, and outputs a fourth feature map to the batch normalization layer; the batch normalization layer normalizes the fourth feature map and outputs a fifth feature map to the second depth-separable convolution layer; and the second depth-separable convolution layer receives the fifth feature map, performs depth feature extraction, and outputs the final output feature map of the StarNet module.
- 6. The method of claim 5, wherein the polynomial kernel function has the computational expression: y_i = Σ_{j=1}^{C_in} (w_ij · x_j + b_ij)^p, wherein x_j is the j-th channel of the feature map input to the first fully-connected layer, w_ij is a learnable weight parameter, b_ij is a learnable bias parameter, p is the polynomial order, C_in is the number of channels of the input feature map, and y_i is the i-th channel of the feature map output from the first fully-connected layer.
- 7. The method of claim 2, wherein the bidirectional feature pyramid network comprises an upsampling fusion module, a feature refinement module, a downsampling fusion module, a first multi-scale channel attention module, a second multi-scale channel attention module, and a third multi-scale channel attention module; the upsampling fusion module receives the multi-scale feature map output by the StarNet network, fuses the high-level features and the low-level features through an upsampling operation, generates a first group of multi-scale feature maps, and outputs the first group of multi-scale feature maps to the feature refinement module; the feature refinement module performs convolution enhancement on the first group of multi-scale feature maps and outputs a second group of multi-scale feature maps to the downsampling fusion module; the downsampling fusion module performs a downsampling operation on the second group of multi-scale feature maps and re-fuses the enhanced low-level features and the enhanced high-level features to generate a P3 feature map, a P4 feature map, and a P5 feature map, outputting the P3 feature map to the first multi-scale channel attention module, the P4 feature map to the second multi-scale channel attention module, and the P5 feature map to the third multi-scale channel attention module; and the first multi-scale channel attention module performs feature weighting on the P3 feature map, the second multi-scale channel attention module performs feature weighting on the P4 feature map, the third multi-scale channel attention module performs feature weighting on the P5 feature map, and the weighted P3, P4, and P5 feature maps are sent to the detection head.
- 8. The method of claim 7, wherein the feature refinement module comprises a depth-separable convolutional layer, a batch normalization layer, and a Swish activation function layer connected in series; the depth-separable convolution layer receives the first group of multi-scale feature maps, performs feature extraction, and outputs the extracted features to the batch normalization layer; the batch normalization layer receives the extracted features output by the depth-separable convolution layer and normalizes them to obtain normalized features; and the Swish activation function layer performs nonlinear activation on the normalized features and outputs the second group of multi-scale feature maps.
- 9. The method of claim 7, wherein the first, second, and third multi-scale channel attention modules each comprise a dual-path feature transformation unit and an adaptive fusion unit; the dual-path feature transformation unit is configured to receive an input feature map, generate preliminary channel attention weights, and output the preliminary channel attention weights to the adaptive fusion unit; and the adaptive fusion unit is configured to receive the preliminary channel attention weights from the different feature maps and perform weighted fusion to generate final multi-scale fusion attention weights.
- 10. The method of claim 9, wherein the dual-path feature transformation unit receiving the input feature map and generating the preliminary channel attention weights comprises: performing global average pooling on the input feature map X to obtain a channel statistical vector; inputting the channel statistical vector into a first fully-connected sub-path and a lightweight convolution sub-path respectively for feature transformation, wherein the first fully-connected sub-path comprises a fully-connected layer and the lightweight convolution sub-path comprises a 1×1 convolution layer; and fusing the output results of the first fully-connected sub-path and the lightweight convolution sub-path, and generating the preliminary channel attention weights through a Sigmoid activation function.
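The dual-path transformation of claim 10 can be sketched numerically. The numpy sketch below is a minimal illustration under stated assumptions, not the patented implementation: both sub-paths are modeled as matrix products on the pooled channel vector (for a 1×1 convolution acting on a C×1×1 tensor this is exact), and the weight matrices `w_fc` and `w_conv` are hypothetical placeholders for learned parameters.

```python
import numpy as np

def dual_path_attention(x, w_fc, w_conv):
    """Preliminary channel attention weights from a (C, H, W) feature map."""
    s = x.mean(axis=(1, 2))              # global average pooling -> channel statistics (C,)
    fc_out = w_fc @ s                    # fully-connected sub-path
    conv_out = w_conv @ s                # 1x1-convolution sub-path (matmul on pooled vector)
    fused = fc_out + conv_out            # fuse the two sub-path outputs (additive fusion assumed)
    return 1.0 / (1.0 + np.exp(-fused))  # Sigmoid -> preliminary weights in (0, 1)
```

The Sigmoid guarantees each channel weight lies in (0, 1), so the weighted feature map never changes sign, only magnitude.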
Description
Target detection method for detecting metal surface defects
Technical Field
The invention relates to the technical field of metal surface defect detection, and in particular to a target detection method for detecting metal surface defects.
Background
In the field of metal processing and manufacturing, metal surface defect detection is a key link in ensuring product quality. Traditional detection methods rely mainly on manual visual inspection or on automated detection systems based on simple image processing techniques. Manual visual inspection suffers from low detection efficiency, strong subjectivity, and missed or false detections caused by fatigue and other factors, while automated detection systems based on traditional image processing techniques have limited capability to identify complex defects and struggle to adapt to different illumination conditions and variations in metal surface materials. In recent years, with the development of deep learning technology, convolutional neural network-based defect detection methods have gradually emerged, such as the YOLO series of target detection models, which perform well in real-time operation and accuracy; however, when applied directly to metal surface defect detection, they still suffer from insufficient accuracy in detecting minute defects and poor adaptability to complex textures.
In the process of realizing the embodiments of the invention, it was found that the prior art has at least the following problems or defects: traditional detection methods have low efficiency and poor precision and can hardly meet the high-precision and high-efficiency requirements of modern industrial production for metal surface defect detection; and existing deep learning-based detection models suffer from insufficient detection precision and limited adaptability when facing complex textures and minute defects on metal surfaces, and cannot effectively cope with the diverse requirements of different illumination conditions and metal materials.
Disclosure of Invention
In view of the above analysis, the present invention is directed to a target detection method for inspecting defects on a metal surface, so as to solve the problem that existing methods for inspecting metal surface defects are insufficient in detection accuracy and efficiency. An embodiment of the invention provides a target detection method for detecting metal surface defects, which comprises the following steps: acquiring image data of a metal surface to be detected; inputting the metal surface image data into a metal surface defect detection model, wherein the metal surface defect detection model is improved based on a YOLOv model, the improvement comprising replacing the backbone network of the YOLOv model with a StarNet network and replacing the feature pyramid network of the YOLOv model with a bidirectional feature pyramid network; and outputting the category, the position, and the confidence of the metal surface defect through the metal surface defect detection model.
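The three claimed steps (acquire an image, run the improved model, emit category/position/confidence) can be sketched as a thin inference wrapper. This is an illustrative skeleton only: the `model` callable, the `Detection` record, and the `(x1, y1, x2, y2)` box format are assumptions standing in for the trained StarNet + bidirectional-feature-pyramid detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    category: str        # defect type, e.g. "scratch" (hypothetical label)
    box: tuple           # position as (x1, y1, x2, y2), an assumed format
    confidence: float    # detection confidence in [0, 1]

def detect_defects(image, model):
    """Run the defect detection model on one metal-surface image and
    return the category, position, and confidence of each defect."""
    raw = model(image)   # model: StarNet backbone -> BiFPN -> detection head
    return [Detection(category=c, box=b, confidence=s) for c, b, s in raw]
```

Returning structured records rather than raw tuples keeps downstream quality-control code readable without changing what the model computes.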
Further, the metal surface defect detection model comprises a StarNet network, a bidirectional feature pyramid network, and a detection head which are sequentially connected; the StarNet network receives the metal surface image data and performs multi-scale feature extraction to obtain a multi-scale feature map, which is sent to the bidirectional feature pyramid network; the bidirectional feature pyramid network performs feature fusion and enhancement on the multi-scale feature map to obtain an enhanced multi-scale feature map, performs feature weighting on the enhanced multi-scale feature map to obtain weighted multi-scale features, and sends the weighted multi-scale features to the detection head; and the detection head performs identification processing on the weighted multi-scale features and outputs the category, the position, and the confidence of the metal surface defects. Further, the StarNet network comprises a first feature extraction unit, a second feature extraction unit, a third feature extraction unit, and a fourth feature extraction unit which are sequentially connected in series; the first feature extraction unit receives the metal surface image data and outputs a first scale feature map to the second feature extraction unit; the second feature extraction unit receives the first scale feature map and outputs a second scale feature map to the third feature extraction unit and the bidirectional feature pyramid network; the third feature extraction unit receives the second scale feature map and outputs a third scale feature map to the fourth feature extraction unit and the bidirectional feature pyramid network; and the fourth feature extraction unit receives the third scale feature map and outputs a fourth scale feature map to the bidirectional feature pyramid network. Further, the first feature extracti