CN-121982291-A - Explosive object target identification method based on multidimensional data optimization and feature fusion
Abstract
The application relates to the technical field of computer vision and artificial-intelligence image processing, and discloses an explosive object target identification method based on multidimensional data optimization and feature fusion. The method comprises: performing HSV-space brightness adjustment, random cropping and grayscale conversion on an original image to generate a single-channel grayscale image; extracting multi-level feature maps with a convolutional neural network; generating a structural attention mask with a fixed gradient operator and using it to enhance the shallow feature map; re-standardizing each feature map to generate distribution-aligned multi-scale feature maps; constructing bidirectional cross-scale transfer paths and performing weighted fusion; and performing category prediction and bounding-box regression with a decoupled prediction network to output an explosive object detection result. By combining random perturbation of the HSV brightness channel with single-channel grayscale preprocessing, the application actively strips away extrinsic color disturbances and forces the model to focus on the inherent geometric and texture characteristics of the object, thereby establishing a foundation for illumination invariance at the data input.
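As an illustration of the preprocessing logic summarized in the abstract, the following minimal Python sketch scales only the V (brightness) channel of each pixel and then reduces the image to a single grayscale channel. It is not taken from the application: the function names, the gain range of 0.6 to 1.4 and the BT.601 luma weights (0.299, 0.587, 0.114) are assumptions chosen for the example, and random cropping is omitted for brevity.

```python
import colorsys
import random


def jitter_brightness(rgb, gain):
    """Scale only the V channel of one RGB pixel (components in 0..255)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    v = min(1.0, max(0.0, v * gain))  # clamp to the valid brightness range
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)  # hue and saturation unchanged
    return tuple(round(c * 255) for c in (r2, g2, b2))


def to_gray(rgb, weights=(0.299, 0.587, 0.114)):
    """Weighted sum of R, G and B; the BT.601 weights are an assumption."""
    return round(sum(w * c for w, c in zip(weights, rgb)))


def preprocess(image, rng, low=0.6, high=1.4):
    """image: list of rows of (R, G, B) tuples; one random gain per image."""
    gain = rng.uniform(low, high)  # uniformly distributed brightness gain
    return [[to_gray(jitter_brightness(px, gain)) for px in row]
            for row in image]


# Tiny 2x2 example image with a fixed seed for repeatability.
image = [[(200, 40, 40), (40, 200, 40)], [(0, 0, 0), (255, 255, 255)]]
gray = preprocess(image, random.Random(0))
```

Because hue and saturation are left untouched, changing the gain scales the resulting grayscale values roughly in proportion, up to clamping and rounding.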
Inventors
- HAN ZHIBO
- LIU SHUAI
Assignees
- 北京北极星途技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260128
Claims (10)
- 1. An explosive object target identification method based on multidimensional data optimization and feature fusion, characterized by comprising the following steps: acquiring an original image to be identified, and performing HSV-space brightness adjustment, random cropping and grayscale conversion on the original image to generate a single-channel grayscale image; performing multi-level feature extraction on the single-channel grayscale image by using a convolutional neural network, and outputting a shallow feature map, a middle feature map and a deep feature map; extracting edge information of the shallow feature map by using a fixed gradient operator to generate a structural attention mask, and applying the structural attention mask to the shallow feature map to obtain an enhanced shallow feature map; calculating spatial-dimension statistics of the enhanced shallow feature map, the middle feature map and the deep feature map respectively, and performing re-standardization to generate distribution-aligned multi-scale feature maps; unifying the channel dimensions of the distribution-aligned multi-scale feature maps, constructing bidirectional cross-scale transfer paths, and performing weighted fusion to generate fused multi-scale feature maps; and performing category prediction and bounding-box regression on the fused multi-scale feature maps by using a decoupled prediction network, and outputting an explosive object detection result.
- 2. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 1, wherein performing HSV-space brightness adjustment, random cropping and grayscale conversion on the original image to generate a single-channel grayscale image comprises: converting the original image from the RGB color space to the HSV color space, leaving the hue and saturation components unchanged, and applying a linear transformation only to the brightness component; merging the adjusted brightness component with the original hue and saturation components and converting the result back to the RGB color space; reading an annotation file of the original image, randomly generating a cropping window subject to the constraint that a preset coverage ratio threshold is met, and scaling the cropped region to a preset network input size; and calculating a weighted sum of the pixel values of the RGB channels by a weighted average method to convert the color image into the single-channel grayscale image.
- 3. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 1, wherein performing multi-level feature extraction on the single-channel grayscale image by using a convolutional neural network and outputting a shallow feature map, a middle feature map and a deep feature map comprises: constructing a convolutional neural network comprising depthwise separable convolutions and a channel attention mechanism, and adapting the input channel parameter of the first convolutional layer to a single channel; initializing the first-layer convolution kernel weights by using a mean-weight transfer strategy, so that the response amplitude distribution of the convolution kernels to the single-channel input signal remains approximately consistent with the pre-trained state; and outputting, through layer-by-layer downsampling, the shallow feature map with a downsampling factor of 8, the middle feature map with a downsampling factor of 16 and the deep feature map with a downsampling factor of 32.
- 4. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 1, wherein extracting the edge information of the shallow feature map by using a fixed gradient operator to generate a structural attention mask, and applying the structural attention mask to the shallow feature map to obtain the enhanced shallow feature map comprises: performing a depthwise convolution operation on the shallow feature map by using a preset non-trainable horizontal convolution kernel and a preset non-trainable vertical convolution kernel, and outputting a horizontal gradient feature map and a vertical gradient feature map; calculating the combined gradient magnitude of the horizontal gradient feature map and the vertical gradient feature map at corresponding spatial positions; rescaling the combined gradient magnitude with learnable affine transformation parameters, and mapping the result through a nonlinear activation function to the structural attention mask with values between 0 and 1; and performing element-wise multiplication of the structural attention mask and the shallow feature map, and adding the result to the original shallow feature map to obtain the enhanced shallow feature map.
- 5. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 1, wherein calculating spatial-dimension statistics of the enhanced shallow feature map, the middle feature map and the deep feature map respectively and performing re-standardization to generate distribution-aligned multi-scale feature maps comprises: constructing an independent instance normalization layer for the feature map of each level; calculating, for each sample and each channel of the feature map, the pixel mean and the pixel variance over the spatial dimensions; standardizing the feature pixel values of the channel to zero mean and unit variance by using the pixel mean and the pixel variance; and applying an affine transformation to the standardized features by using a learnable scaling factor and a learnable translation factor, and outputting the distribution-aligned multi-scale feature maps.
- 6. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 1, wherein unifying the channel dimensions of the distribution-aligned multi-scale feature maps, constructing bidirectional cross-scale transfer paths and performing weighted fusion to generate fused multi-scale feature maps comprises: performing pointwise convolution on the input feature map of each level by using 1×1 convolution kernels, and adjusting the number of channels of all feature maps to a preset dimension; removing intermediate nodes that have only a single input edge from the fusion topology; constructing a top-down semantic enhancement path that passes high-level features to low-level features after upsampling; constructing a bottom-up localization enhancement path that passes low-level features to high-level features after downsampling; establishing direct skip connections between the input node and the output node of the same resolution level; and, for the fusion node of each level, performing a normalized weighted summation of the input features from different paths by using learnable weight parameters.
- 7. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 1, wherein performing category prediction and bounding-box regression on the fused multi-scale feature maps by using a decoupled prediction network and outputting the explosive object detection result comprises: constructing a parameter-sharing classification sub-network and box regression sub-network, each composed of stacked depthwise separable convolution layers; outputting, through the classification sub-network, the logit of each preset anchor box belonging to a specific category, and mapping the logits to category confidences by using an activation function; outputting, through the box regression sub-network, the position offsets and size scaling coefficients of the predicted box relative to the preset anchor box; calculating the absolute center coordinates and the width and height of the predicted box in the original image from the position offsets and size scaling coefficients, according to the downsampling factor of the feature map and the geometric parameters of the preset anchor box; and performing non-maximum suppression on all predicted boxes based on an intersection-over-union (IoU) threshold to eliminate redundant overlapping detection boxes.
- 8. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 2, wherein applying a linear transformation only to the brightness component comprises: generating a random brightness adjustment coefficient that follows a uniform distribution; and performing a linear gain calculation on the original brightness component by using the random brightness adjustment coefficient, and limiting the result to between the minimum and maximum pixel values allowed by the image format by using a clamping function; and wherein randomly generating the cropping window subject to the constraint that the preset coverage ratio threshold is met comprises: calculating the intersection area of the randomly generated cropping window region and the ground-truth bounding box region of the target; and determining whether the ratio of the intersection area to the area of the ground-truth bounding box is greater than or equal to the preset coverage ratio threshold, and if so, retaining the cropping window.
- 9. The explosive object target identification method based on multidimensional data optimization and feature fusion according to claim 4, wherein performing a depthwise convolution operation on the shallow feature map by using a preset non-trainable horizontal convolution kernel and a preset non-trainable vertical convolution kernel comprises: setting the weight matrix of the horizontal convolution kernel to the horizontal operator values of the Sobel operator, and setting the weight matrix of the vertical convolution kernel to the vertical operator values of the Sobel operator; and keeping the weights of the horizontal and vertical convolution kernels from being updated during training; and wherein rescaling the combined gradient magnitude with learnable affine transformation parameters and mapping the result through a nonlinear activation function to the structural attention mask with values between 0 and 1 comprises: multiplying the combined gradient magnitude by a learnable scaling factor and adding a learnable bias term; and inputting the result to the Sigmoid function, so that the gradient magnitudes of background regions are mapped to a range approaching 0 and the gradient magnitudes of edge regions are mapped to a range approaching 1.
- 10. The method of claim 6, wherein performing a normalized weighted summation of the input features from different paths by using learnable weight parameters for the fusion node of each level comprises: applying a ReLU activation function to the learnable weight parameter corresponding to each input feature, constraining the weight parameter to be non-negative; calculating the sum of the products of each input feature and its corresponding weight parameter; calculating the sum of all weight parameters and adding a numerical stability constant to that sum; and dividing the sum of the products by the sum with the numerical stability constant added to obtain the fused feature output.
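Two of the claimed building blocks lend themselves to a compact illustration: the Sobel-based structural attention mask of claims 4 and 9, and the normalized weighted fusion of claim 10. The pure-Python sketch below operates on a single 2-D channel and on flat feature vectors rather than on real network tensors. The function names, the zero padding at the borders, the default scale and bias values (which would be learned in practice) and the stability constant 1e-4 are assumptions made for the example, not values stated in the claims.

```python
import math

# Fixed Sobel kernels; in a network these weights would be frozen.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def conv3x3(channel, kernel):
    """Apply a 3x3 kernel to one 2-D channel with zero padding (assumed)."""
    h, w = len(channel), len(channel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * channel[yy][xx]
            out[y][x] = acc
    return out


def enhance_shallow(channel, scale=1.0, bias=-2.0):
    """Residual edge attention: out = x * mask + x, with mask in (0, 1)."""
    gx = conv3x3(channel, SOBEL_X)
    gy = conv3x3(channel, SOBEL_Y)
    out = []
    for y, row in enumerate(channel):
        new_row = []
        for x, value in enumerate(row):
            magnitude = math.hypot(gx[y][x], gy[y][x])
            # Affine rescaling followed by a sigmoid (scale, bias assumed).
            mask = 1.0 / (1.0 + math.exp(-(scale * magnitude + bias)))
            new_row.append(value * mask + value)
        out.append(new_row)
    return out


def fuse(features, weights, eps=1e-4):
    """Normalized weighted sum with ReLU-constrained weights (claim 10)."""
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps weights non-negative
    total = sum(w) + eps                  # eps guards against division by 0
    return [sum(wi * f[i] for wi, f in zip(w, features)) / total
            for i in range(len(features[0]))]


# A vertical step edge: left columns 0.0, right columns 1.0.
step = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
enhanced = enhance_shallow(step)
fused = fuse([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], eps=0.0)
```

In this toy input the pixel sitting on the step edge receives a larger mask value than its neighbor in the flat region, so the residual form amplifies edges while leaving the original response in place elsewhere. With equal weights and `eps=0.0`, `fuse` reduces to a plain average of its inputs.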
Description
Explosive object target identification method based on multidimensional data optimization and feature fusion
Technical Field
The invention relates to the technical field of computer vision and artificial-intelligence image processing, and in particular to an explosive object target identification method based on multidimensional data optimization and feature fusion.
Background
In mining and geological exploration operations, controlling the flow of civil explosive materials such as explosives and detonators bears directly on production safety and public security. Operation sites are often in harsh environments, and confirming the usage status of these dangerous goods in real time, so that they are neither lost nor illegally removed, is a core difficulty of on-site safety supervision. Traditional management relies mostly on manual on-site checks or visual inspection of surveillance video, which is inefficient and prone to supervision gaps caused by visual fatigue or negligence. Using computer vision to detect and identify explosive objects automatically and in real time in on-site video streams has therefore become a key means of making the control of hazardous explosive materials more intelligent and of reducing safety risks.
Existing general-purpose object detection techniques rely mainly on deep convolutional neural networks, which learn feature representations automatically from images through training on large-scale datasets. Algorithms such as the R-CNN series or the YOLO series achieve very high accuracy in fields such as urban security and traffic monitoring, where illumination is sufficient and the scene structure is clear. By stacking deep network layers, these algorithms effectively extract the color, texture and high-level semantic information of objects; they respond quickly and classify reliably when handling conventional objects with distinctive appearance, and they have greatly advanced the engineering application of image recognition.
However, when these general-purpose techniques are transplanted directly into underground or field blasting scenes, their limitations become apparent. First, existing algorithms depend excessively on RGB color information. Most underground areas are lit by moving point light sources, so illumination intensity fluctuates sharply and color temperature deviates widely; this illumination noise causes the color distribution of imaged objects to drift, invalidating the color features the model originally learned. Second, existing networks tend to capture salient textured regions, whereas targets such as detonators are tiny, are often covered in dust, and have extremely low contrast against the background rock; a network without physical edge priors struggles to separate such fine geometric contours from a cluttered background, and missed detections are very likely. Third, general-purpose algorithms fuse multi-scale features by simple linear superposition; when large packing boxes and tiny detonators must be detected at the same time, shallow-layer background noise easily drowns out weak deep semantic information, so the localization accuracy for small-scale targets cannot meet the strict requirements of high-risk control.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention provides an explosive object target identification method based on multidimensional data optimization and feature fusion. It aims to solve two problems of the prior art: poor generalization and robustness in complex illumination scenes such as underground sites, caused by over-dependence on RGB color distribution; and the difficulty of extracting the geometric edge features of tiny explosive objects, with insufficient localization accuracy, under low-contrast backgrounds and large scale spans.
To achieve this purpose, the invention adopts the following technical scheme. The explosive object target identification method based on multidimensional data optimization and feature fusion comprises the following steps: acquiring an original image to be identified, and performing HSV-space brightness adjustment, random cropping and grayscale conversion on the original image to generate a single-channel grayscale image; performing multi-level feature extraction on the single-channel grayscale image by using a convolutional neural network, and outputting a shallow feature map, a middle feature map and a deep feature map; extracting edge information of the shallow feature map by using a fixed gradient operator to generate a structural attention mask, and applying the structural attention mask to the shallow feature map to obtain a