CN-121982507-A - Underwater target detection method based on RT-SonarNet


Abstract

The application belongs to the technical field of underwater target detection, and particularly relates to an underwater target detection method based on RT-SonarNet. The method comprises the steps of: collecting and preprocessing an underwater sonar dataset, and denoting the three feature maps output by a backbone network as a high-layer feature map, a middle-layer feature map and a low-layer feature map; inputting the high-layer feature map to an intra-scale feature interaction module, which optimizes intra-scale interaction of image features and outputs an enhanced feature map; inputting the enhanced feature map, the middle-layer feature map and the low-layer feature map to a multi-scale feature self-adaptive fusion module for self-adaptive fusion, which outputs a complete image feature sequence; introducing an EIoU bounding box regression loss function as the loss evaluation index of the model; and inputting the preprocessed sonar image to the model for underwater target detection. The application greatly reduces computational complexity while nevertheless improving detection accuracy, and its processing speed meets the requirement of underwater real-time detection.

Inventors

YANG XINGHAI
HE XIAOFAN
LI JINGWEN
WANG JINGJING
YANG XIAOHUI
JIANG KAIXIANG

Assignees

Qingdao University of Science and Technology (青岛科技大学)

Dates

Publication Date: 20260505
Application Date: 20260202

Claims (8)

1. An underwater target detection method based on RT-SonarNet, characterized in that the RT-SonarNet model comprises an intra-scale feature interaction module based on a cascade group attention mechanism, a multi-scale feature self-adaptive fusion module and a detection module, and the method comprises the following steps: S1, collecting and preprocessing an underwater sonar dataset, and denoting the three feature maps output by a backbone network as a high-layer feature map, a middle-layer feature map and a low-layer feature map; S2, inputting the high-layer feature map, which carries the richest semantic information among the backbone outputs, to the intra-scale feature interaction module, optimizing intra-scale interaction of image features through the cascade group attention mechanism, and outputting an enhanced feature map; S3, inputting the enhanced feature map, the middle-layer feature map and the low-layer feature map to the multi-scale feature self-adaptive fusion module for self-adaptive fusion, the three feature maps being deeply fused to output a complete image feature sequence; S4, the detection module introducing an EIoU bounding box regression loss function as the loss evaluation index of the model, so as to improve the detection performance of the RT-SonarNet model on multi-scale and small targets; S5, inputting the preprocessed sonar image to the trained RT-SonarNet model to realize underwater target detection.
2. The RT-SonarNet based underwater target detection method of claim 1, wherein the cascade group attention mechanism comprises: first dividing the input image features uniformly into k feature subgroups along the channel dimension; then applying an attention mechanism within each feature subgroup to capture fine-grained detail relations; and, for every subgroup except the first, integrating the output features of the preceding subgroup into the input of that subgroup, so as to realize layer-by-layer transmission and fusion of feature information.
3. The RT-SonarNet based underwater target detection method of claim 2, wherein the mathematical expression of the output features of each feature subgroup is as follows:

$$\tilde{X}_{ij} = \mathrm{Attn}\left(X_{ij} W^{Q}_{ij},\; X_{ij} W^{K}_{ij},\; X_{ij} W^{V}_{ij}\right), \qquad \tilde{X}_{i+1} = \mathrm{Concat}\left[\tilde{X}_{ij}\right]_{j=1:k} W^{P}_{i}$$

where $\tilde{X}_{ij}$ is the self-attention output computed by the $j$-th group in the $i$-th block; $X_{ij}$ is the feature assigned to the $j$-th group of the input of the $i$-th block; $k$ is the total number of cascade groups, with $1 \le j \le k$; $W^{Q}_{ij}$, $W^{K}_{ij}$ and $W^{V}_{ij}$ are respectively the query projection matrix, key projection matrix and value projection matrix of the $j$-th group; and $W^{P}_{i}$ is the linear layer parameter that projects the concatenated features back to the input feature dimension.
4. The RT-SonarNet based underwater target detection method of claim 2, wherein the intra-scale feature interaction module (CIFI) takes image features as input, and each of its encoder layers comprises, in sequence, a cascade group attention module, a first residual connection and layer normalization, a feed-forward network, and a second residual connection and layer normalization.
5. The underwater target detection method based on RT-SonarNet, characterized in that the multi-scale feature self-adaptive fusion module comprises an FAF sub-module, and the processing flow of the FAF sub-module comprises: up-sampling a low-resolution feature map by bilinear interpolation, restoring its spatial size so that it is consistent with the high-resolution feature map; compressing the features by global average pooling, which reduces the two-dimensional feature map of each channel to a single value while keeping the number of channels unchanged; and finally increasing the nonlinear fitting capacity among channels through two one-dimensional convolutions and a Sigmoid activation function.
6. The RT-SonarNet based underwater target detection method of claim 1, wherein the EIoU bounding box regression loss function is calculated as follows:

$$L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \frac{\rho^{2}(w, w^{gt})}{C_{w}^{2}} + \frac{\rho^{2}(h, h^{gt})}{C_{h}^{2}}$$

where $L_{EIoU}$ is the bounding box regression loss; $L_{IoU}$ is the intersection-over-union loss term; $L_{dis}$ is the center point distance loss term; $L_{asp}$ is the width and height (side length) loss term; $\rho(\cdot)$ is the Euclidean distance between two center points or two side lengths, in pixels; $b$ is the center point coordinate of the predicted box, in pixels; $b^{gt}$ is the center point coordinate of the ground-truth box, in pixels; $c$ is the diagonal length of the minimum enclosing box covering the predicted and ground-truth boxes, in pixels; $w$ and $h$ are the width and height of the predicted box, in pixels; $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, in pixels; and $C_{w}$ and $C_{h}$ are the width and height of the minimum enclosing box covering the predicted and ground-truth boxes, in pixels.
7. The RT-SonarNet based underwater target detection method of claim 1, wherein the multi-scale feature self-adaptive fusion module is an MFM module which uses one-dimensional convolution in place of fully connected layers.
8. The RT-SonarNet based underwater target detection method according to claim 1, wherein said preprocessing is histogram equalization and pseudo-color processing of sonar images in the original dataset.
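
As a reading aid for claims 2 and 3 (not part of the claims), the cascade can be sketched in plain Python. The sketch assumes that the channel count is divisible by k, that every projection matrix is square with the per-group width so one subgroup's output has the same shape as the next subgroup's input, and that "integrating" the preceding output means element-wise addition. The helper names (`matmul`, `attention`, `cascaded_group_attention`) are hypothetical and do not come from the application.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over the tokens (rows) of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    k_t = [list(col) for col in zip(*k)]
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    return matmul(softmax_rows(scores), v)

def cascaded_group_attention(x, group_weights, wp):
    """x: tokens x channels; group_weights: k tuples (wq, wk, wv); wp: output projection."""
    k = len(group_weights)
    width = len(x[0]) // k              # channels per subgroup (assumes divisibility)
    outputs, prev = [], None
    for j, (wq, wk, wv) in enumerate(group_weights):
        xj = [row[j * width:(j + 1) * width] for row in x]
        if prev is not None:
            # Assumption: the preceding subgroup's output is added to this input.
            xj = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(xj, prev)]
        prev = attention(xj, wq, wk, wv)
        outputs.append(prev)
    # Concatenate subgroup outputs along the channel axis, then project back.
    concat = [sum((o[t] for o in outputs), []) for t in range(len(x))]
    return matmul(concat, wp)

def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def zeros(n):
    return [[0.0] * n for _ in range(n)]

# Toy input: 2 tokens, 4 channels, k = 2 subgroups.
x = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
weights = [(zeros(2), zeros(2), identity(2))] * 2
out = cascaded_group_attention(x, weights, identity(4))
print(out)
```

In the toy call, zero query and key projections make the attention weights uniform, so each subgroup simply averages its input over the tokens; the second subgroup's result then visibly contains the first subgroup's average, which is the layer-by-layer transmission the claim describes.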
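
For illustration of the EIoU loss in claim 6 (again outside the claims), the following minimal sketch evaluates the three loss terms for a single pair of boxes. It assumes boxes are given as (center x, center y, width, height) in pixels; the function name `eiou_loss` and the small `eps` stabilizer are assumptions added for the example, not details from the application.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for one predicted box and one ground-truth box, each (cx, cy, w, h)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Intersection over union.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Minimum enclosing box: width, height and squared diagonal.
    c_w = max(p_x2, t_x2) - min(p_x1, t_x1)
    c_h = max(p_y2, t_y2) - min(p_y1, t_y1)
    c2 = c_w ** 2 + c_h ** 2

    l_iou = 1.0 - iou                                        # overlap term
    l_dis = ((px - tx) ** 2 + (py - ty) ** 2) / (c2 + eps)   # center distance term
    l_asp = ((pw - tw) ** 2) / (c_w ** 2 + eps) \
          + ((ph - th) ** 2) / (c_h ** 2 + eps)              # width/height term
    return l_iou + l_dis + l_asp

same = eiou_loss((10, 10, 4, 4), (10, 10, 4, 4))
apart = eiou_loss((0, 0, 2, 2), (10, 0, 2, 2))
print(same, apart)
```

Identical boxes give a loss close to zero, while the two disjoint boxes in the second call are penalized by both the overlap term and the center distance term even though their widths and heights match.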
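
The preprocessing named in claim 8 can be sketched as follows, purely as an illustration. The sketch assumes 8-bit grayscale sonar images stored as nested lists; the application does not specify a color map, so the blue-to-green-to-red ramp in `pseudo_color` is an assumption, and both function names are hypothetical.

```python
def equalize_histogram(img, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    denom = len(flat) - cdf_min
    if denom == 0:                      # constant image: nothing to stretch
        return [row[:] for row in img]

    # Lookup table mapping each gray level to its equalized value.
    lut = [max(0, round((c - cdf_min) / denom * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in img]

def pseudo_color(v, levels=256):
    """Map a gray level to (R, G, B) with an assumed blue-green-red ramp."""
    t = v / (levels - 1)
    r = max(0.0, 2.0 * t - 1.0)
    g = 1.0 - abs(2.0 * t - 1.0)
    b = max(0.0, 1.0 - 2.0 * t)
    return tuple(round(255 * c) for c in (r, g, b))

def preprocess(img):
    """Equalize, then pseudo-color every pixel."""
    return [[pseudo_color(p) for p in row] for row in equalize_histogram(img)]

gray = [[10, 20], [30, 40]]
colored = preprocess(gray)
print(equalize_histogram(gray))
print(colored)
```

Equalization spreads the narrow 10 to 40 range of the toy image across the full 0 to 255 range before coloring, which is the contrast gain such preprocessing is meant to provide on low-contrast sonar images.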

Description

Underwater target detection method based on RT-SonarNet

Technical Field

The application belongs to the technical field of underwater target detection, and particularly relates to an underwater target detection method based on RT-SonarNet.

Background

Underwater target detection plays a vital role in fields such as deep-sea exploration, offshore oil exploration and ocean development. The underwater environment is complex and affected by factors such as natural light, and light waves attenuate rapidly in water, so traditional optical imaging systems struggle to obtain high-quality underwater images. Compared with light waves, sound waves attenuate very little in the ocean, and sonar imaging systems exploit this property to obtain high-quality underwater images, which makes them well suited to underwater target detection. In addition, sonar images capture and directly reflect comprehensive underwater information, making them an indispensable tool for underwater target detection tasks.

In recent years, considerable effort has gone into using sonar images for target detection. The resulting methods fall into conventional detection methods and deep-learning-based methods. Conventional methods rely on image processing techniques such as filtering, edge detection, feature extraction and template matching. They typically depend on hand-crafted features and rules, which suit particular scenarios but are less robust in complex environments. Deep-learning-based methods automatically extract representative features from images through training and then locate and classify targets directly from those features, giving them stronger generalization capability than conventional methods.
Deep learning has made remarkable progress in object detection for natural images, and many classical object detection algorithms, such as R-CNN, YOLO and Transformer-based methods, are widely used in visual tasks because of their excellent feature extraction capabilities. However, when these methods are applied directly to sonar images, the results fall far short of those obtained on optical images. Sonar images have complex backgrounds, blurred target boundaries and heavy noise, so existing deep learning models developed for optical images cope poorly with them, performing especially badly when detecting small targets and when distinguishing targets from noise against complex backgrounds. RT-DETR is an advanced target detection framework based on the Transformer mechanism, and its multi-head self-attention mechanism excels at capturing long-range dependencies in images and at processing multi-scale features. This advantage enables RT-DETR to achieve good detection performance in a variety of complex natural image target detection tasks. In the sonar image target detection task, however, RT-DETR still faces challenges such as high computational complexity and difficulty in effectively distinguishing targets from backgrounds. On the one hand, multi-head self-attention is computationally expensive when processing intra-scale feature interaction, which increases latency, particularly in real-time detection. On the other hand, small targets in sonar images have indistinct features, which limits the accuracy of bounding box localization.
Existing deep-learning-based detection methods therefore suffer from high computational complexity, difficulty in effectively distinguishing targets from the background, complex sonar image backgrounds, blurred target boundaries, low detection accuracy for small targets, and long processing times in real-time detection.

Disclosure of Invention

To solve the above problems in the prior art, the application provides an underwater target detection method based on RT-SonarNet.

The technical solution adopted to solve the technical problem is as follows: an underwater target detection method based on RT-SonarNet, wherein the RT-SonarNet model comprises an intra-scale feature interaction module based on a cascade group attention mechanism, a multi-scale feature self-adaptive fusion module and a detection module, and the method comprises the following steps: S1, collecting and preprocessing an underwater sonar dataset, and denoting the three feature maps output by a backbone network as a high-layer feature map, a middle-layer feature map and a low-layer feature map; S2, inputting the high-layer feature map, which carries the richest semantic information among the backbone outputs, to the intra-scale feature interaction module, optimizing intra-scale interaction of image features through a cascade group attention me