
CN-121998855-A - Image enhancement method for unmanned underwater vehicle detection task

CN 121998855 A

Abstract

The invention provides an image enhancement method for unmanned underwater vehicle detection tasks, which belongs to the field of image processing and mainly improves the quality of underwater images. The invention adopts a "teacher-student" model architecture based on knowledge distillation and uses a multi-factor degradation model to synthesize, online, full-period degradation data covering the process from image acquisition to transmission, realizing joint training and optimization of the denoising and super-resolution tasks. Meanwhile, a task performance tracking module is designed, and a feedback optimization mechanism oriented to task consistency is established by comparing the effects of the student model output and the truth image on the detection task.
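The "teacher-student" distillation described above trains the student against temperature-softened teacher outputs. A minimal sketch, assuming the standard Hinton-style KL-divergence loss with a temperature coefficient; the function names and the default T = 4.0 are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np

def softmax(logits, t=1.0):
    """Temperature-scaled softmax; larger t yields a softer distribution."""
    z = np.asarray(logits, dtype=float) / t
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, t=4.0):
    """KL(teacher_soft || student_soft), scaled by t**2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, t)    # soft targets from the frozen teacher
    q = softmax(student_logits, t)    # student predictions
    return float(t * t * np.sum(p * (np.log(p) - np.log(q))))
```

When the student and teacher agree, the loss is zero; in training, a term of this kind would be combined with the pixel-by-pixel and task-tracking losses of the claims.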

Inventors

  • CHEN YIMIN
  • TANG RUI
  • GAO JIAN
  • YANG SHUO
  • LI CHANG
  • QIU SHANZHI

Assignees

  • Northwestern Polytechnical University (西北工业大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-19

Claims (8)

  1. An image enhancement method for unmanned underwater vehicle detection tasks, characterized by comprising the following steps: S1, constructing a training set: construct a multi-factor degradation module, use it to convert each input truth image into a low-resolution degraded image, and store each low-resolution degraded image paired with its truth image in the training set; S2, constructing an underwater vehicle image enhancement network: the network comprises a student model, a teacher model, a task performance tracking module and a pixel-by-pixel loss module; the input of the network is the low-resolution degraded image, which is fed into both the teacher model and the student model; the output of the teacher model is the knowledge feature and the output of the student model is the learning feature; the task performance tracking module comprises a trained, parameter-frozen target detection model and a task tracking loss module, the task tracking loss module computing the task performance tracking loss, while the pixel-by-pixel loss module computes the pixel-by-pixel loss; S3, training the underwater vehicle image enhancement network on the training set until its total loss function stabilizes, obtaining the optimal underwater vehicle image enhancement network; and S4, acquiring an image to be enhanced, inputting it into the optimal underwater vehicle image enhancement network for processing, and taking the resulting learning feature as the final enhanced image.
  2. The image enhancement method for unmanned underwater vehicle detection tasks according to claim 1, wherein the underwater vehicle image enhancement network operates as follows: the learning feature of the student model is input into the trained, parameter-frozen target detection model to obtain a learning-feature detection result set P = {p_1, ..., p_N}, where p_i = (b_i, c_i); b_i is the box parameter of the i-th detected target frame and c_i is its category parameter; c_i = (cls_i, s_i), where cls_i is the category and s_i the confidence score of the i-th detected target frame; b_i = (x_i, y_i, w_i, h_i), where x_i and y_i are the coordinates of the center point of the i-th detected target frame on the x axis and y axis, and w_i and h_i are its width and height. The truth image is input into the same trained, parameter-frozen target detection model to obtain a truth-feature detection result set G = {g_1, ..., g_M}, where g_j = (b'_j, c'_j); b'_j is the box parameter of the j-th detected target frame and c'_j its category parameter; c'_j = (cls'_j, s'_j), where cls'_j is the category and s'_j the confidence score of the j-th detected target frame; b'_j = (x'_j, y'_j, w'_j, h'_j), defined analogously. The task performance tracking module loss is L_track = λ_cls·L_cls + λ_CIoU·L_CIoU + λ_NWD·L_NWD, where λ_cls is the classification loss weight, λ_CIoU is the complete intersection-over-union loss weight and λ_NWD is the normalized Wasserstein distance loss weight; the weighted classification loss L_cls and the weighted positioning losses L_CIoU and L_NWD update the parameters of the student model through the back-propagation algorithm as part of the total loss function.
The weighted classification loss L_cls is computed from a soft matching between student and truth detections: with temperature coefficient T, a temperature-scaled softmax gives the probability that a detection target frame output by the student model corresponds to a detection target frame of the truth image; hyperparameters adjust the weighting of the matching. The classification difference matrix D measures the difference between the predicted classification score and the target classification score, its element D_ij being computed from the confidences s_i and s'_j; the geometric similarity matrix S measures the degree of overlap between the prediction frame and the target frame, its element S_ij being the intersection-over-union of b_i and b'_j, where the intersection area is obtained by converting the center-point coordinates of each frame to top-left corner (x − w/2, y − h/2) and bottom-right corner (x + w/2, y + h/2) coordinates, and the union area is the sum of the two frame areas minus the intersection area.
The weighted positioning loss L_CIoU uses the complete intersection-over-union (CIoU), which adds consistency constraints on center-point distance and aspect ratio to the intersection-over-union (IoU); the complete intersection-over-union loss matrix measures the consistency of the overlap degree, center-point distance and aspect ratio of the frames, its element being L^CIoU_ij = 1 − IoU(b_i, b'_j) + ρ²(b_i, b'_j)/c² + α·v, where ρ²(b_i, b'_j) is the squared Euclidean distance between the center points of the two boxes, c² is the squared diagonal length of the smallest enclosing rectangle of the two target detection frames, α is a balance parameter and v is the parameter measuring aspect-ratio consistency.
The weighted positioning loss L_NWD uses the normalized Wasserstein distance loss matrix, whose element is computed as L^NWD_ij = 1 − exp(−W(b_i, b'_j)/C), where C is a constant and W(b_i, b'_j) is the Wasserstein distance between the two target detection frames.
  3. The image enhancement method for unmanned underwater vehicle detection tasks according to claim 1, wherein the pixel-by-pixel loss is obtained by accumulating the per-pixel differences between the compared images.
  4. The image enhancement method for unmanned underwater vehicle detection tasks according to claim 1, wherein the target detection model comprises a YOLO model and a Transformer-based model.
  5. The image enhancement method for unmanned underwater vehicle detection tasks according to claim 1, wherein the total loss function is L_total = L_pix + λ·L_track, where λ is a loss adjustment parameter, L_pix is the pixel-by-pixel loss and L_track is the task performance tracking module loss.
  6. The image enhancement method for unmanned underwater vehicle detection tasks according to claim 1, wherein the multi-factor degradation module comprises a blur degradation module, an artifact module, a Gaussian noise module, a salt-and-pepper noise module, a stripe noise module, a speckle noise module, a JPEG compression module and random downsampling. The multi-factor degradation module acquires the truth image, randomly selects any one of the blur degradation, artifact, Gaussian noise, salt-and-pepper noise, stripe noise and speckle noise modules to process it, takes the processed result as the input of the JPEG compression module, feeds the output of the JPEG compression module into random downsampling, and the output of the random downsampling finally yields the degraded image.
The blur degradation module proceeds as follows: Step Deg-1, acquire the truth image, a blur kernel length L and a blur angle θ; Step Deg-2, convert the blur angle θ from degrees to radians; Step Deg-3, create an L×L all-zero matrix as the blur kernel matrix; Step Deg-4, using sine and cosine functions and the center-point coordinates of the blur kernel, compute the start coordinates (x1, y1) and end coordinates (x2, y2) of the motion track in blur-kernel space; Step Deg-5, compute, with the Bresenham line algorithm, the set of all discrete grid-point coordinates connecting the start and end coordinates; Step Deg-6, set the elements of the blur kernel matrix at those discrete grid-point coordinates to 1 and normalize the blur kernel matrix to conserve energy; Step Deg-7, perform a two-dimensional convolution of the truth image with the generated blur kernel matrix to obtain the convolved image, convolving each channel separately for multi-channel images; Step Deg-8, truncate the data of the convolved image to the valid pixel-value range to obtain the motion-blurred image.
The artifact module proceeds as follows: Step Art-1, create a copy of the truth image; Step Art-2, apply a translation transformation with a given offset to the copy; Step Art-3, multiply the translated image by an attenuation coefficient to obtain a translated, attenuated image; Step Art-4, superimpose the translated, attenuated image on the truth image to obtain the artifact image.
The Gaussian noise module proceeds as follows: Step Gau-1, normalize the truth image by mapping its pixel values to the floating-point interval [0,1], obtaining an image normalization matrix; Step Gau-2, from a preset mean and variance, generate a noise matrix of the same size as the truth image whose elements obey a normal distribution; Step Gau-3, add the noise matrix and the image normalization matrix element by element; Step Gau-4, truncate the element-wise sum to the interval [0,1], obtaining a truncated matrix and preventing pixel overflow; Step Gau-5, remap the truncated matrix back to the interval [0,255] and convert it to an integer type, obtaining the Gaussian noise image.
The salt-and-pepper noise module proceeds as follows: Step Sal-1, from the total number of pixels N of the truth image and a preset noise density d, compute the numbers of salt and pepper noise points as N_salt = N×d/2 and N_pepper = N×d/2; Step Sal-2, randomly generate N_salt coordinate positions within the coordinate range of the truth image; Step Sal-3, force the pixel values of the truth image at those positions to the maximum gray value, obtaining a salt image; Step Sal-4, randomly generate N_pepper coordinate positions within the coordinate range of the truth image; Step Sal-5, force the pixel values of the salt image at those positions to the minimum gray value, obtaining the final salt-and-pepper noise image.
The stripe noise module proceeds as follows: Step Strip-1, set a stripe width w_s and a stripe intensity v_s; Step Strip-2, traverse the truth image in the vertical direction with a step of 2·w_s, and in each step period select the rows from a starting row r to r + w_s as the stripe-addition region; Step Strip-3, add the intensity value v_s to every pixel value in the region; Step Strip-4, apply saturation processing to the sums so that no value exceeds the maximum pixel value; Step Strip-5, repeat Steps Strip-2 to Strip-4 until the whole image is covered, obtaining a stripe image with periodic stripes.
The speckle noise module proceeds as follows: Step Speckle-1, normalize the truth image by mapping its pixel values to the floating-point interval [0,1], obtaining an image normalization matrix I; Step Speckle-2, generate a random matrix R of the same size as the truth image whose elements obey a Rayleigh distribution, and multiply each element by a preset noise intensity coefficient k to obtain a random noise matrix; Step Speckle-3, multiply the image normalization matrix and the random noise matrix element by element to obtain the noise component; Step Speckle-4, superimpose the noise component on the image normalization matrix to obtain a superposition matrix, i.e. execute S = I + I×(k×R); Step Speckle-5, truncate each element of the superposition matrix to the range [0,1], then quantize and restore the elements to an integer format, obtaining the speckle noise image.
The JPEG compression module proceeds as follows: Step JP-1, convert the input image to a three-channel image; Step JP-2, configure the JPEG coding parameters, set a compression quality factor, and encode the input image into a binary stream with the JPEG encoding algorithm, which comprises the discrete cosine transform (DCT), quantization and entropy coding; then decode the binary stream with the JPEG decoding algorithm and reconstruct it into a JPEG image matrix.
The input of random downsampling is the JPEG image matrix, and it proceeds as follows: Step Downsam-1, compute the size of the low-resolution degraded image from the set scaling factor; Step Downsam-2, shrink the JPEG image matrix with bicubic interpolation and output the low-resolution degraded image.
  7. Terminal equipment comprising a processor, a memory and a computer program stored in the memory, characterized in that the processor, when executing the computer program, implements the image enhancement method for unmanned underwater vehicle detection tasks according to any one of claims 1-6.
  8. A computer-readable storage medium in which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the image enhancement method for unmanned underwater vehicle detection tasks according to any one of claims 1-6.
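The two weighted positioning losses of claim 2 (complete IoU with center-distance and aspect-ratio terms, and the normalized Wasserstein distance) can be sketched for a single pair of (cx, cy, w, h) boxes. This follows the standard CIoU and NWD formulations; the constant C = 12.8 and the epsilon guard are assumptions, since the patent's exact symbols are not reproduced in this translation:

```python
import math

def ciou_loss(b1, b2):
    """Complete-IoU loss between two (cx, cy, w, h) boxes:
    1 - IoU + center-distance term + aspect-ratio term."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = b1, b2
    # Convert center coordinates to top-left / bottom-right corners.
    l1, t1, r1, d1 = x1 - w1/2, y1 - h1/2, x1 + w1/2, y1 + h1/2
    l2, t2, r2, d2 = x2 - w2/2, y2 - h2/2, x2 + w2/2, y2 + h2/2
    inter = max(0.0, min(r1, r2) - max(l1, l2)) * max(0.0, min(d1, d2) - max(t1, t2))
    union = w1*h1 + w2*h2 - inter
    iou = inter / union if union > 0 else 0.0
    rho2 = (x1 - x2)**2 + (y1 - y2)**2                # squared center distance
    # Squared diagonal of the smallest enclosing rectangle of both boxes.
    c2 = (max(r1, r2) - min(l1, l2))**2 + (max(d1, d2) - min(t1, t2))**2
    # Aspect-ratio consistency term v and balance parameter alpha.
    v = (4 / math.pi**2) * (math.atan(w2 / h2) - math.atan(w1 / h1))**2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

def nwd_loss(b1, b2, C=12.8):
    """Normalized-Wasserstein-distance loss: boxes modeled as 2-D Gaussians,
    similarity exp(-W2/C) with constant C; loss = 1 - similarity."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = b1, b2
    w2d = (x1-x2)**2 + (y1-y2)**2 + ((w1-w2)/2)**2 + ((h1-h2)/2)**2
    return 1 - math.exp(-math.sqrt(w2d) / C)
```

A full loss matrix over all student/truth detections, as in the claim, would evaluate these functions for every (i, j) pair of boxes.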

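Two of the degradation modules in claim 6 can be sketched as follows: the motion-blur kernel (Steps Deg-1 to Deg-6) and the Gaussian noise module (Steps Gau-1 to Gau-5). Rounded linear interpolation stands in for the Bresenham line algorithm for brevity, and the parameter defaults are illustrative assumptions:

```python
import numpy as np

def motion_blur_kernel(L, angle_deg):
    """L x L motion-blur kernel: trace a line through the kernel center
    at the given angle, then normalize for energy conservation."""
    theta = np.deg2rad(angle_deg)          # Deg-2: degrees -> radians
    k = np.zeros((L, L))                   # Deg-3: all-zero kernel matrix
    c = (L - 1) / 2.0                      # kernel center
    dx, dy = c * np.cos(theta), c * np.sin(theta)
    x0, y0, x1, y1 = c - dx, c - dy, c + dx, c + dy   # Deg-4: endpoints
    # Deg-5/6: discretize the segment and set those cells to 1
    # (rounded interpolation instead of Bresenham).
    for t in np.linspace(0.0, 1.0, 2 * L):
        k[int(round(y0 + t * (y1 - y0))), int(round(x0 + t * (x1 - x0)))] = 1.0
    return k / k.sum()                     # normalize: kernel sums to 1

def add_gaussian_noise(img, mean=0.0, var=0.01, rng=None):
    """Gau-1..Gau-5: normalize to [0,1], add N(mean, var) noise,
    truncate to [0,1], and remap to uint8 [0,255]."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = img.astype(np.float64) / 255.0
    noisy = np.clip(x + rng.normal(mean, np.sqrt(var), img.shape), 0.0, 1.0)
    return (noisy * 255).astype(np.uint8)
```

The blurred result would then be obtained by convolving each image channel with the kernel (Step Deg-7) and clipping to the valid pixel range (Step Deg-8).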
Description

Image enhancement method for unmanned underwater vehicle detection task

Technical Field

The invention belongs to the field of underwater detection and side-scan sonar image processing, and mainly relates to an image enhancement method for unmanned underwater vehicle detection tasks.

Background

Sonar, as key equipment for underwater detection and tracking, plays a vital role in practical applications such as ocean resource development and maintenance, underwater search and rescue, scientific and ecological research, fishery and marine economy, and military security. Currently, acoustic observation technology based on shipborne equipment or unmanned underwater platforms has become a research hotspot. Side-scan sonar (SSS) has become one of the core devices in fields such as underwater detection by virtue of its flexible deployment and relatively high-resolution imaging capability. However, constrained by the acoustic imaging mechanism and the complex, changeable underwater environment, side-scan sonar images generally face the dual challenges of noise interference with multiple unknown parameters and limited resolution, which severely restricts their performance in practical tasks. Traditional sonar image denoising methods mainly divide into spatial-domain, wavelet-domain and nonlocal approaches [SEGSID: A Semantic-Guided Framework for Sonar Image Despeckling]. With the rapid development of deep learning, convolutional neural networks (CNN) have shown remarkable advantages in sonar denoising tasks.
For example, Lu et al. [Learning a deep convolutional network for speckle noise reduction in underwater sonar images] propose a convolutional-neural-network-based method for sonar image speckle noise estimation and denoising, designing a structural-similarity metric loss function and training the model on a synthesized speckle noise dataset, thereby reducing speckle noise while preserving important geometry. Vishwakarma et al. [Denoising and inpainting of sonar images using convolutional sparse representation] propose a method based on convolutional sparse representation for the sonar image denoising task. Zhao et al. [Unpaired sonar image denoising with simultaneous contrastive learning] employ a cyclic structure of multi-domain transformations to generate valuable surface information, and mine the latent feature distribution between different domains by adding bi-directional mappings, facilitating image denoising and restoration. Cheng et al. [Sonar image garbage detection via global despeckling and dynamic attention graph optimization] propose a sonar image denoising structure based on a self-supervised blind-spot network that performs globally aware noise suppression tailored to the noise characteristics of sonar images, achieving a better denoising effect. With the widespread use of Transformer technology in computer vision, researchers have gradually introduced it into image denoising as well; for example, Perera et al. [Transformer-based SAR image despeckling] propose a Transformer-based despeckling network that achieves effective denoising. For the sonar image super-resolution task, traditional methods mainly comprise interpolation, reconstruction-based and multi-frame fusion approaches; to some extent, these suffer from problems such as loss of detail and the need for accurate registration.
Currently, deep-learning-based sonar image super-resolution is in the spotlight and has produced numerous results. For example, Hua et al. [A super-resolution reconstruction method of underwater target detection image by side scan sonar] replace the ordinary convolutional layer with a dilated convolutional layer, optimize SRGAN (Super-Resolution GAN), and apply it to sonar image super-resolution reconstruction. Inspired by self-calibrating convolution, Ma et al. [MHGAN: A multi-headed generative adversarial network for underwater sonar image super-resolution] construct a multi-headed GAN to achieve sonar image super-resolution. Wen et al. [A framework for super-resolution of side-scan sonar images: Combination of variational Bayes and regional feature selection] combine variational Bayes with region-based feature selection to achieve side-scan sonar image super-resolution. However, these methods may amplify existing noise and introduce artifacts. At the level of model training and deployment, existing deep-learning-based image enhancement techniques suffer from the drawback that the degradation model used to synthesize the clear-degraded image pairs they depend on during training is too simplified and cannot fully simulate the full-period complex degradation process from imaging acquisition to data transmission in a real underwater environment, so that the generated training sample has