CN-122023882-A - Water surface floater multitasking identification method based on WFD-Net
Abstract
The invention relates to a WFD-Net-based multi-task identification method for water surface floaters. It belongs to the field of computer vision and addresses the shortcomings of the prior art in complex water scenes: low detection precision, missed small targets, single-function models, and poor multi-task cooperation. The method comprises: constructing a multi-category, multi-scene annotated data set of water surface floaters and applying targeted enhancement; designing the WFD-Net model as an improvement of DeepLabV3+, in which a dynamic void ratio (atrous dilation rate) selection mechanism adapts to floater features of different scales, a water surface feature enhancement module weakens water background interference, and a feature recalibration module strengthens target feature expression; constructing a multi-task learning network for cooperative training of floater segmentation and classification; and introducing a category attention module to improve classification accuracy. Through its modular design and multi-task cooperative architecture, the invention in theory achieves accurate identification of floaters in complex scenes, can be applied to the monitoring and treatment of water areas, and has practical value and popularization prospects.
Inventors
- ZHANG YUNFEI
- GUO XUN
- CUI HAN
- ZHANG PENGCHENG
Assignees
- Hohai University (河海大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260105
Claims (6)
- 1. A WFD-Net-based water surface floater multi-task identification method, characterized by comprising the following steps:
S1, constructing and enhancing the multi-scene floater data set. River channel aerial images are input, covering natural floaters such as algae and duckweed and 7 types of artificial garbage such as plastics, foam, rope net, glass and metal, with 4000 algae images and 1200 images of the other types. Full-pixel annotation is carried out with the LabelMe tool under differentiated annotation rules: the complete outline is marked for bottle-shaped plastics, edge areas are marked for film-shaped plastics, and winding nodes are marked for rope-net objects. The data set is expanded by basic enhancement and scene enhancement. Basic enhancement comprises random rotation of 0-360°, horizontal/vertical flipping and scaling by 0.8-1.2 times. Scene enhancement comprises adding sinusoidal textures to simulate water surface waves, superimposing strong light spots and backlight shadows to simulate illumination interference, and reducing contrast in local areas to simulate water mist occlusion. Finally, the enhanced data set is divided into a training set, a verification set and a test set in a 7:2:1 ratio to form the sample set for training.
S2, constructing and training a WFD-Net model.
A WFD-Net model is built with DeepLabV3+ as the base model and comprises three core improvement modules. (1) The dynamic void ratio (atrous dilation rate) selection mechanism binarizes the water area image to be input and counts the ratio of floater foreground pixels to total image pixels to obtain the pixel proportion; when the proportion is less than 5%, the void ratio is set to 0.5-1 to preserve the detail features of small-scale targets; when the proportion is greater than or equal to 5%, the void ratio is set to 2-4 to enlarge the receptive field for large-scale targets; feature extraction for floaters of different scales is completed with the void ratio so determined. (2) The water surface feature enhancement module applies a linear gray transformation formula to increase the gray difference between floaters and the water surface, removes high-frequency noise caused by water surface ripples with a 5×5 Gaussian filter, and, with a double threshold (low threshold 50, high threshold 150), extracts the edge features of floaters with the Canny edge detection algorithm. (3) The feature recalibration module performs global pooling on the feature maps in the channel dimension, normalizes the channel weights, raises the weight coefficients of key feature channels, such as plastic material and metal features, to 1.5 times, and reweights the feature map accordingly. The WFD-Net model is trained with the sample set of step S1, using a combined loss function of cross-entropy loss and Dice loss, with a set initial learning rate and batch size, and training is iterated for 50 rounds.
S3, constructing a multi-task learning network and performing collaborative training. A classification branch is added to the WFD-Net model to construct the multi-task learning network. (1) Feature sharing mechanism: the segmentation branch and the classification branch share the DeepLabV3+ encoder to reuse low-level features; the classification branch converts the high-dimensional feature map output by the encoder into a 2048-dimensional feature vector through global average pooling and then outputs the classification probabilities of the 7 classes of floaters through a fully connected layer. (2) Category attention allocation: texture attention weights (coefficient 1.3) are allocated to natural floaters such as algae and duckweed, contour attention weights (coefficient 1.2) to light garbage such as plastic and foam, and reflection attention weights (coefficient 1.5) to hard garbage such as metal and glass, dynamically focusing on the key features of each category. (3) Multi-task loss balancing: a weighted loss function is constructed from the segmentation task loss and the classification task loss, and collaborative training of the segmentation and classification tasks is completed.
S4, model application. The water area image to be identified is input into the trained model, which outputs the floater segmentation mask and the category label. The actual coverage area of the floaters is calculated by an area calculation formula from the number of pixels in the segmentation mask and the geographic resolution of the image, providing a quantitative basis for water environment treatment.
- 2. The method according to claim 1, wherein the enhancement of the multi-scene floater data set comprises: step B1, basic enhancement: the annotated original images are rotated (random angle of 0-360°), flipped (horizontal/vertical) and scaled (0.8-1.2 times) in sequence to expand the number of basic samples; simulating water surface waves: sinusoidal textures are added to the images to reproduce water surface fluctuation at different flow rates; simulating illumination interference: strong light spots and backlight shadows are superimposed on the images to reproduce complex illumination in the monitoring environment; simulating water mist occlusion: the contrast of local image areas is reduced to simulate monitoring scenes in haze and rainfall; and dividing the data set: the enhanced data set is divided into a training set, a verification set and a test set in a 7:2:1 ratio.
- 3. The method of claim 1, wherein the dynamic void ratio (atrous dilation rate) selection mechanism of the WFD-Net model comprises: image preprocessing: the water area image to be input into the model is binarized to distinguish the floater foreground area from the water surface background area; pixel proportion calculation: the ratio of the number of floater foreground pixels in the binarized image to the total number of image pixels is counted to obtain the floater pixel proportion; void ratio determination: when the floater pixel proportion is less than 5%, the void ratio is set to 0.5-1 to preserve the detail features of small-scale targets, and when it is greater than or equal to 5%, the void ratio is set to 2-4 to enlarge the receptive field for large-scale targets; and feature extraction: features are extracted from the image with the determined void ratio, capturing floaters of different scales.
- 4. The method according to claim 1, wherein the water surface feature enhancement module of the WFD-Net model comprises: step D1, contrast enhancement by gray transformation: a linear gray transformation formula is applied to increase the gray-level difference between floaters and the water surface; Gaussian filtering for denoising: the gray-transformed image is filtered with a 5×5 Gaussian filter to remove high-frequency noise caused by water surface ripples; and edge detection to strengthen boundaries: a double threshold (low threshold 50, high threshold 150) is set and the edge features of floaters are extracted with the Canny edge detection algorithm.
- 5. The method of claim 1, wherein configuring the weighted loss function and the category attention of the multi-task learning network comprises: step E1, loss function construction: a weighted loss function is constructed from the segmentation task loss and the classification task loss; category attention setting: texture attention weights with a coefficient of 1.3 are allocated to natural floaters such as algae and duckweed, contour attention weights with a coefficient of 1.2 to light garbage such as plastic and foam, and reflection attention weights with a coefficient of 1.5 to hard garbage such as metal and glass; and collaborative training: the multi-task learning network is trained with the constructed loss function and attention weights to jointly optimize the segmentation and classification tasks.
- 6. The method of claim 1, wherein training uses Adam as the optimizer with a set learning rate, optimizer parameters, weight decay coefficient and batch size, and adopts an early-stopping strategy: training is stopped to prevent overfitting if the verification loss has not decreased for 10 consecutive rounds.
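The wave simulation named in claims 1 and 2 (adding a sinusoidal texture to an image) can be illustrated with a minimal sketch. The function name `add_sinusoidal_ripple` and the amplitude, wavelength and phase values are assumptions chosen for illustration; the claims do not state them.

```python
import math
from typing import List


def add_sinusoidal_ripple(
    gray: List[List[int]],
    amplitude: float = 20.0,   # assumed ripple strength in gray levels
    wavelength: float = 16.0,  # assumed ripple period in pixels
    phase: float = 0.0,        # assumed phase offset in radians
) -> List[List[int]]:
    """Add a horizontal sinusoidal brightness texture to a grayscale image.

    The image is a list of rows of 0-255 integers. Each pixel is shifted by
    amplitude * sin(2*pi*x / wavelength + phase) and clamped back to 0-255.
    """
    out = []
    for row in gray:
        new_row = []
        for x, px in enumerate(row):
            offset = amplitude * math.sin(2.0 * math.pi * x / wavelength + phase)
            new_row.append(min(255, max(0, int(round(px + offset)))))
        out.append(new_row)
    return out


flat_water = [[100] * 16 for _ in range(2)]
rippled = add_sinusoidal_ripple(flat_water)
```

Varying the wavelength and amplitude per sample would be one way to approximate the "different flow rates" mentioned in claim 2.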
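The selection rule of claim 3 can be sketched as follows. The binarization threshold of 128 and the assumption that floaters are brighter than water are illustrative only; the claims state the 5% cutoff and the two void ratio ranges but not how the image is binarized. The function names are hypothetical.

```python
from typing import List, Tuple


def foreground_ratio(gray: List[List[int]], threshold: int = 128) -> float:
    """Binarize a grayscale image and return the foreground pixel fraction.

    Assumption: pixels brighter than `threshold` count as floater foreground.
    """
    total = sum(len(row) for row in gray)
    if total == 0:
        raise ValueError("empty image")
    foreground = sum(1 for row in gray for px in row if px > threshold)
    return foreground / total


def select_void_ratio_range(ratio: float, cutoff: float = 0.05) -> Tuple[float, float]:
    """Return the void ratio range given in the claims for a foreground ratio."""
    if ratio < cutoff:
        return (0.5, 1.0)  # small targets: preserve detail features
    return (2.0, 4.0)      # large targets: enlarge the receptive field


small_scene = [[255 if i < 4 else 0 for i in range(100)]]   # 4% foreground
large_scene = [[255 if i < 5 else 0 for i in range(100)]]   # 5% foreground
small_range = select_void_ratio_range(foreground_ratio(small_scene))
large_range = select_void_ratio_range(foreground_ratio(large_scene))
```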
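The first two stages of the water surface feature enhancement module in claim 4 (linear gray transformation, then 5×5 Gaussian filtering) can be sketched in plain Python. The gain, bias and sigma values are assumptions, since the transformation parameters are not reproduced in the claim text, and border replication is an illustrative choice. The Canny stage is omitted here; in practice it would typically come from an image library, for example OpenCV's `cv2.Canny` called with thresholds 50 and 150.

```python
import math
from typing import List


def linear_gray_transform(gray: List[List[int]], gain: float = 1.5,
                          bias: float = 0.0) -> List[List[int]]:
    """Apply g = gain * f + bias per pixel, clamped to 0-255 (assumed parameters)."""
    return [[min(255, max(0, int(round(gain * px + bias)))) for px in row]
            for row in gray]


def gaussian_kernel_5x5(sigma: float = 1.0) -> List[List[float]]:
    """Build a normalized 5x5 Gaussian kernel (sigma is an assumption)."""
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
          for x in range(-2, 3)] for y in range(-2, 3)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]


def gaussian_blur_5x5(gray: List[List[float]], sigma: float = 1.0) -> List[List[float]]:
    """Convolve the image with the 5x5 kernel, replicating border pixels."""
    kernel = gaussian_kernel_5x5(sigma)
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-2, 3):
                for dx in range(-2, 3):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    acc += kernel[dy + 2][dx + 2] * gray[yy][xx]
            out[y][x] = acc
    return out


spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 255  # isolated bright pixel standing in for ripple noise
smoothed = gaussian_blur_5x5(linear_gray_transform(spike, gain=1.0))
```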
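The loss terms in claims 1 and 5 can be sketched on flattened binary masks. This is a minimal illustration, not the patented formulation: the equal weighting of cross-entropy and Dice loss, the Dice smoothing constant, and the task weights `w_seg` and `w_cls` are all assumptions, because the weighted loss formula and its coefficients are not reproduced in the claim text.

```python
import math
from typing import Sequence


def binary_cross_entropy(pred: Sequence[float], mask: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Mean per-pixel cross-entropy between probabilities and a 0/1 mask."""
    total = 0.0
    for p, m in zip(pred, mask):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(mask)


def dice_loss(pred: Sequence[float], mask: Sequence[int],
              smooth: float = 1.0) -> float:
    """1 - Dice coefficient, with an assumed smoothing constant."""
    inter = sum(p * m for p, m in zip(pred, mask))
    return 1.0 - (2.0 * inter + smooth) / (sum(pred) + sum(mask) + smooth)


def segmentation_loss(pred: Sequence[float], mask: Sequence[int]) -> float:
    """Combined cross-entropy + Dice loss (equal weighting is an assumption)."""
    return binary_cross_entropy(pred, mask) + dice_loss(pred, mask)


def multitask_loss(seg_loss: float, cls_loss: float,
                   w_seg: float = 1.0, w_cls: float = 1.0) -> float:
    """Weighted sum of the two task losses (weights are hypothetical)."""
    return w_seg * seg_loss + w_cls * cls_loss


perfect = segmentation_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
wrong = segmentation_loss([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0])
```

The Dice term is commonly paired with cross-entropy for segmentation of small objects because it depends on mask overlap rather than on per-pixel counts, which fits the small-target emphasis of the method.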
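The early-stopping rule of claim 6 (stop when the verification loss has not decreased for 10 consecutive rounds) can be sketched independently of any training framework. The class and helper names are hypothetical, and the loss values in the example are invented purely to exercise the rule.

```python
from typing import Optional, Sequence


class EarlyStopping:
    """Track verification loss and signal a stop after `patience` flat rounds."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_rounds = 0

    def step(self, val_loss: float) -> bool:
        """Record one round; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


def rounds_until_stop(val_losses: Sequence[float],
                      patience: int = 10) -> Optional[int]:
    """Return the 1-based round at which training stops, or None if it never does."""
    stopper = EarlyStopping(patience)
    for round_idx, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return round_idx
    return None


# Two improving rounds followed by ten rounds without improvement.
stopped_at = rounds_until_stop([1.0, 0.9] + [0.95] * 10)
```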
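The area calculation in step S4 of claim 1 relates the actual area, the number of segmentation-mask pixels and the image geographic resolution, but the formula itself is not reproduced in the claim text. The sketch below assumes the common form: area equals pixel count times the square of the ground resolution, with the resolution expressed in metres per pixel. Both the form and the unit are assumptions.

```python
from typing import List


def coverage_area(mask: List[List[int]], resolution_m_per_px: float) -> float:
    """Estimate floater coverage in square metres from a binary mask.

    Assumption: each mask pixel covers resolution_m_per_px ** 2 square metres.
    """
    n_pixels = sum(1 for row in mask for v in row if v)
    return n_pixels * resolution_m_per_px ** 2


example_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
area_m2 = coverage_area(example_mask, resolution_m_per_px=0.5)
```

With 6 foreground pixels at 0.5 m per pixel, the sketch gives 1.5 square metres.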
Description
Water surface floater multitasking identification method based on WFD-Net
Technical Field
The invention belongs to the technical field of artificial intelligence and computer vision, and particularly relates to a water surface floater multitasking identification method based on WFD-Net (Water Floating Debris Network). It is suitable for automatic floater monitoring in water areas such as river channels, lakes and offshore waters, can achieve high-precision segmentation, accurate classification and coverage area calculation of floaters, and provides technical support for water environment treatment.
Background
Water surface floater identification is one of the core tasks of water environment monitoring and pollution control, but faces the following technical bottlenecks:
1. The traditional convolutional neural network (CNN) is limited by a fixed local receptive field, so the varied and fine morphological features of floaters (such as the edge textures of broken foam, the winding structures of rope nets and the outlines of small glass fragments) are difficult to capture, and the miss rate for small targets is high.
2. Poor adaptability to complex scenes: the water surface environment contains complex interference such as ripple disturbance, strong light reflection, water mist occlusion and changing light and shadow. Existing models are not specifically optimized for such scenes and are insufficiently robust to noise and background interference, so false detections and feature confusion occur easily.
3.
Weak multi-task cooperation: floater monitoring must achieve both "segmentation and localization" and "category identification", but existing models either complete only a single task (for example, segmentation without classification) or have poorly designed multi-task modules, so that segmentation precision and classification accuracy constrain each other and the needs of fine-grained treatment are hard to meet.
Limitations of existing solutions:
1. Limitations of single models: CNN models such as U-Net and DeepLabV3+ excel at local feature extraction but adapt poorly to global scenes and struggle with floaters dispersed over large water areas; models such as ViT and Swin Transformer have global perception but high computational complexity and insufficient sensitivity to the details of small targets on the water surface, making them unsuitable for real-time monitoring.
2. Shortcomings of hybrid architectures: existing hybrid models (such as CNN+Transformer) pass features serially, so local detail features and global scene features interact insufficiently; they also lack a feature enhancement module designed for water surface scenes, so interference such as ripples and reflections cannot be effectively weakened, and they leave the loss balance between segmentation and classification unresolved, which limits overall recognition performance.
Therefore, there is a need for a water surface floater identification method that combines precise extraction of small targets, robustness to interference in complex scenes, and efficient multi-task cooperation.
Disclosure of Invention
1.
Object of the invention
Aiming at the problems of poor scene adaptability, high small-target miss rate, single function and insufficient multi-task cooperation in existing water surface floater identification technology, the invention provides a water surface floater multi-task identification method based on WFD-Net, which achieves high-precision segmentation, accurate classification and area statistics of floaters in complex water area scenes and provides an efficient technical solution for water environment treatment.
2. Technical solution
The WFD-Net model and the multi-task learning network proposed by the invention comprise the following core modules and steps:
Step 1, constructing and enhancing a multi-scene floater data set:
Data collection: images are collected from public water environment monitoring data sets, field aerial photography by unmanned aerial vehicle, and other sources, covering different illumination conditions such as sunny, cloudy and backlit, and different river reach scenes such as shoals, deep-water areas and sluice gates, to ensure the diversity and representativeness of the data. Natural floaters are mainly algae and duckweed (4000 images), and artificial garbage covers 7 types such as plastics, foam, rope net, glass and metal (1200 images), for a total of 5200 images.
Accurate annotation: an annotation standard for water surface floaters is established before annotation, marking a complete outline for