
CN-121482554-B - Drowning detection method and system combining polarization multi-modes

CN 121482554 B

Abstract

The invention discloses a drowning detection method and system combining polarization multi-modes, belongs to the technical field of drowning detection, and aims to solve the problem of low accuracy in existing methods caused by water-surface reflection and complex aquatic environments. The method comprises the steps of: collecting degree-of-polarization images and intensity images of swimmers; inputting the two images into a pre-trained polarization fusion network to obtain fused images through feature extraction, fusion and reconstruction; training a target detection network adapted to the polarization fusion features using simulated drowning-scene videos and images shot by a polarization camera; and inputting the fused images into the trained detection network to judge whether drowning occurs. The invention eliminates water-surface reflection, improves the integrity of image information and the detection accuracy, and is suitable for intelligent drowning detection in swimming pools, bathing beaches and other venues.

Inventors

  • LI JIE
  • GONG MENGXIN

Assignees

  • Xi'an Jiaotong University (西安交通大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-12

Claims (7)

  1. A drowning detection method combining polarization multi-modes, comprising the following steps:

     S1, acquiring a degree-of-polarization map and an intensity map of a swimming scene, wherein both maps are obtained by the Stokes vector method;

     S2, inputting the degree-of-polarization map and the intensity map obtained in step S1 into a pre-trained polarization fusion network for feature fusion to obtain a fused image; the polarization fusion network comprises a feature extraction module, a feature fusion module and a feature reconstruction module; the feature extraction module adopts a dual-stream encoder structure to extract targeted features from the intensity map and the degree-of-polarization map respectively, specifically: capturing shallow edge features of the intensity map through a 3×3 standard convolution to retain brightness and texture details, and capturing deep features through two parallel branches, one branch comprising a 5×5 convolution and a 3×3 standard convolution connected in sequence and the other comprising a 7×7 standard convolution and a 3×3 standard convolution connected in sequence; concatenating the output features of the two branches and then concatenating them with the original intensity map to obtain the feature map corresponding to the intensity map; extracting initial polarization features from the degree-of-polarization map through a 3×3 standard convolution, and capturing deep features through two parallel branches, one branch comprising a 5×5 depthwise separable convolution and a 3×3 standard convolution connected in sequence and the other comprising a 7×7 depthwise separable convolution and a 3×3 standard convolution connected in sequence; concatenating the output features of the two branches and then concatenating them with the original degree-of-polarization map to obtain the feature map corresponding to the degree-of-polarization map; both branches of the dual-stream encoder introduce a dense connection mechanism, in which the features of each layer are passed directly to all subsequent layers through skip connections; the feature fusion module adopts a combined structure of multi-scale processing and polarization-guided attention, specifically: processing the features output by the feature extraction module at 2× and 4× scales, and fusing the features of each scale through up-sampling, convolution and down-sampling operations; feeding the multi-scale-fused intensity feature map and degree-of-polarization feature map into a global max-pooling layer and a global average-pooling layer respectively to obtain global statistical features of the two; introducing the global statistical features of the degree-of-polarization map into the attention-generation branch of the intensity map, learning an adaptively adjustable weight matrix through convolution layers to generate a polarization-guided channel attention weight vector and realize channel-level fusion of the cross-modal features; extracting gradient features of polarization-direction change through a convolution layer to generate a spatial weight mask for the degree-of-polarization map, and multiplying the spatial weight mask with the channel-weighted features to realize spatially adaptive fusion; the feature reconstruction module realizes feature reconstruction through three convolution layers: a first 3×3 convolution for integrating the correlations of cross-scale features and filtering redundant information; a second 3×3 convolution for reinforcing local detail and global structure; and a third 1×1 convolution for compressing the channel dimension, mapping the high-dimensional features to a feature dimension matching the input image and generating the fused image; the preset loss function is a weighted sum of a multi-scale structural-similarity loss, an average gradient loss and an intensity loss;

     S3, collecting and labeling videos and images of simulated drowning scenes shot by a polarization camera as training data, and training a target detection network adapted to the polarization fusion features to obtain a trained target detection network; the target detection network comprises: an input module for performing preprocessing operations on the polarization-fused image, including adaptive image scaling and adaptive anchor-box configuration, maintaining target proportions and adapting to multi-scale detection; a feature extraction module for extracting multi-level, multi-scale visual features from the preprocessed polarization-fused image using a hierarchical convolutional network comprising a Focus structure, a CSP structure and an SPPF structure, wherein the Focus structure enhances contour-feature extraction of human limbs and floating objects, the CSP structure suppresses water-surface background interference, and the SPPF structure captures multi-scale context information; a feature fusion module adopting a bidirectional fusion structure of a feature pyramid network and a path aggregation network, wherein the feature pyramid network transmits the deep semantic features output by the feature extraction module from top to bottom and fuses them with shallow detail features; and a detection output module adopting a multi-scale prediction mechanism to predict target position, category and confidence from the different-scale feature maps output by the feature fusion module, wherein the categories comprise normal swimming and drowning states, and to output the final detection result through non-maximum suppression;

     and S4, feeding the fused image obtained in step S2 into the target detection network trained in step S3 to detect drowning behavior and judge whether a drowning state is present.
  2. The drowning detection method combining polarization multi-modes according to claim 1, wherein in step S1 the difference between the right-handed and left-handed circular polarization intensities is set to 0, and the simplified form of the Stokes vector method is:

     S = [S0, S1, S2]^T = [I, I(0°) − I(90°), I(45°) − I(135°)]^T

     wherein S is the Stokes vector, I is the intensity map, S1 is the difference between the polarized intensities in the 0° and 90° directions, and S2 is the difference between the polarized intensities in the 45° and 135° directions; the degree-of-polarization map, DoP = sqrt(S1² + S2²) / S0, is a parameter measuring the degree of linear polarization of the light wave, with a value range of 0 to 1.
  3. The drowning detection method combining polarization multi-modes according to claim 1, wherein in step S2 the pre-training process of the polarization fusion network comprises: adopting a public polarized-image dataset as basic training data and dividing it into a training set and a test set; splitting the degree-of-polarization maps and intensity maps in the training set into non-overlapping image blocks and performing image-enhancement operations on the blocks; and training the polarization fusion network, comprising the feature extraction module, the feature fusion module and the feature reconstruction module, on the enhanced training set under the constraint of the preset loss function, inputting the degree-of-polarization maps and intensity maps of the test set into the network during the training stage, the pre-training being complete when the fusion results on the test set meet a preset accuracy requirement.
  4. The drowning detection method combining polarization multi-modes according to claim 1, wherein in step S3 the input module specifically: expands the sample distribution through diversified augmentation strategies while keeping the original proportions of swimmer targets, preserving fine boundary features and posture information; and dynamically adjusts the anchor-box sizes according to the real size distribution of swimmer targets in the training data by collecting statistics on the labeled bounding boxes of the samples.
  5. The drowning detection method combining polarization multi-modes according to claim 1, wherein in step S3 the feature extraction module specifically comprises: the Focus structure, which splits the pixels of the input image into four spatial positions (upper-left, upper-right, lower-left and lower-right), increases the number of channels and then performs convolution, preserving human limb posture and floating-object contour features; the CSP structure, which divides the feature map into two parts, one part undergoing a conventional convolution operation and the other being fused with the features of the first part after passing through a residual path, filtering redundant water-surface background information; and the SPPF structure, which extracts multi-scale context information by combining max-pooling operations at different scales.
  6. The drowning detection method combining polarization multi-modes according to claim 1, wherein in step S3 the training process of the target detection network optimizes target localization accuracy using the CIoU loss function:

     L_CIoU = 1 − IoU + ρ²(b, b_gt) / c² + αv

     wherein IoU is the intersection-over-union between the target frame and the predicted frame, ρ(b, b_gt) is the Euclidean distance between the center points of the target frame and the predicted frame, c is the diagonal distance of the smallest box enclosing the two frames, and v is the aspect-ratio consistency measure, with α its trade-off weight.
  7. A drowning detection system combining polarization multi-modes, comprising:

     an acquisition module for acquiring a degree-of-polarization map and an intensity map of a swimming scene, wherein both maps are obtained by the Stokes vector method;

     a network module for storing a pre-trained polarization fusion network, receiving the degree-of-polarization map and the intensity map output by the acquisition module, and performing feature fusion through the polarization fusion network to generate a fused image; the polarization fusion network comprises a feature extraction module, a feature fusion module and a feature reconstruction module; the feature extraction module adopts a dual-stream encoder structure to extract targeted features from the intensity map and the degree-of-polarization map respectively, specifically: capturing shallow edge features of the intensity map through a 3×3 standard convolution to retain brightness and texture details, and capturing deep features through two parallel branches, one branch comprising a 5×5 convolution and a 3×3 standard convolution connected in sequence and the other comprising a 7×7 standard convolution and a 3×3 standard convolution connected in sequence; concatenating the output features of the two branches and then concatenating them with the original intensity map to obtain the feature map corresponding to the intensity map; extracting initial polarization features from the degree-of-polarization map through a 3×3 standard convolution, and capturing deep features through two parallel branches, one branch comprising a 5×5 depthwise separable convolution and a 3×3 standard convolution connected in sequence and the other comprising a 7×7 depthwise separable convolution and a 3×3 standard convolution connected in sequence; concatenating the output features of the two branches and then concatenating them with the original degree-of-polarization map to obtain the feature map corresponding to the degree-of-polarization map; both branches of the dual-stream encoder introduce a dense connection mechanism, in which the features of each layer are passed directly to all subsequent layers through skip connections; the feature fusion module adopts a combined structure of multi-scale processing and polarization-guided attention, specifically: processing the features output by the feature extraction module at 2× and 4× scales, and fusing the features of each scale through up-sampling, convolution and down-sampling operations; feeding the multi-scale-fused intensity feature map and degree-of-polarization feature map into a global average-pooling layer and a global max-pooling layer respectively to obtain global statistical features of the two; introducing the global statistical features of the degree-of-polarization map into the attention-generation branch of the intensity map through a learnable weight matrix, realizing channel-level fusion of the cross-modal features and generating a polarization-guided channel attention weight vector; extracting gradient features of polarization-direction change through a convolution layer to generate a spatial weight mask for the degree-of-polarization map, and multiplying the spatial weight mask with the channel-weighted features to realize spatially adaptive fusion; the feature reconstruction module realizes feature reconstruction through three convolution layers: a first 3×3 convolution for integrating the correlations of cross-scale features and filtering redundant information; a second 3×3 convolution for reinforcing local detail and global structure; and a third 1×1 convolution for compressing the channel dimension, mapping the high-dimensional features to a feature dimension matching the input image and generating the fused image; the preset loss function is a weighted sum of a multi-scale structural-similarity loss, an average gradient loss and an intensity loss;

     a training module for collecting and labeling videos and images of simulated drowning scenes shot by the polarization camera as training data, and training a target detection network adapted to the polarization fusion features to obtain a trained target detection network; the target detection network comprises: an input module for performing preprocessing operations on the polarization-fused image, including adaptive image scaling and adaptive anchor-box configuration, maintaining target proportions and adapting to multi-scale detection; a feature extraction module for extracting multi-level, multi-scale visual features from the preprocessed polarization-fused image using a hierarchical convolutional network comprising a Focus structure, a CSP structure and an SPPF structure, wherein the Focus structure enhances contour-feature extraction of human limbs and floating objects, the CSP structure suppresses water-surface background interference, and the SPPF structure captures multi-scale context information; a feature fusion module adopting a bidirectional fusion structure of a feature pyramid network and a path aggregation network, wherein the feature pyramid network transmits the deep semantic features output by the feature extraction module from top to bottom and fuses them with shallow detail features; and a detection output module adopting a multi-scale prediction mechanism to predict target position, category and confidence from the different-scale feature maps output by the feature fusion module, wherein the categories comprise normal swimming and drowning states, and to output the final detection result through non-maximum suppression;

     and a drowning module for inputting the fused image into the target detection network to detect drowning behavior and judge whether a drowning state is present.
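For illustration only (not part of the claims): the simplified Stokes computation of claim 2 can be sketched in NumPy from four polarizer-angle intensity images. The function name `stokes_dolp`, the 0.5 averaging convention for S0, and the clipping to [0, 1] are assumptions of this sketch, not specifics of the patent.

```python
import numpy as np

def stokes_dolp(i0, i45, i90, i135):
    """Compute the intensity map S0 and the degree-of-(linear-)polarization
    map from four polarizer-angle images, with the circular term S3 set to 0
    as in claim 2. All four inputs are arrays of the same shape."""
    i0, i45, i90, i135 = (np.asarray(a, dtype=float) for a in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity (averaging convention assumed)
    s1 = i0 - i90                       # difference of 0° and 90° intensities
    s2 = i45 - i135                     # difference of 45° and 135° intensities
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-9)  # guard dark pixels
    return s0, np.clip(dolp, 0.0, 1.0)
```

A pixel lit purely through a 0° polarizer yields DoP close to 1, while equal intensity at all four angles yields DoP of 0, matching the 0-to-1 range stated in claim 2.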
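The Focus-style pixel rearrangement described in claim 5 (splitting pixels into upper-left, upper-right, lower-left and lower-right grids and stacking them on the channel axis before convolution) can be sketched as follows; the function name `focus_slice` is an assumption, and the subsequent convolution is omitted.

```python
import numpy as np

def focus_slice(img):
    """Rearrange an (H, W, C) image into (H/2, W/2, 4C) by interleaved
    slicing: each output channel group holds one of the four pixel grids.
    H and W are assumed even."""
    tl = img[0::2, 0::2]  # upper-left pixels
    tr = img[0::2, 1::2]  # upper-right pixels
    bl = img[1::2, 0::2]  # lower-left pixels
    br = img[1::2, 1::2]  # lower-right pixels
    return np.concatenate([tl, tr, bl, br], axis=-1)
```

No information is lost: every input pixel appears exactly once in the output, which is why the structure can preserve limb-posture and contour detail while halving spatial resolution.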
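Likewise for claim 6, the CIoU loss can be sketched for axis-aligned boxes in (x1, y1, x2, y2) form. The function name `ciou_loss` and the small epsilon guards are assumptions of this sketch; the standard CIoU formulation with the diagonal of the smallest enclosing box is used.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss between a predicted box and a ground-truth box, both
    (x1, y1, x2, y2) with positive width and height:
    L_CIoU = 1 - IoU + rho^2 / c^2 + alpha * v."""
    # Intersection area and IoU
    iw = max(0.0, min(box_p[2], box_g[2]) - max(box_p[0], box_g[0]))
    ih = max(0.0, min(box_p[3], box_g[3]) - max(box_p[1], box_g[1]))
    inter = iw * ih
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # Squared distance between box centers (rho^2)
    rho2 = (((box_p[0] + box_p[2]) - (box_g[0] + box_g[2])) / 2) ** 2 \
         + (((box_p[1] + box_p[3]) - (box_g[1] + box_g[3])) / 2) ** 2
    # Squared diagonal of the smallest enclosing box (c^2)
    c2 = (max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])) ** 2 \
       + (max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])) ** 2 + 1e-9
    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan((box_g[2] - box_g[0]) / (box_g[3] - box_g[1]))
                              - math.atan((box_p[2] - box_p[0]) / (box_p[3] - box_p[1]))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes give a loss of 0, while disjoint boxes give a loss above 1, since the IoU term saturates and the center-distance term keeps providing a gradient.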

Description

Drowning detection method and system combining polarization multi-modes

Technical Field

The invention belongs to the technical field of drowning detection, and particularly relates to a drowning detection method and system combining polarization multi-modes.

Background

Drowning detection means identifying, through technical means, the drowning behavior or risk of individuals in water so that rescue measures can be taken in time. With the rapid development of computer vision and artificial intelligence, drowning detection systems based on deep learning have emerged. These systems typically identify abnormal drowning behavior in real time by analyzing image or video data from monitoring cameras. Compared with traditional methods, artificial-intelligence-based drowning detection systems have the advantages of high real-time performance, wide coverage and resistance to fatigue, and are widely applied in swimming pools and similar venues to realize intelligent drowning detection and alarm. Existing video-based drowning detection methods mainly perform target detection on swimmers and then track and analyze whether a swimmer's behavior indicates a drowning state. Early machine-vision drowning detection systems typically tracked the motion profile of a swimmer in real time by placing infrared and RGB cameras above the pool and on the pool walls. Such a system judges whether a drowning accident is likely by analyzing how long a swimmer stays at the bottom of the water, and raises an alarm in time. Another existing method detects the underwater human body using mean background modeling; to reduce underwater noise, a wavelet threshold denoising algorithm is combined with the Retinex algorithm for image enhancement.
There is also a system that tracks a swimmer by converting the RGB image into the HSV (Hue, Saturation, Value) color space and combining an a-priori thresholding mechanism with contour detection. When the system detects that the swimmer's contour has disappeared from the water surface for a period of time, an alarm is triggered. Manually recomputing a threshold each time is cumbersome, and judging from the contour alone gives poor accuracy; detection accuracy is also greatly affected when water-surface conditions are complex. A camera-based system is known for detecting drowning events in swimming pools at as early a stage as possible. The system consists of a vision module and an event reasoning module. The vision module employs a model-based approach to represent and distinguish between the background pool area and foreground swimmers. The event inference module is built on a finite state machine that integrates multiple inference rules formulated from the general motion characteristics of drowning persons. A sequential change-detection algorithm is used to quickly detect a possible drowning event. Applied to a number of drowning video clips, the system achieves good results. There is also a real-time vision system for outdoor swimming pools. Targeting the characteristics of the water-body background and crowded pool scenes, it provides a set of methods including background subtraction, denoising, data fusion and blob segmentation. In the drowning-event detection step, visual indicators of distress and drowning are incorporated through a set of foreground descriptors, and a module comprising data fusion and hidden Markov modeling is designed to learn the distinctive features of different swimming behaviors. Another technique is the New Equation-based Prediction Technique for Early Near-Drowning Events (NEPTUNE).
The formulas and rules used by NEPTUNE enable near-drowning to be detected from video sequences of at least 1 second but no more than 5 seconds with few false positives. The backbone of NEPTUNE mixes statistical image processing to merge the images of a video sequence, then uses K-means clustering to extract segments in the merged image, and finally revisits the statistical image processing to derive, for each segment, the variables that will be used to determine drowning. In addition, there is a video human-body region extraction method based on drowning posture features, on which a drowning detection algorithm based on the difference between a convolutional autoencoder's reconstruction and bottom-layer features is designed. The features of a normal swimmer are reconstructed and modeled by the convolutional autoencoder; at detection time, the swimmer's features are reconstructed by the trained encoder, and whether the swimmer is in a drowning state is judged from the error between the input and reconstructed features. A drowning detec