CN-121999338-A - Pipeline oil leakage detection method and system based on three-mode physical-visual cooperation


Abstract

The invention provides a pipeline oil leakage detection method and system based on tri-modal physical-visual coordination, and relates to the technical field of pipeline safety detection. The method comprises: obtaining a visible light image, an infrared image and a laser echo feature sequence of a pipeline area to be detected; after spatial pixel-level alignment and normalization, inputting them into a pipeline oil leakage detection model comprising a shared-parameter backbone network, a one-dimensional convolution branch, a cross-modal interaction fusion module and a Transformer decoder; extracting visible light and infrared image features with the backbone network; extracting laser spectrum features with the one-dimensional convolution branch; fusing the tri-modal features through tri-modal attention in the cross-modal interaction fusion module and constructing a feature pyramid; and decoding the feature pyramid with the Transformer decoder to output an oil leakage detection result. By deeply fusing laser physical features with the visual modalities, the invention achieves collaborative substance attribute discrimination and visual localization of oil leakage targets, and improves the accuracy and robustness of pipeline oil leakage detection in complex environments.

Inventors

Ning Jiecheng
ZHANG XI
TANG XIAOYAN
HUANG ZHONG
WANG XINGXING
YE SIYI
XIA SHUJUN

Assignees

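Sichuan Huaneng Taipingyi Hydropower Co., Ltd. (四川华能太平驿水电有限责任公司)
Sichuan Jiuzhou Beidou Navigation and Location Service Co., Ltd. (四川九洲北斗导航与位置服务有限公司)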

Dates

Publication Date: 20260508
Application Date: 20260410

Claims (10)

1. A pipeline oil leakage detection method based on tri-modal physical-visual coordination, characterized by comprising the following steps: acquiring tri-modal data of a pipeline area to be detected, wherein the tri-modal data comprise a visible light image, an infrared image and a laser echo feature sequence; and inputting the tri-modal data of the pipeline area to be detected into a pre-constructed and trained pipeline oil leakage detection model, wherein the pipeline oil leakage detection model comprises a shared-parameter backbone network, a one-dimensional convolution branch, a cross-modal interaction fusion module and a Transformer decoder, wherein: the shared-parameter backbone network is used for extracting multi-scale features of the visible light image and the infrared image; the one-dimensional convolution branch is used for encoding the laser echo feature sequence and outputting a laser spectrum feature vector; the cross-modal interaction fusion module is used for fusing the extracted features of the tri-modal data and constructing a cross-modal feature pyramid based on the fused features; and the Transformer decoder is used for decoding the cross-modal feature pyramid to obtain an oil leakage detection result.
2. The pipeline oil leakage detection method based on tri-modal physical-visual coordination according to claim 1, wherein before the tri-modal data are input into the pre-constructed and trained pipeline oil leakage detection model, the method further comprises: scaling the visible light image and the infrared image to a preset size; projecting the spatial coordinates of the laser detection points corresponding to the laser echo feature sequence into the image coordinate system through a pre-calibrated extrinsic parameter matrix to obtain the pixel coordinates of each laser detection point on the image; converting the RGB channel values of the visible light image to the [0,1] interval through linear mapping; converting the original temperature value or radiation intensity value of each pixel in the infrared image to the [0,1] interval through linear mapping; and performing per-dimension normalization on the laser echo feature sequence to obtain a normalized laser echo sequence.
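The alignment and normalization steps of claim 2 can be illustrated with a minimal sketch. The claim only mentions a pre-calibrated extrinsic matrix; the pinhole intrinsic matrix, the function names `min_max_normalize` and `project_point`, and the value ranges used here are assumptions made for illustration, not details from the patent.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Linearly map values from the range [lo, hi] to [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [(v - lo) / (hi - lo) for v in values]


def project_point(
    point_xyz: Sequence[float],
    extrinsic: Sequence[Sequence[float]],  # 3x4 matrix [R|t], pre-calibrated
    intrinsic: Sequence[Sequence[float]],  # 3x3 matrix K (assumed pinhole model)
) -> Tuple[float, float]:
    """Project a 3-D laser detection point to pixel coordinates (u, v)."""
    x, y, z = point_xyz
    hom = (x, y, z, 1.0)
    # Laser/world frame -> camera frame.
    cam = [sum(row[j] * hom[j] for j in range(4)) for row in extrinsic]
    # Camera frame -> homogeneous image coordinates.
    uvw = [sum(row[j] * cam[j] for j in range(3)) for row in intrinsic]
    if uvw[2] <= 0:
        raise ValueError("point is behind the camera")
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

For 8-bit RGB data the call would be `min_max_normalize(pixels, 0, 255)`; for infrared data the bounds would be the minimum and maximum temperature or radiation intensity of the sensor or frame, which the claim does not specify.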
3. The pipeline oil leakage detection method based on tri-modal physical-visual coordination according to claim 1, further comprising employing, in the training stage of the pipeline oil leakage detection model, a three-stage strategy of modal isolation training, substance attribute alignment training and multi-modal fusion training, wherein: in the modal isolation training stage, a visible light image dataset, an infrared image dataset and a laser echo feature sequence dataset are used respectively to independently train the shared-parameter backbone network, the one-dimensional convolution branch and the Transformer decoder, and a weighted sum of classification loss and position regression loss is adopted as the loss function during training; in the substance attribute alignment training stage, paired visible light images, infrared images and corresponding laser spectrum feature vectors are used as training data, the parameters of the one-dimensional convolution branch are fixed, cosine similarity losses are calculated between the laser spectrum feature vectors and, respectively, the visible light high-level features and the infrared high-level features extracted by the shared-parameter backbone network, and the parameters of the shared-parameter backbone network are optimized through the cosine similarity losses; and in the multi-modal fusion training stage, the complete pipeline oil leakage detection model, including the cross-modal interaction fusion module, is trained end to end using paired visible light images, infrared images and laser echo feature sequences, and the parameters are optimized with a composite loss function, wherein the composite loss function comprises a classification loss, a position regression loss, a spectrum physical consistency loss and a cross-modal distribution alignment loss, the classification loss is a focal loss, the position regression loss is a linear combination of L1 loss and generalized intersection-over-union (GIoU) loss, the spectrum physical consistency loss is the cosine similarity between the image features within the detection box and the laser spectrum features, and the cross-modal distribution alignment loss is the KL divergence among the visible light, infrared and laser modal feature distributions.
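The terms of the composite loss in claim 3 can be sketched in plain Python as follows. The focal-loss hyperparameters, the use of `1 - cosine similarity` as the consistency loss, and the `weights` tuple in `composite_loss` are illustrative assumptions; the claim names the loss terms but gives no coefficients. Boxes are assumed to be non-degenerate `(x1, y1, x2, y2)` tuples.

```python
import math
from typing import Sequence


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


def _area(r: Sequence[float]) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def giou_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - GIoU for two boxes with positive area."""
    inter = _area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = _area(a) + _area(b) - inter
    hull = _area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    return 1.0 - (inter / union - (hull - union) / hull)


def cosine_loss(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between an image feature and a laser spectrum feature."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def composite_loss(cls, l1, giou, phys, align, weights=(1.0, 5.0, 2.0, 1.0, 1.0)):
    """Weighted sum of the loss terms; the weights are hypothetical."""
    terms = (cls, l1, giou, phys, align)
    return sum(w * t for w, t in zip(weights, terms))
```

How the KL term is combined across the three modalities (for example, a sum over modality pairs) is not stated in the claim and is left open here.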
4. The pipeline oil leakage detection method based on tri-modal physical-visual coordination according to claim 1, wherein the cross-modal interaction fusion module fuses the extracted features of the tri-modal data through a tri-modal attention mechanism, comprising: flattening the highest-level visible light feature map and the highest-level infrared feature map among the multi-scale features extracted by the backbone network along the spatial dimension into a visible light sequence and an infrared sequence, respectively; using the laser physical sequence as a query, and applying learnable linear transformations to the laser physical sequence, the visible light sequence and the infrared sequence to generate query vectors, key vectors and value vectors, specifically: when computing the visible-infrared attention, performing scaled dot-product attention with the visible light query fused with the laser physical sequence information, the infrared key and the infrared value to obtain a visible-infrared attention output; when computing the infrared-visible attention, performing scaled dot-product attention with the infrared query fused with the laser physical sequence information, the visible light key and the visible light value to obtain an infrared-visible attention output; and adding the visible-infrared attention output and the infrared-visible attention output element by element, adding the result to the laser physical sequence, and obtaining the fused multi-scale features through a reshaping operation.
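The bidirectional attention of claim 4 can be sketched with plain Python lists. This is a simplified illustration, not the patented module: the learnable linear projections are replaced by the identity, the laser information is fused into the queries by element-wise addition, and the laser spectrum vector is broadcast to every spatial position to form the laser physical sequence. All three choices, and the name `tri_modal_fuse`, are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def tri_modal_fuse(vis: Matrix, ir: Matrix, laser: Sequence[float]) -> Matrix:
    """vis, ir: flattened (N, d) sequences; laser: length-d spectrum vector."""
    laser_seq = [list(laser) for _ in vis]               # broadcast (assumption)
    vis_to_ir = attention(add(vis, laser_seq), ir, ir)   # visible query, infrared key/value
    ir_to_vis = attention(add(ir, laser_seq), vis, vis)  # infrared query, visible key/value
    return add(add(vis_to_ir, ir_to_vis), laser_seq)     # sum of outputs plus laser sequence
```

The returned (N, d) sequence would then be reshaped back to the spatial layout of the highest-level feature map.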
5. The pipeline oil leakage detection method based on tri-modal physical-visual coordination according to claim 4, wherein constructing the cross-modal feature pyramid based on the fused features specifically comprises: performing top-down feature fusion starting from the highest-level feature of the fused multi-scale features, comprising: upsampling the fused multi-scale features to obtain an upsampled feature; adding the corresponding middle-level visible light feature map and infrared feature map extracted by the backbone network element by element to obtain a middle-level visual fusion feature; performing learnable weighted fusion of the upsampled feature and the middle-level visual fusion feature, followed by convolution and batch normalization, to obtain the weighted fusion feature of the current level; and repeating the upsampling, weighted fusion, and convolution and normalization operations, fusing in turn with the lower-level visible light and infrared feature maps extracted by the backbone network until the lowest level has been processed, to obtain a top-down weighted fusion feature set; and performing bottom-up feature fusion starting from the lowest-level weighted fusion feature obtained in the top-down path, comprising: downsampling the lowest-level weighted fusion feature to obtain a downsampled feature; performing learnable weighted fusion of the downsampled feature and the top-down weighted fusion feature of the corresponding level, followed by convolution and batch normalization, to obtain the current-level feature of the bottom-up path; continuing to downsample the current-level feature and performing learnable weighted fusion with the top-down weighted fusion feature of the next higher level; and repeating the downsampling and weighted fusion operations until all feature levels have been processed, to obtain the bottom-up cross-modal feature pyramid.
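The two fusion paths of claim 5 can be sketched on one-dimensional toy features. The claim does not give the form of the learnable weighted fusion; this sketch assumes a normalized non-negative weighting similar to BiFPN-style fast normalized fusion, uses nearest-neighbor upsampling and pairwise-average downsampling, and omits the convolution and batch normalization steps. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Feature = List[float]


def upsample2(x: Feature) -> Feature:
    """Nearest-neighbor upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def downsample2(x: Feature) -> Feature:
    """Downsampling by averaging adjacent pairs (even length assumed)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def weighted_fuse(a: Feature, b: Feature, w: Tuple[float, float]) -> Feature:
    """Fuse two features with non-negative, normalized learnable weights."""
    wa, wb = max(w[0], 0.0), max(w[1], 0.0)
    s = wa + wb + 1e-4
    return [(wa * x + wb * y) / s for x, y in zip(a, b)]


def build_pyramid(
    fused_top: Feature,
    vis_levels: Sequence[Feature],  # ordered from the level below the top to the lowest
    ir_levels: Sequence[Feature],
    w_td: Sequence[Tuple[float, float]],
    w_bu: Sequence[Tuple[float, float]],
) -> List[Feature]:
    # Top-down path: upsample, then fuse with the visible + infrared sum of each level.
    td = [fused_top]
    for vis, ir, w in zip(vis_levels, ir_levels, w_td):
        visual = [a + b for a, b in zip(vis, ir)]
        td.append(weighted_fuse(upsample2(td[-1]), visual, w))
    # Bottom-up path: start at the lowest top-down feature and move upward.
    bu = [td[-1]]
    for level, w in zip(reversed(td[:-1]), w_bu):
        bu.append(weighted_fuse(downsample2(bu[-1]), level, w))
    return bu[::-1]  # highest (coarsest) level first
```

In a real model each `Feature` would be a multi-channel 2-D map and each weight pair a trainable parameter; the control flow of the two paths stays the same.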
6. The pipeline oil leakage detection method based on tri-modal physical-visual coordination according to claim 5, characterized in that, in the learnable weighted fusion, the weight parameters used for fusion are learnable parameters optimized by gradient backpropagation during training of the pipeline oil leakage detection model; and during inference of the pipeline oil leakage detection model, the fusion proportion of each modal feature is computed jointly from the weight parameters and the features of the tri-modal data, and is dynamically determined according to the mean brightness of the visible light image, the gradient magnitude of the infrared image and the response intensity of the laser spectrum feature vector in the tri-modal data.
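One way to read the dynamic fusion proportion of claim 6 is as a softmax over per-modality scores, where each score is a learned weight multiplied by the corresponding input statistic. The claim does not state the formula, so the softmax form, the function name `fusion_ratios`, and the assumption that the three statistics have been scaled to comparable ranges are all illustrative.

```python
import math
from typing import List, Sequence


def fusion_ratios(
    vis_brightness_mean: float,  # mean brightness of the visible light image
    ir_gradient_mag: float,      # mean gradient magnitude of the infrared image
    laser_response: float,       # response intensity of the laser spectrum vector
    weights: Sequence[float],    # three learned weight parameters
) -> List[float]:
    """Return fusion proportions (visible, infrared, laser) that sum to 1."""
    stats = (vis_brightness_mean, ir_gradient_mag, laser_response)
    logits = [w * s for w, s in zip(weights, stats)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

With this form, a dark scene (low visible brightness) lowers the visible-light share and shifts weight toward the infrared and laser modalities, which matches the adaptive behavior the claim describes.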
7. The pipeline oil leakage detection method based on tri-modal physical-visual coordination according to claim 1, wherein the Transformer decoder is formed by stacking a plurality of decoding layers in sequence, and the Transformer decoder decodes the cross-modal feature pyramid as follows: for each decoding layer, inputting a plurality of object query vectors into a multi-head self-attention module, computing self-attention among the query vectors, and outputting the updated query vectors; using the updated query vectors as queries, and using the feature sequence formed by flattening and concatenating the feature maps of all levels of the cross-modal feature pyramid as keys and values, inputting them into a multi-head cross-attention module, computing the attention weights between each query and the feature at each position of the feature sequence, and computing a weighted sum of the feature sequence according to the attention weights to obtain an aggregated feature vector corresponding to each query; applying a residual connection and layer normalization to the aggregated feature vectors and the query vectors as they were before input to the cross-attention module, to obtain the query vectors output by the decoding layer; and after the stacked decoding layers, inputting the query vectors output by the last layer into a linear classification layer and a linear regression layer, respectively, wherein the linear classification layer outputs the target class probability corresponding to each query, and the linear regression layer outputs the target bounding box coordinates and the substance judgment confidence corresponding to each query.
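A single decoding layer as described in claim 7 can be sketched as follows. For brevity the sketch uses one attention head instead of multi-head attention and omits the learnable projections and the classification and regression heads; these simplifications and the function names are assumptions, not details from the patent.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention."""
    d = len(q[0])
    out = []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def layer_norm(row: Sequence[float], eps: float = 1e-5) -> List[float]:
    mu = sum(row) / len(row)
    var = sum((x - mu) ** 2 for x in row) / len(row)
    return [(x - mu) / math.sqrt(var + eps) for x in row]


def decoder_layer(queries: Matrix, pyramid_levels: Sequence[Matrix]) -> Matrix:
    # 1. Self-attention among the object query vectors.
    updated = attend(queries, queries, queries)
    # 2. Flatten and concatenate all pyramid levels into one key/value sequence.
    memory = [token for level in pyramid_levels for token in level]
    # 3. Cross-attention: each query aggregates features from every position.
    aggregated = attend(updated, memory, memory)
    # 4. Residual connection with the pre-cross-attention queries, then layer norm.
    return [layer_norm([a + u for a, u in zip(ar, ur)]) for ar, ur in zip(aggregated, updated)]
```

Stacking would apply `decoder_layer` repeatedly, feeding each layer's output queries into the next, before the final linear heads produce class probabilities, box coordinates and the substance judgment confidence.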
8. A pipeline oil leakage detection system based on tri-modal physical-visual coordination, for performing the pipeline oil leakage detection method based on tri-modal physical-visual coordination of any one of claims 1 to 7, the system comprising: a tri-modal active sensing hardware module, used for synchronously collecting visible light images, infrared images and laser echo feature sequences of a pipeline area to be detected; a multi-source data preprocessing engine, connected with the tri-modal active sensing hardware module and used for performing spatial pixel-level alignment and normalization on the received visible light image, infrared image and laser echo feature sequence and outputting aligned and normalized tri-modal data; a three-way fusion oil leakage detection core, connected with the multi-source data preprocessing engine and having a built-in pipeline oil leakage detection model, wherein the pipeline oil leakage detection model comprises a shared-parameter backbone network, a one-dimensional convolution branch, a cross-modal interaction fusion module, a cross-modal feature pyramid construction module and a Transformer decoder, and is used for outputting an oil leakage detection result according to the input aligned and normalized tri-modal data; and a dynamic feedback and early warning module, connected with the three-way fusion oil leakage detection core and used for triggering an early warning signal and displaying a fused image according to the oil leakage detection result.
9. The pipeline oil leakage detection system based on tri-modal physical-visual coordination of claim 8, wherein the pipeline oil leakage detection model outputs the oil leakage detection result according to the input aligned and normalized tri-modal data specifically by: extracting multi-scale features of the visible light image and the infrared image in parallel through the shared-parameter backbone network and outputting visible light feature maps and infrared feature maps of a plurality of levels, and encoding the normalized laser echo sequence through the one-dimensional convolution branch and outputting a laser spectrum feature vector; receiving, through the cross-modal interaction fusion module connected with both the shared-parameter backbone network and the one-dimensional convolution branch, the highest-level visible light feature map and infrared feature map output by the shared-parameter backbone network and the laser spectrum feature vector output by the one-dimensional convolution branch, fusing the visible light features, infrared features and laser spectrum features through a tri-modal attention mechanism, and outputting the fused multi-scale features; receiving, through the cross-modal feature pyramid construction module connected with both the shared-parameter backbone network and the cross-modal interaction fusion module, the visible light and infrared feature maps of each level and the fused high-level features output by the cross-modal interaction fusion module, and constructing and outputting the cross-modal feature pyramid through a top-down learnable weighted fusion path and a bottom-up learnable weighted fusion path; receiving the cross-modal feature pyramid through the Transformer decoder connected with the cross-modal feature pyramid construction module, wherein the Transformer decoder stacks a plurality of decoding layers, each decoding layer sequentially performs multi-head self-attention, multi-head cross-attention with the cross-modal feature pyramid as keys and values, residual connection and layer normalization, and the category probability, bounding box coordinates and substance judgment confidence of each target prediction are output through a linear classification layer and a linear regression layer; and triggering, through the dynamic feedback and early warning module connected with the three-way fusion oil leakage detection core, an early warning signal according to the result of comparing the substance judgment confidence with a preset threshold, and displaying a fused pseudo-color enhanced image.
10. The pipeline oil leakage detection system based on tri-modal physical-visual coordination of claim 8, further comprising a model training module coupled to the three-way fusion oil leakage detection core for training the pipeline oil leakage detection model, the model training module comprising: a modal isolation training unit, configured to train the shared-parameter backbone network, the one-dimensional convolution branch and the Transformer decoder separately using a single-modal visible light image dataset, infrared image dataset and laser echo feature sequence dataset, with a weighted sum of classification loss and position regression loss adopted during training; a substance attribute alignment training unit, configured to use paired visible light images, infrared images and corresponding laser spectrum feature vectors as training data, fix the parameters of the one-dimensional convolution branch, constrain the visual features extracted by the shared-parameter backbone network to approach the laser spectrum features through a cosine similarity loss, and optimize the backbone network parameters; and a multi-modal fusion training unit, configured to train end to end the complete pipeline oil leakage detection model, including the cross-modal interaction fusion module, using paired visible light images, infrared images and laser echo feature sequences, and to optimize all parameters with a composite loss function, wherein the composite loss function comprises a classification loss, a position regression loss, a spectrum physical consistency loss and a cross-modal distribution alignment loss.

Description

Pipeline oil leakage detection method and system based on three-mode physical-visual cooperation

Technical Field

The invention relates to a method and a system for detecting oil leakage of a pipeline based on tri-modal physical-visual coordination, and belongs to the technical field of pipeline safety detection.

Background

Pipeline oil leakage detection is one of the key technologies for ensuring energy transportation safety and environmental safety. Traditional detection methods mostly rely on manual inspection or pressure sensor monitoring, and suffer from low efficiency, significant safety hazards, and difficulty in accurately locating tiny leakage points. Visual detection technology based on visible light and infrared thermal imaging offers a new way to address these problems. However, visible light images are easily affected by illumination changes, shadow occlusion and other environmental factors, and infrared images are limited by complex ground thermal radiation backgrounds and weak temperature differences, so a single visual modality lacks sufficient detection accuracy in "different objects, same spectrum" scenes (for example, accumulated water or shadows misjudged as oil stains) and "weak feature recognition" scenes (for example, early micro-leakage). In addition, existing bimodal fusion methods mostly use shallow or static fusion, which makes deep interaction and adaptive fusion of cross-modal information difficult to achieve and leads to poor robustness under complex working conditions. Laser active detection can achieve component-level identification by analyzing the characteristic echoes of substance molecules, and can give visual detection a discrimination basis at the physical attribute level, but when used alone its detection range is limited and it lacks spatial geometric information.

Therefore, how to deeply fuse the physical attribute identification capability of the laser with the spatial semantic representation capability of images, so that the detection model has both visual perception capability and material identification capability, has become the key to improving the accuracy and robustness of pipeline oil leakage detection. In the prior art, Chinese patent application publication No. CN119941606A discloses a multi-modal, GLCM-based pipeline oil leakage detection method, which obtains a multispectral image and an ordinary RGB image, extracts texture features of the multispectral image using a gray level co-occurrence matrix (GLCM), fuses them with the feature maps of the RGB image by simple superposition at the channel level, and finally performs target detection. That scheme uses multi-source information to a certain extent, but has several shortcomings. It relies only on the visual features of visible light and infrared images, so it cannot distinguish interfering objects that look similar to oil at the level of essential physical attributes, and false detections occur easily under complex working conditions. Its feature fusion is merely channel concatenation, a static linear fusion that cannot achieve nonlinear interaction between the physical dimension and the visual dimension, so suspected areas are difficult to screen effectively when visual features are blurred. In addition, modal bias arises easily during training, that is, over-reliance on the strong signal features of one modality; the contribution weights of the different modalities cannot be adjusted adaptively as the environment changes, and the robustness of the detection system is insufficient.

In view of the foregoing, there is a need for a pipeline oil leakage detection method that achieves deep visible light-infrared-laser tri-modal coordination, imposes a physical consistency constraint, and dynamically adjusts the modal weights according to the environment, so as to detect oil leakage targets accurately and robustly in complex industrial environments.

Disclosure of Invention

The invention provides a method and a system for detecting oil leakage of a pipeline based on tri-modal physical-visual coordination, aiming to solve the problems in the prior art that cross-modal interaction is shallow, modal representation is unbalanced, detection precision for weak-feature targets is low, and different objects with the same spectrum (such as accumulated water misjudged as oil) cannot be effectively distinguished.

The technical scheme of the invention is as follows:

In one aspect, the invention provides a method for detecting oil leakage of a pipeline based on tri-modal physical-visual coordination, which comprises the following steps: Acquiring three-mode data of a pipeline area to be detected, wherein the three-mode data comprise visible light images, infrared images and laser echo characteristic sequences; Inputting three-mode dat