CN-122023996-A - Multi-mode large model construction method and system for oil field inspection multisource vision and sensing data fusion
Abstract
The invention provides a multi-modal large model construction method and system for fusing multi-source visual and sensing data in oilfield inspection, and belongs to the field of oilfield pipeline inspection. It addresses the problem that recognition accuracy from a single data source is insufficient in current oilfield pipeline inspection technology. The method comprises the steps of: loading or generating a paired data set containing oilfield pipeline visual images and corresponding sensing time-series data; carrying out deep semantic feature extraction on the visual images and time-series correlation feature extraction on the sensing time-series data, and mapping both to a unified feature dimension to obtain visual feature vectors and sensing feature vectors; carrying out bidirectional attention interaction between the visual feature vectors and the sensing feature vectors based on a multi-head self-attention mechanism to obtain fused multi-modal features; inputting the fused multi-modal features into a multi-modal large model and optimizing it with a lightweight fine-tuning strategy; and classifying the operation state of the oilfield pipeline based on the depth features of the large model and outputting the classification results. The method is used in the field of oilfield safety.
Inventors
- LV JIALIANG
- WANG CHAO
- CHANG LIANG
Assignees
- 大庆安瑞达科技开发有限公司 (Daqing Anruida Technology Development Co., Ltd.)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-03
Claims (10)
- 1. A multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection, characterized by comprising the following steps: an initialization and parameter configuration step, namely configuring the visual image size, the sensing time-series length, the feature fusion dimension and the model training hyperparameters; a dual-source data preparation and loading step, namely loading or generating a paired data set containing oilfield pipeline visual images and corresponding sensing time-series data, and carrying out standardization processing on the data; a single-modal feature extraction step, namely carrying out deep semantic feature extraction on the visual images and time-series correlation feature extraction on the sensing time-series data, and mapping each to a unified feature dimension to obtain visual feature vectors and sensing feature vectors; a cross-modal feature fusion step, namely performing bidirectional attention interaction between the visual feature vectors and the sensing feature vectors based on a multi-head self-attention mechanism to obtain fused multi-modal features; a large model fine-tuning step, namely inputting the multi-modal features into a pre-trained general multi-modal large model and adopting a lightweight fine-tuning strategy to fine-tune the general multi-modal large model to adapt it to the oilfield inspection scene, so as to obtain optimized large model depth features; and a downstream task classification step, namely classifying and identifying the operation state of the oilfield pipeline based on the large model depth features and outputting the classification results.
- 2. The multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection according to claim 1, wherein the dual-source data preparation and loading step further comprises adaptive generation of synthetic data and standardized loading of real data; the adaptive generation of synthetic data produces, according to preset pipeline risk labels, paired samples of visual images and sensing time-series data matching the normal, leakage-risk and equipment-abnormal states of the oilfield pipeline; the standardized loading of real data carries out size standardization on the acquired real visual images and length normalization and standardization on the real sensing time-series data.
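The standardized loading described above can be sketched in a few lines of NumPy. The target sizes (224×224 image, length-128 series), the nearest-neighbour resize and the z-score normalization are illustrative assumptions for demonstration, not specifics from the claims:

```python
import numpy as np

def standardize_pair(image, series, image_size=(224, 224), series_len=128):
    """Standardize one visual/sensing pair (illustrative sketch).

    image:  H x W x 3 uint8 array; resized by simple nearest-neighbour
            index sampling and scaled to [0, 1].
    series: 1-D sensor readings; length-normalized by linear
            interpolation, then z-scored (zero mean, unit variance).
    """
    h, w = image.shape[:2]
    rows = np.arange(image_size[0]) * h // image_size[0]
    cols = np.arange(image_size[1]) * w // image_size[1]
    img = image[rows][:, cols].astype(np.float32) / 255.0

    xs = np.linspace(0.0, 1.0, num=len(series))
    xt = np.linspace(0.0, 1.0, num=series_len)
    seq = np.interp(xt, xs, np.asarray(series, dtype=np.float32))
    seq = (seq - seq.mean()) / (seq.std() + 1e-8)
    return img, seq
```

A production pipeline would of course use a proper image library for resizing; the point here is only the pairing of size standardization with length normalization and standardization.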
- 3. The multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection according to claim 1, wherein the time-series correlation feature extraction on the sensing time-series data comprises the steps of: extracting time-series correlation features from the standardized sensing time-series data using a temporal convolutional network; and carrying out nonlinear mapping and dimension transformation on the time-series correlation features through a multi-layer perceptron to obtain the sensing feature vector.
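As an illustration of the temporal-convolution-plus-MLP pipeline this claim describes, here is a minimal NumPy sketch. The 3-tap kernels, doubling dilation schedule, random (untrained) weights and output dimension are all assumptions for demonstration, not parameters from the patent:

```python
import numpy as np

def causal_conv1d(x, w, dilation):
    """Dilated causal 1-D convolution: y[t] depends only on x[: t + 1]."""
    k, pad = len(w), (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])

def tcn_mlp(series, feature_dim=8, hidden=16, levels=3, seed=0):
    """Stacked dilated causal convolutions (with ReLU) followed by a small
    MLP that maps the sequence to the unified feature dimension."""
    rng = np.random.default_rng(seed)
    h = np.asarray(series, dtype=float)
    for level in range(levels):
        w = rng.standard_normal(3) * 0.5                      # random 3-tap kernel
        h = np.maximum(causal_conv1d(h, w, 2 ** level), 0.0)  # ReLU
    W1 = rng.standard_normal((hidden, len(h))) * 0.1
    W2 = rng.standard_normal((feature_dim, hidden)) * 0.1
    return W2 @ np.tanh(W1 @ h)                               # sensing feature vector
```

The exponentially growing dilation is what lets a temporal convolutional network cover the long-period correlations the background section says threshold methods miss.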
- 4. The multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection according to claim 1, wherein the cross-modal feature fusion step specifically comprises: carrying out layer normalization on the input visual feature vector and sensing feature vector respectively; performing bidirectional cross-modal multi-head self-attention interaction, comprising: taking the normalized visual feature vector as the query and the normalized sensing feature vector as the key and value to calculate sensing attention features, and taking the normalized sensing feature vector as the query and the normalized visual feature vector as the key and value to calculate visual attention features; carrying out residual connection between the visual attention features and the original visual feature vector, and between the sensing attention features and the original sensing feature vector; and splicing the two residual-connected feature vectors and carrying out nonlinear fusion on the spliced features to output the multi-modal features.
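The fusion step can be sketched as follows. For brevity this uses a single attention head (the claim specifies multi-head attention) and treats each modality as a short sequence of tokens; all weight matrices are random placeholders, and adding each attention output back to its own modality's tokens is one reasonable reading of the residual connections in the claim:

```python
import numpy as np

def layer_norm(v, eps=1e-6):
    """Normalize one token vector to zero mean, unit variance."""
    return (v - v.mean()) / (v.std() + eps)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Single-head cross attention: q_tokens attend over kv_tokens."""
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])          # scaled dot product
    weights = np.apply_along_axis(softmax, 1, scores)
    return weights @ V

def fuse(vis_tokens, sen_tokens, rng):
    """Bidirectional cross-modal attention with residual connections,
    then concatenation and a tanh nonlinear fusion (weights are random
    placeholders, not trained values)."""
    d = vis_tokens.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    vn = np.apply_along_axis(layer_norm, 1, vis_tokens)
    sn = np.apply_along_axis(layer_norm, 1, sen_tokens)
    # visual queries attend over sensing tokens, and vice versa;
    # each attention output is residually added to its own modality
    a_v = cross_attention(vn, sn, Wq, Wk, Wv) + vis_tokens
    a_s = cross_attention(sn, vn, Wq, Wk, Wv) + sen_tokens
    fused = np.concatenate([a_v.mean(axis=0), a_s.mean(axis=0)])
    Wf = rng.standard_normal((d, 2 * d)) * 0.1
    return np.tanh(Wf @ fused)   # fused multi-modal feature, shape (d,)
```

In practice each direction would use its own projection matrices and several heads; the sketch only shows the query/key/value roles swapping between modalities.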
- 5. The multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection according to claim 1, wherein the lightweight fine-tuning strategy freezes the weights of the bottom backbone network of the general multi-modal large model and trains and updates only the parameters of the adaptation layer that accepts the multi-modal feature input, the top feature coding layer of the general multi-modal large model, and the downstream classification task layer.
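The freeze-and-train selection can be illustrated framework-agnostically by filtering parameters by name prefix; the parameter names, layout and plain SGD update below are hypothetical, chosen only to mirror the three trainable groups the claim names:

```python
import numpy as np

# Hypothetical parameter layout: frozen backbone plus trainable
# adaptation layer, top feature-coding layer and classification head.
params = {
    "backbone.block1.w": np.ones(4),
    "backbone.block2.w": np.ones(4),
    "adapter.proj.w":    np.ones(4),
    "encoder.top.w":     np.ones(4),
    "head.cls.w":        np.ones(4),
}
TRAINABLE_PREFIXES = ("adapter.", "encoder.top.", "head.")

def sgd_step(params, grads, lr=0.1):
    """Apply one gradient step, skipping frozen (backbone) weights."""
    for name, g in grads.items():
        if name.startswith(TRAINABLE_PREFIXES):
            params[name] = params[name] - lr * g
    return params
```

In a deep-learning framework the same effect is achieved by disabling gradient tracking on backbone parameters and passing only the remaining groups to the optimizer.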
- 6. The multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection according to claim 1, wherein in the downstream task classification step, the predicted values output by the classification head are processed through a Softmax function, and the probabilities that the pipeline is in the normal state, the leakage-risk state or the equipment-abnormal state are calculated as the classification result.
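A minimal sketch of the Softmax step over the three pipeline states named in the claim (the state labels and example logit values are illustrative):

```python
import numpy as np

STATES = ("normal", "leakage_risk", "equipment_abnormal")

def classify(logits):
    """Turn classification-head logits into per-state probabilities
    and a predicted label via a numerically stable Softmax."""
    e = np.exp(logits - np.max(logits))   # subtract max for stability
    probs = e / e.sum()
    return dict(zip(STATES, probs)), STATES[int(np.argmax(probs))]
```

For example, `classify(np.array([0.2, 2.5, -1.0]))` yields `"leakage_risk"` as the predicted state, since the second logit dominates.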
- 7. A multi-modal large model system for fusing multi-source visual and sensing data in oilfield inspection, wherein the system is configured to implement the method of any one of claims 1-6, the system comprising: a parameter configuration module for configuring system initialization parameters and model training hyperparameters; a dual-source data processing module for loading or generating a standardized paired data set containing visual images and sensing time-series data; a single-modal feature extraction module comprising a visual feature extraction submodule and a sensing time-series feature extraction submodule, respectively used for extracting visual deep semantic features and sensing time-series correlation features and mapping them to a unified dimension; a cross-modal fusion module for carrying out deep interaction and fusion of the visual deep semantic features and the sensing time-series correlation features based on a multi-head self-attention mechanism; an oilfield-scene large model fine-tuning module for inputting the fused features into the general multi-modal large model, performing lightweight fine-tuning and outputting scene-optimized depth features; and a downstream task module for classifying the pipeline operation state based on the depth features.
- 8. The multi-modal large model system for fusing multi-source visual and sensing data in oilfield inspection according to claim 7, wherein the sensing time-series feature extraction submodule comprises a temporal convolutional network and a multi-layer perceptron connected in sequence, the temporal convolutional network being used for extracting the time-series correlation features and the multi-layer perceptron for carrying out nonlinear mapping and dimension unification on them.
- 9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection according to any one of claims 1-6.
- 10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, performs the steps of the multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection as defined in any one of claims 1-6.
Description
Technical Field

The invention belongs to the field of oilfield pipeline inspection, and particularly relates to a multi-modal large model construction method for fusing multi-source visual and sensing data in oilfield inspection.

Background

Oilfield pipelines serve as the core arteries of oil and gas gathering and transportation, and their safe and stable operation is essential for guaranteeing safe oilfield production, preventing leakage pollution and reducing production accidents. With the penetration of artificial intelligence into the industrial field, intelligent inspection based on deep learning has become an important direction for improving oilfield inspection efficiency. However, under the complex field environments and high reliability requirements of actual oilfields, the prior art still has a series of core defects that urgently need to be solved and that restrict the large-scale deployment and application of such technical schemes. First, at the data utilization level, existing schemes rely on a single mode of use. Most technical schemes make decisions from a single data source: for example, visible-light and infrared images are used to identify apparent external defects such as corrosion, deformation and leakage, or sensing time-series data such as temperature, humidity, pressure and vibration are used for threshold alarming to judge abnormal internal operating parameters. This fragmented analysis of visual and sensing data leads to a one-sided view of the risk dimensions.
For example, minor structural fatigue of a pipeline may not yet cause obvious visual deformation but may already produce small changes in the vibration spectrum, and is easily missed by a vision-only scheme; conversely, certain non-risky environmental disturbances, such as visual artifacts caused by light and shadow, may trigger false alarms from a single visual model. The lack of collaborative analysis and fused judgment over the multi-source heterogeneous data describing the pipeline's external appearance and internal state is one of the root causes of the insufficient comprehensiveness, accuracy and reliability of conventional schemes. Secondly, the depth of feature mining for sensing time-series data is seriously insufficient. At present, the processing of sensor data such as temperature, humidity, pressure and vibration still remains at the stage of traditional threshold comparison, simple statistical analysis or fixed-rule alarming. Such methods can only capture obvious, steady-state out-of-range values and cannot effectively mine the long-period correlation features, dynamic mutation patterns and trend changes contained in the time-series data that are related to early weak faults. For example, for the slow pressure drop caused by pipeline micro-leakage, or the gradual change of vibration energy in a specific frequency band caused by initial mechanical loosening, traditional methods have low sensitivity and seriously lagging early warning, making it difficult to meet the proactive prevention and control requirements of safe production. Furthermore, even where a few studies attempt multi-source data fusion, the fusion tends to be shallow. Common practices such as early feature stitching or late decision weighting perform only simple feature stacking or weighted averaging at the result level, failing to establish deep interaction and association mechanisms between visual and sensing modality features.
Such shallow fusion cannot enable the model to autonomously learn the inherent causal or concomitant relationship between, say, the appearance of a rust patch and a specific pressure fluctuation pattern, so the fused features are highly redundant and poorly complementary, and the fusion yields only limited improvement. In addition, direct migration of a general multi-modal large model also faces serious scene suitability problems. These generic models are usually trained on internet-scale image-text pair data, and their underlying visual concepts and semantic relationships differ greatly from dedicated oilfield inspection scenarios, such as Christmas trees (wellhead assemblies), gathering pipelines, flange leakage and insulation layer breakage. Direct application leads to low accuracy in identifying oilfield-specific anomaly types, and the model shows poor robustness under the extreme field conditions of complex illumination change, rain, snow, haze occlusion and vegetation interference, making it difficult to meet the high-availability requirements of the industrial field. If the general large model is subjected to f