CN-122019784-A - Construction method of a large-model platform for oil and gas production

Abstract

The invention discloses a method for constructing a large-model platform for oil and gas production, relating to the technical field of intelligent oil and gas production. The method comprises: pre-constructing an oil and gas field knowledge graph, automatically extracting entities and attributes from massive unstructured documents in oil and gas production using a BERT-BiLSTM-CRF model, and using the extraction results to populate and refine the knowledge graph; and constructing, respectively, a WellGPT model focused on professional knowledge reasoning and a WellGPT-VL model focused on vision-fusion understanding. By combining the Qwen2.5 dual bases with a dynamic-weight mixed LoRA technique based on intent recognition, in which an intent recognition scheduling center classifies instructions and independent LoRA adapters are trained separately, the invention fine-tunes the general-purpose base models efficiently. The generalization capability of the general model is retained while expert knowledge of the oil and gas industry is deeply injected, so that technical terms, physical laws and multi-modal data characteristics of the field are accurately understood, and the professional suitability and output reliability of the model in oil and gas production scenarios are significantly improved.

Inventors

  • SONG JIAN
  • Niu Huizhao
  • SUN HAITONG
  • Tan Zhunan
  • ZHANG CHUNHAO
  • LI LOULOU
  • LI ZHAOBIN

Assignees

  • 北京雅丹石油技术开发有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-20

Claims (6)

  1. A method for constructing a large-model platform for oil and gas production, characterized by comprising the following steps: pre-constructing an oil and gas field knowledge graph, automatically extracting entities and attributes from massive unstructured engineering documents in oil and gas production using a BERT-BiLSTM-CRF model, and using the extraction results to populate and refine the knowledge graph; based on the dual bases Qwen2.5-7B and Qwen2.5-VL-7B, performing parameter-efficient fine-tuning using a dynamic-weight mixed LoRA technique based on intent recognition, and constructing, respectively, a WellGPT model focused on professional knowledge reasoning and a WellGPT-VL model focused on vision-fusion understanding; constructing a five-layer technical architecture comprising a data layer, a platform layer, a model layer, a tool layer and a workflow layer, in which the layers cooperate to realize full-link operation from data support, through model collaboration, to scenario application, wherein the data layer builds a data base with the knowledge graph at its core to realize multi-source data fusion, the platform layer uses open-source tools as technical support and relies on a Neo4j database, LLaMA-Factory and Python to build the basic technical framework, the model layer performs scenario fine-tuning on Qwen-series models and uses the fine-tuned large model as the overall scheduling center, and the tool layer deploys a plurality of domain small models as FastAPI services to serve as a tool reserve, the domain small models comprising a numerical simulation small model, a dynamic productivity prediction small model and an equipment fault early-warning small model; and orchestrating collaborative workflows of the large model and the professional small models through the Dify platform, adapting to different oil and gas production business scenarios to form customized application schemes.
  2. The method for constructing a large-model platform for oil and gas production according to claim 1, wherein the oil and gas field knowledge graph is constructed using a hybrid domain-ontology construction method comprising the following steps: constructing a schema layer from top to bottom, in which the business logic of the whole oil and gas field development process serves as the guiding thread, and a top-level ontology is built with reference to industry technical standards, professional textbooks and the theoretical systems of the field, with secondary sub-domains and tertiary core entities subdivided in turn under five categories, namely reservoir, drilling, well completion, oil extraction and support; constructing an entity layer from bottom to top, in which fourth-level concrete instances and semantic relations are extracted from real data sources such as engineering manuals and fault cases to fill in the details of the schema-layer framework; and standardizing the entity data, in which entity granularity is optimized and cross-domain relations are completed in light of the characteristics of the oil and gas field development industry, with deviations corrected through review by domain experts.
  3. The method for constructing a large-model platform for oil and gas production according to claim 1, wherein the BERT-BiLSTM-CRF model comprises a BERT module, a Bi-LSTM module and a CRF module, wherein: the BERT module converts an input text sequence into a sequence of word vectors fusing multi-dimensional information, each vector being the element-wise sum of a token embedding, a position embedding and a sentence (segment) embedding; the Bi-LSTM module further models the vector sequence output by BERT, captures the bidirectional context dependencies of the sequence, and outputs a feature representation containing global semantics; and the CRF module learns the transition probabilities between labels based on the output of the Bi-LSTM and outputs the optimal entity label sequence by constraining the validity of the label sequence.
  4. The method for constructing a large-model platform for oil and gas production according to claim 1, wherein the parameter-efficient fine-tuning using the dynamic-weight mixed LoRA technique based on intent recognition comprises the following steps: constructing an intent recognition scheduling center, in which an intent recognition module is built on a lightweight TextCNN network, the TextCNN network converts the discrete instruction text into a vector matrix, and key semantic features of the instruction are extracted through supervised training to determine the business category to which the user instruction belongs; training independent LoRA adapters, in which, based on the Qwen2.5 base model, the LLaMA-Factory framework is used to perform independent LoRA fine-tuning on each of three sub-corpora, namely working condition diagnosis, oil and gas reservoir geology and engineering technology, yielding three independent sets of LoRA adapter weights; and performing dynamic-weight combination inference, in which the TextCNN maps the user's input instruction to an intent vector over the business categories, and the model weights used for inference are computed as a linear combination of the base weights and the weighted LoRA increments, thereby dynamically adjusting the contribution of knowledge from the different domains during inference.
  5. The method of claim 4, wherein the training data of the TextCNN network is a fine-tuning instruction data set constructed from about 1500 typical prompt instructions randomly extracted from the three sub-corpora of working condition diagnosis, reservoir geology and oil recovery engineering to form standardized training sample pairs.
  6. The method for constructing a large-model platform for oil and gas production according to claim 1, wherein the professional small models comprise a numerical simulation small model, a dynamic productivity prediction small model, a working condition diagnosis small model and an equipment fault early-warning small model, and the collaborative workflow realizes tool-based collaboration between the large model and the professional small models.
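
Illustrative note on claim 4: the dynamic-weight combination step can be read as W = W0 + sum_i alpha_i * (B_i A_i), where W0 is a base weight matrix, B_i and A_i are the low-rank factors of the i-th LoRA adapter, and alpha_i is the i-th component of the intent vector. The following is a minimal sketch under that reading, not the applicant's implementation. The names `matmul` and `mix_lora`, the toy 2x2 matrices, and the assumption that the intent vector holds per-category probabilities are all hypothetical and do not appear in the source.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mix_lora(
    base: Matrix,
    adapters: List[Tuple[Matrix, Matrix]],
    intent: List[float],
) -> Matrix:
    """Return base + sum_i intent[i] * (B_i @ A_i).

    `adapters` holds one (B, A) pair per business category; `intent` is the
    assumed per-category weight produced by the intent classifier.
    """
    if len(adapters) != len(intent):
        raise ValueError("need exactly one intent weight per adapter")
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for (b, a), weight in zip(adapters, intent):
        delta = matmul(b, a)  # low-rank increment of this adapter
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


# Toy example: a 2x2 base weight and three rank-1 adapters standing in for
# working condition diagnosis, reservoir geology and engineering technology.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0], [0.0]], [[1.0, 0.0]]),
    ([[0.0], [1.0]], [[0.0, 1.0]]),
    ([[1.0], [1.0]], [[1.0, 1.0]]),
]
intent = [0.5, 0.25, 0.25]  # hypothetical classifier output
mixed = mix_lora(base, adapters, intent)
```

With a one-hot intent vector the sketch reduces to applying a single adapter; a softer vector blends the domain increments, which corresponds to the adjustable "contribution of knowledge from the different domains" described in the claim.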

Description

Construction method of a large-model platform for oil and gas production

Technical Field

The invention relates to the technical field of intelligent oil and gas production, and in particular to a method for constructing a large-model platform for oil and gas production.

Background

The global energy industry is currently at a key stage in the transition from digitalization to intelligent operation. Oil and gas production, as a typical process industry, faces challenges such as complex geological conditions, changing extraction environments and heterogeneous multi-source data, which place extremely high demands on the accuracy and timeliness of production decisions. With the continued development of the industrial Internet, oil and gas fields have accumulated massive production data and technical documents. How to activate these data assets and convert them into explicit knowledge that guides production has become a core question for the oil and gas industry in reducing costs, improving efficiency and achieving high-quality development. In recent years, large language model (LLM) technology based on the Transformer architecture has made breakthrough progress and offers a new technical paradigm for industrial intelligence.
However, applying general-purpose large models directly to oil and gas production still suffers from marked poor acclimatization. First, general models lack deep knowledge of the vertical domain and are prone to factual errors or hallucinations when confronted with oil and gas professional corpora. Second, data in oil and gas production scenarios is highly multi-modal, and existing models struggle to achieve semantic alignment and fused analysis of cross-modal data. Third, industrial scenarios impose extremely high requirements on the interpretability and safety of decisions, and purely data-driven models can hardly meet the strict standards that engineering sites set for physical-mechanism constraints. Meanwhile, although some enterprises in the oil and gas vertical have built large-model ecosystems, those models mostly emphasize generalization across the whole industrial chain, lack deep adaptation to the multi-factor coupling of geology, engineering and production in the production stage, are difficult to deploy in lightweight form and to land as an engineering closed loop, and therefore can hardly support accurate production decisions. A method for constructing a dedicated large model adapted to the core scenarios of oil and gas production is therefore needed to address these technical pain points in a targeted manner.

Disclosure of Invention

In view of the problems in the related art, the invention provides a method for constructing a large-model platform for oil and gas production, so as to overcome the technical problems of the existing related art.
The technical scheme of the invention is realized as follows: the method for constructing a large-model platform for oil and gas production comprises the following steps: pre-constructing an oil and gas field knowledge graph, automatically extracting entities and attributes from massive unstructured engineering documents in oil and gas production using a BERT-BiLSTM-CRF model, and using the extraction results to populate and refine the knowledge graph; based on the dual bases Qwen2.5-7B and Qwen2.5-VL-7B, performing parameter-efficient fine-tuning using a dynamic-weight mixed LoRA technique based on intent recognition, and constructing, respectively, a WellGPT model focused on professional knowledge reasoning and a WellGPT-VL model focused on vision-fusion understanding; constructing a five-layer technical architecture comprising a data layer, a platform layer, a model layer, a tool layer and a workflow layer, in which the layers cooperate to realize full-link operation from data support, through model collaboration, to scenario application, wherein the data layer builds a data base with the knowledge graph at its core to realize multi-source data fusion, the platform layer uses open-source tools as technical support and relies on a Neo4j database, LLaMA-Factory and Python to build the basic technical framework, the model layer performs scenario fine-tuning on Qwen-series models and uses the fine-tuned large model as the overall scheduling center, and the tool layer deploys a plurality of domain small models as FastAPI services to serve as a tool reserve, the domain small models comprising a numerical simulation small model, a dynamic productivity prediction small model and an equipment fault early-warning small model; and orchestrating collaborative workflows of the large model and the professional small models through the Dify platform, adapting to different oil and gas production business scenarios to form a customized app