CN-122021867-A - Large language model training method and system adaptable to vertical-domain knowledge
Abstract
The invention discloses a large language model training method and system adaptable to vertical-domain knowledge, belonging to the technical field of artificial intelligence and natural language processing. The method comprises: Step 1, extracting formatted data together with domain high-frequency terms, abbreviations, and special symbols, obtaining an expanded tokenizer for domain concept coding, and constructing sample sets; Step 2, constructing a backbone network, performing head-by-head weighted normalization fusion on the domain hidden vectors, and obtaining the output through a multi-layer neural network; Step 3, performing shallow-freeze/deep-thaw training and optimization on the domain vertical network, and adopting meta-learning and cross-task transfer mechanisms to improve its task suitability in the aeroengine fault-maintenance scenario; Step 4, inputting the test set into the trained domain vertical network to obtain evaluation indexes. Compared with the prior art, the method improves the efficiency of adapting to aeroengine knowledge in the aeroengine fault-maintenance scenario.
Inventors
- LI KAN
- ZHANG YUEQI
Assignees
- Beijing Institute of Technology (北京理工大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-01
Claims (7)
- 1. A training method of a large language model adaptable to vertical-domain knowledge, characterized by comprising the following steps: Step 1, performing data preprocessing on a multi-source corpus to form formatted data, and, based on the formatted data, extracting domain high-frequency terms, abbreviations, and special symbols and obtaining an expanded tokenizer for domain concept coding; Step 2, constructing a decoder-only backbone network with a Domain Information Processing Unit (DIPU), performing head-by-head weighted normalization fusion on the basic attention vector obtained by the backbone network through its attention mechanism and the domain hidden vector obtained by the DIPU through a bottleneck layer, and obtaining the output through a multi-layer neural network; Step 2.1, taking a decoder-only Transformer model as the backbone network; Step 2.2, attaching the DIPU to the backbone network in parallel to form a domain vertical network; Step 2.3, inputting the task sample set and the error-path comparison sample set simultaneously into the expanded tokenizer for domain concept coding to obtain hidden vectors; Step 2.4, inputting the hidden vectors into the domain vertical network, where the backbone network converts them into basic attention vectors using the attention mechanism and the DIPU generates domain hidden vectors in query-key-value form through the bottleneck layer; Step 2.5, performing head-by-head weighted normalization fusion on the basic attention vector and the domain hidden vector to form a fused attention vector; Step 2.6, obtaining the output from the fused attention vector through a multi-layer neural network; Step 3, performing shallow-freeze/deep-thaw training on the domain vertical network using term samples and, in sequence, multi-step logical reasoning, conditional branching, and long-context dependency relationships, and optimizing the network with a cross-entropy loss; and Step 4, inputting the test set into the trained domain vertical network to obtain evaluation indexes of accuracy, reasoning-chain consistency, and interpretability.
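The head-by-head weighted normalization fusion of Step 2.5 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the per-head sigmoid gate and the per-head normalization are assumptions chosen to show the shape of the operation.

```python
import numpy as np

def fuse_heads(base, domain, gate_logits):
    """Head-by-head weighted normalization fusion (sketch of Step 2.5).

    base, domain : (n_heads, seq_len, d_head) attention outputs from the
                   backbone and from the DIPU bottleneck, respectively.
    gate_logits  : (n_heads,) per-head fusion logits (assumed learnable).
    """
    w = 1.0 / (1.0 + np.exp(-gate_logits))              # sigmoid gate per head
    fused = w[:, None, None] * base + (1.0 - w)[:, None, None] * domain
    # normalize each head's vectors so backbone and domain scales stay comparable
    mu = fused.mean(axis=-1, keepdims=True)
    sd = fused.std(axis=-1, keepdims=True) + 1e-6
    return (fused - mu) / sd
```

Because the gate is applied per head, each attention head can learn how strongly to weight domain information independently of the others.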
- 2. The large language model training method adaptable to vertical-domain knowledge of claim 1, wherein step 1 is implemented as follows: Step 1.1, collecting a multi-source corpus in the target vertical domain and performing data preprocessing to obtain formatted data for training; Step 1.2, extracting domain high-frequency terms, abbreviations, and special symbols from the formatted data using regular expressions and an LLM, and injecting terms that the model segments incompletely or with semantic errors into the tokenizer vocabulary to obtain an expanded tokenizer for domain concept coding; and Step 1.3, constructing a task sample set and an error-path comparison sample set, and expanding the sample sets using rule templates and a generative method.
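The vocabulary injection of Step 1.2 can be illustrated with a toy tokenizer. A real system would extend a subword tokenizer (e.g. BPE) and resize the model's embedding layer to match; the whitespace splitting and greedy longest-match below are assumptions used only to show why an injected multi-word domain term stops being split apart.

```python
class ExpandableTokenizer:
    """Toy tokenizer illustrating domain-term injection (sketch of Step 1.2)."""

    def __init__(self, vocab):
        self.vocab = set(vocab)

    def add_terms(self, terms):
        """Inject domain terms; returns how many new embedding rows are needed."""
        new = [t for t in terms if t not in self.vocab]
        self.vocab.update(new)
        return len(new)

    def tokenize(self, text):
        words, out, i = text.split(), [], 0
        while i < len(words):
            # greedy longest match keeps multi-word domain terms whole
            for j in range(len(words), i, -1):
                if " ".join(words[i:j]) in self.vocab:
                    out.append(" ".join(words[i:j]))
                    i = j
                    break
            else:
                out.append(words[i])    # unknown word passes through unchanged
                i += 1
        return out
```

For example, before injection `"turbine blade crack"` tokenizes as three pieces; after `add_terms(["turbine blade"])` the domain term is kept as one token.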
- 3. The large language model training method adaptable to vertical-domain knowledge of claim 2, wherein step 1.3 is implemented as follows: Step 1.3.1, when the data are sufficient, constructing the task sample set and the error-path comparison sample set; and Step 1.3.2, when the data are insufficient, expanding the sample sets using rule templates and a generative method.
- 4. The large language model training method adaptable to vertical-domain knowledge of claim 1, wherein step 3 is implemented as follows: Step 3.1, when training the domain vertical network, freezing the shallow Transformer blocks and thawing the deep Transformer blocks, the Domain Information Processing Unit, and the output layer; Step 3.2, training the domain vertical network using term samples and, in sequence, multi-step logical reasoning, conditional branching, and long-context dependency relationships, and further optimizing it with a cross-entropy loss; and Step 3.3, improving the task suitability of the domain vertical network in the aeroengine fault-maintenance scenario by adopting meta-learning and cross-task transfer mechanisms.
- 5. The large language model training method adaptable to vertical-domain knowledge of claim 1, wherein step 3.1 is implemented as follows: Step 3.1.1, partitioning the parameters of the domain vertical network, taking the Transformer blocks of the front half of the network as the shallow layers and the Transformer blocks of the back half as the deep layers; Step 3.1.2, freezing the shallow Transformer blocks; and Step 3.1.3, thawing the deep Transformer blocks, the Domain Information Processing Unit, and the output layer.
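The shallow-freeze/deep-thaw partition of Steps 3.1.1-3.1.3 can be sketched as follows. To keep the sketch dependency-free, parameters are modeled as dicts with a `requires_grad` flag standing in for framework tensors; in PyTorch one would set `p.requires_grad_(False)` on the frozen parameters instead.

```python
def apply_semi_freeze(blocks, dipu_params, output_params):
    """Shallow-freeze / deep-thaw partition (sketch of Steps 3.1.1-3.1.3).

    blocks : list of per-Transformer-block parameter lists; each parameter
             is a dict carrying a 'requires_grad' flag.
    The front half of the blocks is treated as shallow and frozen; the back
    half, the DIPU, and the output layer remain trainable.
    """
    split = len(blocks) // 2
    for i, block in enumerate(blocks):
        for p in block:
            p["requires_grad"] = (i >= split)   # freeze front half only
    for p in dipu_params + output_params:
        p["requires_grad"] = True               # DIPU and output layer thawed
    return split
```

Freezing the shallow blocks preserves the backbone's general linguistic features while concentrating gradient updates in the deep blocks and the domain-specific components.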
- 6. A large language model training system adaptable to vertical-domain knowledge for realizing the method of claim 1, wherein the system comprises a domain vertical network module and a training control module; the domain vertical network module consists of a vocabulary expansion module, a domain information processing module, and a fusion attention module, and is used to construct a model for detecting aeroengine faults; the training control module is used to perform semi-freeze training of the domain vertical network and to optimize it using meta-learning and transfer.
- 7. The large language model training system adaptable to vertical-domain knowledge of claim 6, wherein: the vocabulary expansion module is used to expand the vocabulary and retrain the embedding layer to integrate domain terms; the domain information processing module is used to attach bottleneck structures in parallel to the Transformer backbone and perform multi-subspace projection to generate a domain-enhanced representation; and the fusion attention module is used to fuse the backbone attention result with the attention distribution of the domain-enhanced representation to obtain a fused attention vector.
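The bottleneck-plus-multi-subspace projection attributed to the domain information processing module in claim 7 can be sketched as follows. The ReLU nonlinearity and the even split of the up-projected representation into per-head subspaces are assumptions for illustration; the patent does not specify these details.

```python
import numpy as np

def dipu_forward(h, w_down, w_up, n_heads):
    """Bottleneck projection of the domain information processing module
    (sketch of claim 7).

    h      : (seq_len, d_model) hidden states from the backbone
    w_down : (d_model, r) down-projection, bottleneck width r << d_model
    w_up   : (r, d_model) up-projection back to model width
    """
    z = np.maximum(h @ w_down, 0.0)    # down-project into the bottleneck
    e = z @ w_up                       # up-project: domain-enhanced representation
    seq_len, d_model = e.shape
    # multi-subspace view: one slice per attention head, ready for fusion
    return e.reshape(seq_len, n_heads, d_model // n_heads)
```

Because the bottleneck width r is small, the module adds few parameters relative to the backbone, which is what makes training it alongside a mostly frozen backbone cheap.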
Description
Large language model training method and system adaptable to vertical-domain knowledge

Technical Field

The invention relates to a large language model training method and system adaptable to vertical-domain knowledge, belongs to the technical field of artificial intelligence and natural language processing, and is applied to the aeroengine fault-maintenance scenario.

Background

In recent years, large language models (LLMs) have achieved significant results in general natural language understanding, text generation, question-answering systems, intelligent search, and the like. However, when facing a specific vertical domain, a general large language model is often limited in performance by its lack of targeted knowledge and reasoning patterns, mainly for the following reasons: (1) Domain-knowledge adaptation is inefficient: introducing domain-specific knowledge requires large amounts of labeled data and repeated training, so the adaptation cycle is long and the cost is high. (2) Complex-task analysis capability is insufficient: for tasks such as multi-round reasoning, long-chain logical relationships, and multi-modal data fusion, existing models lack stability, interpretability, and execution precision. (3) Real-time response and deployability are limited: some application scenarios impose strict requirements on response latency, computing resources, and edge deployment capability, under which existing large models are difficult to deploy directly. Therefore, improving the efficiency of adapting to aeroengine knowledge in the aviation field has become a problem to be solved.
Disclosure of Invention

The invention aims to solve the technical problem of improving the efficiency of adapting to aeroengine knowledge in the aeroengine fault-maintenance scenario, and provides a large language model training method and system adaptable to vertical-domain knowledge. According to the invention, a Domain Information Processing Unit is attached in parallel to the backbone of the large language model, combined with a fusion attention mechanism, a semi-freeze training strategy, chain-of-thought reasoning enhancement, few-shot learning with transfer optimization, and lightweight distillation and quantization, so that rapid adaptation to any vertical domain is achieved and a high-precision, interpretable, low-latency, deployable large language model is obtained. The invention accounts for the requirements of both the training stage and the deployment stage, and can run stably under limited computing power.
The aim of the invention is realized by the following technical scheme. The invention discloses a large language model training method adaptable to vertical-domain knowledge, which comprises the following steps: Step 1, performing data preprocessing on a multi-source corpus to form formatted data, and, based on the formatted data, extracting domain high-frequency terms, abbreviations, and special symbols and obtaining an expanded tokenizer for domain concept coding; Step 1.1, collecting a multi-source corpus in the target vertical domain and performing data preprocessing to obtain formatted data for training; Step 1.2, extracting domain high-frequency terms, abbreviations, and special symbols from the formatted data using regular expressions and an LLM, and injecting terms that the model segments incompletely or with semantic errors into the tokenizer vocabulary to obtain an expanded tokenizer for domain concept coding; Step 1.3, constructing a task sample set and an error-path comparison sample set, and expanding the sample sets using rule templates and a generative method; Step 1.3.1, when the data are sufficient, constructing the task sample set and the error-path comparison sample set; Step 1.3.2, when the data are insufficient, expanding the sample sets using rule templates and a generative method; Step 2, constructing a decoder-only backbone network with a Domain Information Processing Unit (DIPU), performing head-by-head weighted normalization fusion on the basic attention vector obtained by the backbone network through its attention mechanism and the domain hidden vector obtained by the DIPU through a bottleneck layer, and obtaining the output through a multi-layer neural network; Step 2.1, taking a decoder-only Transformer model as the backbone network; Step 2.2, attaching the DIPU to the backbone network in parallel to form a domain vertical network; Step 2.3, inputting the task sample set and the error-path comparison sample set simultaneously into the expanded tokenizer for domain concept coding