CN-121998096-A - Model training method, device and storage medium for multi-mode large model

Abstract

Embodiments of the application provide a model training method, device, and storage medium for a multi-modal large model. The method comprises: when the multi-modal large model is to be trained using an interaction instruction set, retrieving, from a target knowledge base, domain knowledge content corresponding to each interaction instruction in the interaction instruction set; fusing the domain knowledge content corresponding to each interaction instruction with the initial prompt content of that interaction instruction to obtain fused prompt content of each interaction instruction; and performing model training on the multi-modal large model using the fused prompt content and the expected output content of each interaction instruction to obtain a trained multi-modal large model. The application addresses the problem in the related art that large models for construction safety lack sufficient domain expertise in construction safety scenarios, and thereby improves the domain expertise of the large model.

Inventors

Cui Chanjie
REN YUPENG
ZHOU HE
FU JIANHAI
GAO YUSHUANG
LI QIANKUN

Assignees

Zhejiang Dahua Technology Co., Ltd. (浙江大华技术股份有限公司)

Dates

Publication Date: 20260508
Application Date: 20260128

Claims (10)

1. A model training method for a multi-modal large model, comprising: retrieving, from a target knowledge base, domain knowledge content corresponding to each interaction instruction in an interaction instruction set when model training is to be performed on a multi-modal large model using the interaction instruction set, wherein the target knowledge base is constructed based on a domain knowledge graph, the domain knowledge graph is used for recording domain knowledge content of a physical engineering domain, and each interaction instruction comprises initial prompt content and expected output content; fusing the domain knowledge content corresponding to each interaction instruction with the initial prompt content of each interaction instruction to obtain fused prompt content of each interaction instruction; and performing model training on the multi-modal large model using the fused prompt content of each interaction instruction and the expected output content of each interaction instruction to obtain a trained multi-modal large model.
2. The method of claim 1, wherein before retrieving, from the target knowledge base, the domain knowledge content corresponding to each interaction instruction in the interaction instruction set, the method further comprises: constructing the domain knowledge graph based on domain knowledge source data of the physical engineering domain, wherein the domain knowledge source data is data acquired from a domain knowledge source and used for describing domain knowledge of the physical engineering domain, and the domain knowledge source data comprises at least one of: standard text of at least one level, standard design illustrations, abnormal state record data, and experience description data.
3. The method according to claim 2, wherein constructing the domain knowledge graph based on the domain knowledge source data of the physical engineering domain comprises: extracting a domain knowledge triplet set from the domain knowledge source data, wherein each domain knowledge triplet in the domain knowledge triplet set is a triplet whose entities are an abnormal state type, a state reference item, and a state correction scheme; and constructing a knowledge graph with each entity in each domain knowledge triplet as a node to obtain the domain knowledge graph.
4. The method of claim 1, wherein before fusing the domain knowledge content corresponding to each interaction instruction with the initial prompt content of each interaction instruction, the method further comprises: interacting with an interactive language model using each job scene image in a job scene image set and a step-by-step prompt template to obtain a multi-round interaction instruction set; and generating the interaction instruction set based on the multi-round interaction instruction set, wherein each interaction instruction is at least part of one multi-round interaction instruction in the multi-round interaction instruction set; wherein each job scene image is a real job scene image or a synthetic job scene image of an engineering job in the physical engineering domain; and the step-by-step prompt template is used for interacting with the interactive language model step by step according to a target prompt chain, and the prompt nodes in the target prompt chain comprise at least one of: job scene description information, abnormal state position indication information, an abnormal state type, an abnormality level, a state reference item, and a state correction scheme.
5. The method of claim 4, wherein before generating the interaction instruction set based on the multi-round interaction instruction set, the method further comprises: removing, from the multi-round interaction instruction set, each multi-round interaction instruction that meets at least one of the following screening conditions, to obtain an updated multi-round interaction instruction set: the interactive content it contains does not match the prompt nodes in the target prompt chain; the semantic similarity between the state correction scheme it contains and the corresponding state reference item is lower than a similarity threshold; it is sampled in a sampling audit process and the sampling audit result is a failure.
6. The method of claim 1, wherein retrieving, from the target knowledge base, the domain knowledge content corresponding to each interaction instruction in the interaction instruction set comprises: screening M domain knowledge contents for each interaction instruction from the target knowledge base, in descending order of the vector similarity between the coding vector of each domain knowledge content in the target knowledge base and the coding vector of the initial prompt content of that interaction instruction, to obtain the domain knowledge content corresponding to each interaction instruction; wherein the coding vector of each domain knowledge content is a vector obtained by encoding that domain knowledge content, the coding vector of the initial prompt content of each interaction instruction is a vector obtained by encoding that initial prompt content, and M is a positive integer greater than or equal to 2; and among the M domain knowledge contents screened for each interaction instruction, the 1st to N-th domain knowledge contents are used in a first model training stage of the multi-modal large model, the (N+1)-th to M-th domain knowledge contents are used in a second model training stage of the multi-modal large model, N is a positive integer greater than or equal to 1 and less than M, and the first model training stage is earlier than the second model training stage.
7. The method according to any one of claims 1 to 6, wherein fusing the domain knowledge content corresponding to each interaction instruction with the initial prompt content of each interaction instruction to obtain the fused prompt content of each interaction instruction comprises: adding target prompt content and the domain knowledge content corresponding to each interaction instruction to the initial prompt content of that interaction instruction to obtain the fused prompt content corresponding to each interaction instruction, wherein the target prompt content is used for prompting the multi-modal large model to generate, according to the domain knowledge content corresponding to each interaction instruction, output content corresponding to the initial prompt content of that interaction instruction.
8. A model training device for a multi-modal large model, comprising: a retrieval unit, configured to retrieve, from a target knowledge base, domain knowledge content corresponding to each interaction instruction in an interaction instruction set when model training is to be performed on a multi-modal large model using the interaction instruction set, wherein the target knowledge base is constructed based on a domain knowledge graph, the domain knowledge graph is used for recording domain knowledge content of a physical engineering domain, and each interaction instruction comprises initial prompt content and expected output content; a fusion unit, configured to fuse the domain knowledge content corresponding to each interaction instruction with the initial prompt content of each interaction instruction to obtain fused prompt content of each interaction instruction; and a training unit, configured to perform model training on the multi-modal large model using the fused prompt content of each interaction instruction and the expected output content of each interaction instruction to obtain a trained multi-modal large model.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
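
As an illustration of the retrieval recited in claim 6, the following is a minimal sketch and not the applicant's implementation. It assumes cosine similarity as the "vector similarity" (the claims do not name a metric), uses made-up two-dimensional vectors in place of real encoder outputs, and assigns the 1st to N-th retrieved items to the first training stage and the remaining items to the second. The names `cosine_similarity` and `retrieve_staged` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_staged(
    prompt_vec: Sequence[float],
    knowledge: List[Tuple[str, Sequence[float]]],
    m: int,
    n: int,
) -> Tuple[List[str], List[str]]:
    """Return (stage_one, stage_two) knowledge texts for one interaction instruction.

    The knowledge items are ranked by similarity to the prompt vector from
    high to low and the top M are kept. Items 1..N go to the first training
    stage; the rest of the top M go to the second stage.
    """
    if m < 2 or not (1 <= n < m):
        raise ValueError("require M >= 2 and 1 <= N < M")
    ranked = sorted(
        knowledge,
        key=lambda item: cosine_similarity(prompt_vec, item[1]),
        reverse=True,
    )
    top_m = [text for text, _ in ranked[:m]]
    return top_m[:n], top_m[n:]


# Toy knowledge base: (text, coding vector). Both are invented for illustration.
kb = [
    ("guardrail height rule", [1.0, 0.0]),
    ("helmet rule", [0.8, 0.6]),
    ("scaffold tie rule", [0.0, 1.0]),
]
stage_one, stage_two = retrieve_staged([1.0, 0.1], kb, m=2, n=1)
print(stage_one, stage_two)
```

With these toy vectors the closest item lands in the first stage and the next closest in the second, which mirrors the ordering constraint that the first training stage precedes the second.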

Description

Model training method, device and storage medium for multi-mode large model

Technical Field

Embodiments of the application relate to the field of computer technology, and in particular to a model training method, device, and storage medium for a multi-modal large model.

Background

With the continued advance of urbanization and sustained investment in infrastructure construction, the construction industry keeps growing in scale. However, as construction activity becomes more frequent, the construction safety situation remains severe. Traditional manual inspection suffers from low efficiency, strong subjectivity, incomplete coverage, and delayed response, and struggles to meet the fine-grained, intelligent management requirements of modern smart construction sites.

In the related art, intelligent construction-site detection systems based on large artificial intelligence models can automatically identify, locate, and give early warning of common safety risks, improving the efficiency of discovering safety hazards and the speed of closed-loop management. However, such systems generally adopt a general-purpose multi-modal large model, which in professional engineering scenarios shows insufficient understanding of industry specifications, low precision in locating safety risks, and hallucination in the opinions it generates.

Therefore, the construction safety large model in the related art has the problem of insufficient domain expertise in construction safety scenarios.

Disclosure of Invention

Embodiments of the application provide a model training method, device, and storage medium for a multi-modal large model, to at least solve the problem in the related art that the construction safety large model lacks sufficient domain expertise in construction safety scenarios.
According to one aspect of the embodiments of the application, a model training method for a multi-modal large model is provided. The method comprises: when the multi-modal large model is to be trained using an interaction instruction set, retrieving, from a target knowledge base, domain knowledge content corresponding to each interaction instruction in the interaction instruction set, wherein the target knowledge base is constructed based on a domain knowledge graph, the domain knowledge graph is used for recording domain knowledge content of the physical engineering domain, and each interaction instruction comprises initial prompt content and expected output content; fusing the domain knowledge content corresponding to each interaction instruction with the initial prompt content of each interaction instruction to obtain fused prompt content of each interaction instruction; and performing model training on the multi-modal large model using the fused prompt content of each interaction instruction and the expected output content of each interaction instruction to obtain the trained multi-modal large model.
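
The retrieve, fuse, and train-pair flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: it covers only the text side of each instruction (images are omitted), `TARGET_PROMPT` is a hypothetical wording of the "target prompt content" mentioned in claim 7, `toy_retrieve` is a stand-in for the knowledge-base lookup, and the sample sentences are invented.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instruction:
    initial_prompt: str   # initial prompt content of the interaction instruction
    expected_output: str  # expected output content of the interaction instruction


@dataclass
class TrainingSample:
    fused_prompt: str     # initial prompt + target prompt + domain knowledge
    expected_output: str  # supervision target, unchanged from the instruction


# Hypothetical wording; the source only says this content prompts the model
# to answer according to the retrieved domain knowledge.
TARGET_PROMPT = "Answer using the domain knowledge below."


def fuse(initial_prompt: str, knowledge: List[str]) -> str:
    """Append the target prompt and each knowledge item to the initial prompt."""
    lines = [initial_prompt, TARGET_PROMPT]
    lines += [f"- {item}" for item in knowledge]
    return "\n".join(lines)


def build_samples(
    instructions: List[Instruction],
    retrieve: Callable[[str], List[str]],
) -> List[TrainingSample]:
    """Retrieve knowledge per instruction, fuse it, and pair it with the expected output."""
    return [
        TrainingSample(
            fused_prompt=fuse(ins.initial_prompt, retrieve(ins.initial_prompt)),
            expected_output=ins.expected_output,
        )
        for ins in instructions
    ]


def toy_retrieve(prompt: str) -> List[str]:
    # Stand-in for the target knowledge base; a real system would rank
    # knowledge contents by vector similarity to the prompt.
    if "edge" in prompt.lower():
        return ["Edge openings require guardrails."]
    return []


instructions = [
    Instruction(
        initial_prompt="Is the floor edge in this image safe?",
        expected_output="No. The edge lacks a guardrail.",
    )
]
samples = build_samples(instructions, toy_retrieve)
print(samples[0].fused_prompt)
```

The resulting `(fused_prompt, expected_output)` pairs are what a supervised fine-tuning loop would consume; the training step itself is left out because the source does not specify a framework or loss.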
According to another aspect of the embodiments of the application, a model training device for a multi-modal large model is provided, comprising a retrieval unit, a fusion unit, and a training unit. The retrieval unit is configured to retrieve, from a target knowledge base, domain knowledge content corresponding to each interaction instruction in an interaction instruction set when the multi-modal large model is to be trained using the interaction instruction set, wherein the target knowledge base is constructed based on a domain knowledge graph, the domain knowledge graph is used for recording domain knowledge content of the physical engineering domain, and each interaction instruction comprises initial prompt content and expected output content. The fusion unit is configured to fuse the domain knowledge content corresponding to each interaction instruction with the initial prompt content of each interaction instruction to obtain fused prompt content of each interaction instruction. The training unit is configured to perform model training on the multi-modal large model using the fused prompt content of each interaction instruction and the expected output content of each interaction instruction to obtain the trained multi-modal large model.

In an exemplary embodiment, the device further comprises a construction unit configured to construct the domain knowledge graph based on domain knowledge source data of the physical engineering domain, wherein the domain knowledge source data is data acquired from a domain knowledge source and used for describing domain knowledge of the physical engineering domain, and the domain knowledge source data comprises at least one of: standard text of at least one level, standard design illustrations, abnormal state record data, and experience description data, before the domain knowledge content corresponding to each interaction instruction in the interaction instruction set is r