CN-116862017-B - Data processing method, device, equipment and medium
Abstract
The invention relates to the technical fields of artificial intelligence, natural language processing and medical health, and discloses a data processing method, device, equipment and medium. The method comprises: inputting original training data into an initial first network model of an initial network model for operation to obtain a first output result; determining a first contrast loss according to the first output result; inputting the first output result into an initial second network model for operation to obtain a second output result; determining a second contrast loss according to the second output result; and performing joint training on the initial network model according to the first contrast loss and the second contrast loss to obtain a target network model. The method improves the generation effect and accuracy of the large language model.
Inventors
- WANG JUN
- HOU CHANGYU
Assignees
- 平安科技(深圳)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20230630
Claims (10)
- 1. A data processing method, wherein the data processing method is applied to an initial network model, the initial network model comprising an initial first network model and an initial second network model, the initial second network model comprising a large language model, the method comprising: inputting original training data into the initial first network model of the initial network model for operation to obtain a first output result, wherein the first network model is a disease diagnosis small model which receives disease information of a patient as input and outputs a probability distribution of the disease information; determining a first contrast loss according to the first output result; inputting the first output result into the initial second network model for operation to obtain a second output result; determining a second contrast loss according to the second output result; and performing joint training on the initial network model according to the first contrast loss and the second contrast loss to obtain a target network model.
- 2. The method of claim 1, wherein determining a first contrast loss based on the first output result comprises: determining the first contrast loss according to the first output result by the following formula: [formula not preserved in the source text], wherein the symbols of the formula (also not preserved) denote, in order: the minimized first contrast loss; the first loss function of the initial first network model; the training data set of the initial first network model; and the output probability distribution of the initial first network model; y is the first output result, x is the original training data, |·| is the absolute value operation, log is the logarithm operation, and M represents the initial first network model.
- 3. The data processing method according to claim 2, wherein the determining a second contrast loss from the second output result includes: acquiring type information of the initial second network model; determining a second distribution function corresponding to the initial second network model according to the type information; determining a second probability distribution corresponding to the second output result through the second distribution function; acquiring a second probability label and a second loss function of the second probability distribution; and determining the second contrast loss according to the second probability distribution, the second probability label and the second loss function.
- 4. The data processing method according to claim 3, wherein said jointly training said initial network model based on said first contrast loss and said second contrast loss to obtain a target network model comprises: acquiring a balance coefficient of the joint training; constructing a first joint loss function according to the balance coefficient, the first loss function of the initial first network model and the second loss function of the initial second network model; determining a joint contrast loss of the initial network model according to the first contrast loss, the second contrast loss and the first joint loss function; and adjusting weight parameters of the initial first network model and the initial second network model in the initial network model according to the first training data of the initial network model and the joint contrast loss, and obtaining the target network model after the joint contrast loss converges.
- 5. The data processing method of claim 4, wherein the constructing a first joint loss function from the balance coefficient, the first loss function of the initial first network model, and the second loss function of the initial second network model comprises: constructing the first joint loss function according to the balance coefficient, the first loss function of the initial first network model and the second loss function of the initial second network model by the following formula: H = [formula not preserved in the source text], wherein H is the first joint loss function, and the remaining symbols (not preserved in the source text) denote, in order: the first loss function, the second loss function, the balance coefficient, and the minimum operation.
- 6. The data processing method according to any one of claims 1-5, wherein after the initial network model is jointly trained according to the first contrast loss and the second contrast loss to obtain a target network model, the method further comprises: receiving condition information of a target user; and inputting the condition information into the target network model for operation to obtain medical report information corresponding to the condition information.
- 7. The data processing method according to claim 6, wherein after said inputting the condition information into the target network model for operation to obtain medical report information corresponding to the condition information, the method further comprises: extracting keywords from the medical report information to obtain a first keyword set; performing semantic analysis on each first keyword in the first keyword set to obtain first semantic information corresponding to each first keyword; determining a risk type indication value corresponding to the target user according to the first semantic information corresponding to each first keyword; if the risk type indication value is higher than a preset risk type indication value, determining risk alarm information; and displaying the risk alarm information.
- 8. A data processing apparatus, the data processing apparatus being applied to an initial network model, the initial network model comprising an initial first network model and an initial second network model, the initial second network model comprising a large language model, the data processing apparatus comprising: a data input unit, configured to input original training data into the initial first network model of the initial network model for operation to obtain a first output result, wherein the first network model is a disease diagnosis small model; a first determining unit, configured to determine a first contrast loss according to the first output result; an operation unit, configured to input the first output result into the initial second network model for operation to obtain a second output result; a second determining unit, configured to determine a second contrast loss according to the second output result; and a training unit, configured to perform joint training on the initial network model according to the first contrast loss and the second contrast loss to obtain a target network model.
- 9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data processing method according to any one of claims 1 to 7 when executing the computer program.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the data processing method according to any one of claims 1 to 7.
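The joint training objective of claims 1 to 5 can be illustrated with a minimal sketch. The formulas of claims 2 and 5 are not reproduced in this text, so everything below is an assumption: `mean_nll` uses a mean negative log-likelihood as a stand-in for both contrast losses, `joint_loss` combines them as a weighted sum with a balance coefficient, and all names and sample values are hypothetical.

```python
import math
from typing import Sequence


def mean_nll(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the labelled class over a batch.

    ASSUMPTION: used as a stand-in for both contrast losses; the patent's
    own loss formulas are not reproduced in the source text.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def joint_loss(first_loss: float, second_loss: float, balance: float) -> float:
    """ASSUMPTION: weighted sum L1 + balance * L2 as the quantity to minimize."""
    return first_loss + balance * second_loss


# Hypothetical batch of two samples.
# First model (small diagnosis model): distributions over 3 diagnoses.
first_probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
first_labels = [0, 1]
# Second model (large language model): distributions over a toy 2-token
# vocabulary, produced from the first model's output in the claimed pipeline.
second_probs = [[0.9, 0.1], [0.4, 0.6]]
second_labels = [0, 1]

l1 = mean_nll(first_probs, first_labels)    # first contrast loss (stand-in)
l2 = mean_nll(second_probs, second_labels)  # second contrast loss (stand-in)
h = joint_loss(l1, l2, balance=0.5)         # joint loss driving both models
```

The sketch computes only the scalar objective. In an actual training loop, the gradient of `h` would be used to adjust the weight parameters of both models until the joint loss converges, as claim 4 describes.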
Description
Data processing method, device, equipment and medium
Technical Field
The present invention relates to the fields of artificial intelligence, natural language processing, and medical health technology, and in particular to a data processing method, apparatus, device, and medium.
Background
With the advent of the mobile internet era, large language models (Large Language Model, LLM), that is, deep neural network models such as GPT-3 and BERT that can process massive text data and generate natural language text, have become increasingly widespread. An LLM has strong universality and generalization capability and can be applied to various natural language processing tasks, such as text generation, text summarization and question answering. A text prompt (Text Prompt) is currently commonly used: text with a specific format or content is provided to the LLM as a guide for its input or output, in the hope that the LLM will generate or answer in the desired manner. For example, in medical health, the LLM may be provided with medical data of a user and adjusted according to that data, so that the adjusted LLM can generate medical report information or the like based on the medical data.
However, text prompts often need to be designed and adjusted manually, and they often provide only surface-level or local information, so they cannot fully reflect the knowledge structure and characteristics of the target field. As a result, the generation effect and accuracy of the large language model are not high.
Disclosure of Invention
The invention provides a data processing method, a device, equipment and a medium, which are used for solving the technical problems of poor generation effect and low accuracy of a large language model.
In a first aspect, a data processing method is provided. The data processing method is applied to an initial network model, the initial network model including an initial first network model and an initial second network model, the initial second network model including a large language model, and the method comprises: inputting original training data into the initial network model; outputting a first output result based on the initial first network model, and obtaining a first contrast loss according to the first output result; outputting a second output result through the initial second network model based on the first output result, and obtaining a second contrast loss according to the second output result; and performing joint training on the initial network model according to the first contrast loss and the second contrast loss to obtain a target network model.
In a second aspect, a data processing apparatus is provided. The data processing apparatus is applied to an initial network model, the initial network model including an initial first network model and an initial second network model, the initial second network model including a large language model, and the data processing apparatus comprises: a data input unit, configured to input original training data into the initial network model; a first loss unit, configured to output a first output result based on the initial first network model and obtain a first contrast loss according to the first output result; a second loss unit, configured to output a second output result through the initial second network model based on the first output result and obtain a second contrast loss according to the second output result; and a model training unit, configured to perform joint training on the initial network model according to the first contrast loss and the second contrast loss to obtain a target network model.
In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the data processing method described above when executing the computer program.
In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data processing method described above.
In the scheme realized by the data processing method, the device, the equipment and the medium, the original training data is input into the initial first network model of the initial network model for operation to obtain a first output result, the first contrast loss is determined according to the first output result, the first output result is input into the initial second network model for operation to obtain a second output result, the second contrast loss is determined according to the second output result, and the initial network model is jointly trained ac