
CN-117332087-B - Scheduling and training method, system, equipment and medium for large language model

CN117332087B

Abstract

A scheduling and training method, system, equipment and medium for a large language model relate to the field of artificial intelligence. The method comprises the steps of: obtaining a vertical-domain question; selecting at least one subdivision vertical model from a comprehensive vertical model according to the vertical-domain question, wherein the comprehensive vertical model comprises a plurality of subdivision vertical models and each subdivision vertical model is a lightweight model; inputting the vertical-domain question into the selected subdivision vertical models to obtain a plurality of first answer results output by the subdivision vertical models; and collating and splicing the first answer results to obtain a second answer result corresponding to the vertical-domain question, thereby completing the answering of the question. With the technical scheme provided by the application, a plurality of lightweight subdivision vertical models are scheduled, so that appropriate subdivision vertical models are selected to answer the vertical-domain question, striking a good balance between the training compute requirement and the representation capability of the question-answering model.

Inventors

  • GUO JUN
  • ZHU QING
  • WU PENG
  • LIU YANGYANG

Assignees

  • 上海钢联电子商务股份有限公司 (Shanghai Ganglian E-Commerce Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2023-10-11

Claims (7)

  1. A method for scheduling and training a large language model, the method comprising the steps of: obtaining a vertical-domain question; selecting at least one subdivision vertical model from a comprehensive vertical model according to the vertical-domain question, wherein the comprehensive vertical model comprises a plurality of subdivision vertical models, and each subdivision vertical model is a lightweight model; inputting the vertical-domain question into the selected subdivision vertical models to obtain a plurality of first answer results output by the subdivision vertical models; and collating and splicing the first answer results to obtain a second answer result corresponding to the vertical-domain question, thereby completing the answering of the question; wherein, before the at least one subdivision vertical model is selected from the comprehensive vertical model according to the vertical-domain question, the method further comprises a training process for the comprehensive vertical model, the training process specifically comprising the following steps: acquiring vertical-domain text data; performing text self-classification on the vertical-domain text data to obtain a plurality of vertical text clusters; and performing cyclic pre-training on the comprehensive vertical model with each vertical text cluster to generate the subdivision vertical model corresponding to that vertical text cluster; wherein performing text self-classification on the vertical-domain text data to obtain a plurality of vertical text clusters specifically comprises the following steps: slicing the vertical-domain text data to obtain a plurality of semantic slices; determining a plurality of similar slice sets from all the semantic slices, wherein each similar slice set comprises a plurality of semantic slices; respectively calculating a set score for each similar slice set; and selecting a plurality of similar slice sets with the highest set scores as the vertical text clusters while determining the class-center points of the vertical text clusters;
and wherein selecting at least one subdivision vertical model from the comprehensive vertical model according to the vertical-domain question specifically comprises the following steps: vectorizing the vertical-domain question to obtain a question vector; respectively calculating the semantic similarity between the question vector and the class-center point of the similar slice set corresponding to each subdivision vertical model; and selecting every subdivision vertical model whose semantic similarity is greater than a second similarity threshold.
  2. The method for scheduling and training a large language model according to claim 1, wherein determining a plurality of similar slice sets from all the semantic slices specifically comprises: selecting any semantic slice from all the semantic slices; respectively calculating the semantic similarity between the selected semantic slice and each of the other semantic slices; and taking out, without replacement, the other semantic slices whose semantic similarity is greater than a first similarity threshold, thereby completing construction of the similar slice set.
  3. The method for scheduling and training a large language model according to claim 1, wherein respectively calculating the set score of each similar slice set specifically comprises: respectively calculating the semantic similarity between any two semantic slices in the similar slice set; calculating a first similarity mean and a first similarity standard deviation of the similar slice set from those pairwise semantic similarities; normalizing the first similarity mean and the first similarity standard deviation to obtain a second similarity mean and a second similarity standard deviation; and processing the second similarity mean and the second similarity standard deviation with a preset set-score calculation formula to obtain the set score.
  4. The method for scheduling and training a large language model according to claim 1, wherein collating and splicing the first answer results to obtain the second answer result corresponding to the vertical-domain question specifically comprises: acquiring a vertical-question prompt word according to the question vector; and splicing the first answer results and prepending the vertical-question prompt word to the spliced text to obtain the second answer result.
  5. A scheduling and training system for a large language model, the system comprising: a user question acquisition module (301) for acquiring a vertical-domain question; a subdivision vertical model selection module (302) for selecting at least one subdivision vertical model from a comprehensive vertical model according to the vertical-domain question, wherein the comprehensive vertical model comprises a plurality of subdivision vertical models and each subdivision vertical model is a lightweight model; wherein, before the at least one subdivision vertical model is selected from the comprehensive vertical model according to the vertical-domain question, a training process for the comprehensive vertical model is further performed, the training process specifically comprising the following steps: acquiring vertical-domain text data; performing text self-classification on the vertical-domain text data to obtain a plurality of vertical text clusters; and performing cyclic pre-training on the comprehensive vertical model with each vertical text cluster to generate the subdivision vertical model corresponding to that vertical text cluster; wherein performing text self-classification on the vertical-domain text data to obtain a plurality of vertical text clusters specifically comprises the following steps: slicing the vertical-domain text data to obtain a plurality of semantic slices; determining a plurality of similar slice sets from all the semantic slices, wherein each similar slice set comprises a plurality of semantic slices; respectively calculating a set score for each similar slice set; and selecting a plurality of similar slice sets with the highest set scores as the vertical text clusters while determining the class-center points of the vertical text clusters; wherein selecting at least one subdivision vertical model from the comprehensive vertical model according to the vertical-domain question specifically comprises the following
steps: vectorizing the vertical-domain question to obtain a question vector; respectively calculating the semantic similarity between the question vector and the class-center point of the similar slice set corresponding to each subdivision vertical model; and selecting every subdivision vertical model whose semantic similarity is greater than a second similarity threshold; a first answer result determining module (303) for inputting the vertical-domain question into the selected subdivision vertical models to obtain a plurality of first answer results output by the subdivision vertical models; and a second answer result determining module (304) for collating and splicing the first answer results to obtain a second answer result corresponding to the vertical-domain question, thereby completing the answering of the question.
  6. An electronic device comprising a processor (401), a memory (405), a user interface (403) and a network interface (404), the memory (405) being configured to store instructions, the user interface (403) and the network interface (404) being configured to communicate with other devices, and the processor (401) being configured to execute the instructions stored in the memory (405) to cause the electronic device (400) to perform the method according to any one of claims 1-4.
  7. A computer-readable storage medium storing instructions that, when executed, perform the method steps of any one of claims 1-4.
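The text self-classification of claims 2 and 3 can be illustrated with a minimal Python sketch. The patent does not fix the similarity measure, the normalization, or the "preset set-score calculation formula", so the cosine similarity, the rescaling to [0, 1], and the mean-minus-weighted-standard-deviation combination below are all assumptions for illustration; slices are represented as pre-computed embedding vectors.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_similar_set(seed_idx, vectors, taken, first_threshold):
    """Claim 2: starting from a seed slice, take out, without replacement,
    every other slice whose similarity exceeds the first similarity threshold."""
    group = [seed_idx]
    taken.add(seed_idx)
    for j, v in enumerate(vectors):
        if j not in taken and cosine(vectors[seed_idx], v) > first_threshold:
            group.append(j)
            taken.add(j)
    return group

def set_score(group, vectors, alpha=0.5):
    """Claim 3: mean and standard deviation of pairwise similarities,
    normalized, then combined. The formula (mean minus alpha * std) is a
    hypothetical stand-in for the patent's preset formula."""
    sims = [cosine(vectors[i], vectors[j]) for i, j in combinations(group, 2)]
    if not sims:
        return 0.0
    mean = sum(sims) / len(sims)
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
    # cosine values lie in [-1, 1]; rescale to [0, 1] as a simple normalization
    return (mean + 1) / 2 - alpha * (std / 2)
```

Under this reading, a high score rewards sets whose slices are mutually similar (high mean) and semantically tight (low spread), which matches selecting the highest-scoring sets as vertical text clusters.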

Description

Scheduling and training method, system, equipment and medium for large language model

Technical Field

The application relates to the field of artificial intelligence, and in particular to a scheduling and training method, system, equipment and medium for a large language model.

Background

As the parameter counts of large language models grow, large models place ever higher demands on computing resources and training samples. To address the problems of large-parameter models, such models are compressed mainly through techniques such as model compression, model pruning or model quantization, yielding a lightweight model and reducing the consumption of computing resources. However, a lightweight model differs considerably in representation capability from a large-parameter model, so in practical applications an appropriate model size must be chosen according to the specific requirements and scenario, trading off the resource requirements of the large-parameter model against the representation capability of the small-parameter model. At present, in vertical-domain knowledge question answering, a question input by a user is generally fed into a single question-answering model, and the answer is returned to the user according to that model's output. If the question-answering model has a large parameter count, its representation capability is improved but its training places higher demands on computational resources; if it is a lightweight model, the training compute burden is reduced but its representation capability suffers.
At present, it is difficult to strike a good compromise between the training compute requirement and the representation capability of the question-answering model.

Disclosure of the Invention

To better balance the training compute requirement and the representation capability of the question-answering model, the application provides a scheduling and training method, system, equipment and medium for a large language model. In a first aspect, the application provides a method for scheduling and training a large language model, the method comprising the steps of: obtaining a vertical-domain question; selecting at least one subdivision vertical model from a comprehensive vertical model according to the vertical-domain question, wherein the comprehensive vertical model comprises a plurality of subdivision vertical models and each subdivision vertical model is a lightweight model; inputting the vertical-domain question into the selected subdivision vertical models to obtain a plurality of first answer results output by the subdivision vertical models; and collating and splicing the first answer results to obtain a second answer result corresponding to the vertical-domain question, thereby completing the answering of the question. With this technical scheme, for a vertical-domain question input by a user, at least one subdivision vertical model is selected from the comprehensive vertical model to answer the question. Invoking several subdivision vertical models preserves the representation capability of the model; meanwhile, each subdivision vertical model is lightweight, and its training compute requirement is far smaller than that of a large-parameter model, reducing the computational burden during training. The training compute requirement and the representation capability of the question-answering model are thus both well served.
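The scheduling and splicing steps of the first aspect can be sketched in a few lines of Python. Cosine similarity, the class-center representation, and the prompt-word text are assumptions; the patent specifies only that models whose class-center similarity to the question vector exceeds the second similarity threshold are selected, and that the prompt word is prepended to the spliced first answer results.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def select_models(question_vec, class_centers, second_threshold):
    """Route the question to every subdivision vertical model whose
    class-center similarity exceeds the second similarity threshold."""
    return [model_id for model_id, center in class_centers.items()
            if cosine(question_vec, center) > second_threshold]

def splice_answers(prompt_word, first_answers):
    """Claim 4: splice the first answer results and prepend the
    vertical-question prompt word to form the second answer result."""
    return prompt_word + "\n" + "\n".join(first_answers)
```

A routing call might look like `select_models(embed(question), centers, 0.8)`, where `embed` and the threshold value 0.8 are placeholders; the selected models' outputs are then passed to `splice_answers`.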
Optionally, before the at least one subdivision vertical model is selected from the comprehensive vertical model according to the vertical-domain question, the method further comprises a training process for the comprehensive vertical model, which specifically comprises the following steps: acquiring vertical-domain text data; performing text self-classification on the vertical-domain text data to obtain a plurality of vertical text clusters; and performing cyclic pre-training on the comprehensive vertical model with each vertical text cluster to generate the subdivision vertical model corresponding to that cluster. With this technical scheme, self-classification of the vertical-domain text is completed: no classification target is set in advance; instead, the text is classified automatically according to its own semantics, determining vertical text clusters for different vertical domains. The comprehensive vertical model is trained based on the automatically div