
CN-122019752-A - Multi-text data processing method for fine tuning, fine tuning method and electronic equipment


Abstract

The disclosure provides a multi-text data processing method for fine tuning, a fine tuning method, and an electronic device; it relates to the field of artificial intelligence model training, and in particular to the fields of text search and computing-power resource optimization. The method comprises: obtaining a prompt word and a plurality of text data, wherein the text data are ordered according to a preset mode or randomly ordered; splicing the prompt word and the text data, adding a preset symbol to the prompt word during splicing, adding the preset symbol at the tail of the text data, and adding the preset symbol between at least two adjacent text data; and fine tuning a large model with the data obtained after the splicing processing to generate a text search model. The method and device can save computing resources.

Inventors

  • SONG JIAWEN
  • LI HUI
  • XING JUNWEI
  • ZHANG BINGFEI
  • GAO SAI
  • ZHANG XUEYAO

Assignees

  • Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-23

Claims (15)

  1. A multi-text data processing method for fine tuning, comprising: acquiring a prompt word and a plurality of text data, wherein the text data are ordered according to a preset mode or randomly ordered; and splicing the prompt word and the plurality of text data, wherein a preset symbol is added to the prompt word during splicing, the preset symbol is added at the tail of the plurality of text data, and the preset symbol is added between at least two adjacent text data, the data obtained after the splicing processing being used to fine tune a large model to generate a text search model.
  2. The method of claim 1, wherein adding the preset symbol between at least two adjacent text data comprises adding the preset symbol between each two adjacent text data.
  3. A large model fine tuning method, comprising: inputting a preprocessed prompt word and a plurality of text data into a large model, wherein the preprocessing is performed according to the method of claim 1 or 2, and wherein the attention mechanism of the large model is set to conform to a causal attention mechanism and a single text attends only to itself and the prompt word when performing the attention calculation.
  4. The method according to claim 3, wherein the attention mechanism of the large model is further arranged such that the prompt word is visible only to itself and to the current text when performing the attention calculation.
  5. The method according to claim 3 or 4, wherein the preset symbol is a custom special token; the method further comprising extracting, through the large model, hidden-layer features of the input prompt word and text data, to obtain hidden-layer features of the tokens of the prompt word, hidden-layer features of the tokens of each text data, and hidden-layer features of each special token.
  6. The method of claim 5, wherein the large model is coupled to an attention pooling module; the method further comprising performing, through the attention pooling module, a weighted fusion calculation on the hidden-layer features of the tokens of the prompt word, the hidden-layer features of the tokens of each text data, and the hidden-layer features of each special token, to obtain a fusion feature of the prompt word and each text data.
  7. The method of claim 6, wherein the plurality of text data includes first text data, and the weighted fusion calculation comprises: the attention pooling module calculating weights for the hidden-layer features of the tokens of the prompt word, the hidden-layer features of the special token following the prompt word, the hidden-layer features of the tokens of the first text data, and the hidden-layer features of the special token following the first text data; and performing a weighted summation after normalizing the calculated weights, to obtain a fusion feature of the prompt word and the first text data.
  8. The method according to claim 6 or 7, wherein, when calculating fusion features of the prompt word and different text data, the hidden-layer features of the tokens of the prompt word are reused and/or the hidden-layer features of the special token following the prompt word are reused.
  9. The method of any one of claims 6-8, further comprising: calculating relevance scores between the prompt word and the text data based on the fusion features of the prompt word and the text data; ranking the text data according to the corresponding relevance scores; and adjusting parameters using a ranking loss function, stopping the fine tuning training when the ranking output by the model and the labeled ranking meet a preset requirement, to obtain the text search model.
  10. The method of any one of claims 3-9, wherein the attention mask of the large model is expressed as follows [formula not reproduced in the source; a hedged reconstruction is sketched after the claims]: wherein M_{i,j} represents the attention mask, Q represents the prompt word matrix, D represents the text matrix, the subscript i indicates the token position for which attention is being calculated, the subscript j indicates the attended token position, and the subscripts m and k index the text data.
  11. A multi-text data processing apparatus for fine tuning, comprising: an acquisition module for acquiring a prompt word and a plurality of text data, wherein the text data are ordered according to a preset mode or randomly ordered; and a splicing module for splicing the prompt word and the text data, adding a preset symbol to the prompt word during splicing, adding the preset symbol at the tail of the text data, and adding the preset symbol between at least two adjacent text data, the data obtained after splicing being used to fine tune a large model to generate a text search model.
  12. A large model fine tuning apparatus, comprising: an input module for inputting a preprocessed prompt word and a plurality of text data into a large model, wherein the preprocessing is performed according to the method of claim 1 or 2, and wherein the attention mechanism of the large model is set to conform to a causal attention mechanism and a single text attends only to itself and the prompt word when performing the attention calculation.
  13. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
  14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-10.
  15. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-10.
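
The attention-mask formula referenced in claim 10 is not reproduced in the source text. As an editorial aid only, a hedged reconstruction consistent with claims 3 and 4 (causal attention; each text attends only to itself and the prompt word; the prompt word is visible only to itself and the current text) could take the following piecewise form, where Q denotes the prompt-word token span and D_k the span of the k-th text; this is a sketch under those assumptions, not the patent's actual formula:

    M_{i,j} =
    \begin{cases}
      0       & \text{if } j \le i,\; i \in Q,\; j \in Q \\
      0       & \text{if } j \le i,\; i \in D_k,\; j \in Q \cup D_k \\
      -\infty & \text{otherwise}
    \end{cases}

Here M_{i,j} = 0 permits position i to attend to position j and -infinity masks the pair out, matching the visibility rules of claims 3 and 4.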

Description

Multi-text data processing method for fine tuning, fine tuning method and electronic equipment

Technical Field

The disclosure relates to the technical field of artificial intelligence model training, further relates to the technical fields of text search and computing-power resource optimization, and in particular to a multi-text data processing method for fine tuning, a large model fine tuning method, a large model fine tuning apparatus, an electronic device, and a program product.

Background

A text search model (also called a text retrieval model) can be obtained by fine tuning a pre-trained large model with labeled prompt words and corresponding text data. Conventional fine tuning, however, consumes considerable computing resources, and known optimization means perform poorly.

Disclosure of Invention

The present disclosure provides a multi-text data processing method for fine tuning, a fine tuning method, an electronic device, a storage medium, and a program product.

According to one aspect of the disclosure, a multi-text data processing method for fine tuning is provided, comprising: obtaining a prompt word and a plurality of text data, wherein the text data are ordered according to a preset mode or randomly ordered; and splicing the prompt word and the text data, adding a preset symbol to the prompt word during splicing, adding the preset symbol at the tail of the text data, and adding the preset symbol between at least two adjacent text data, the data obtained after splicing being used to fine tune a large model to generate a text search model.

According to another aspect of the present disclosure, a large model fine tuning method is provided, comprising inputting a preprocessed prompt word and a plurality of text data into a large model, wherein the preprocessing is performed according to the multi-text data processing method for fine tuning described above, and wherein the attention mechanism of the large model is set to conform to a causal attention mechanism and a single text attends only to itself and the prompt word when performing the attention calculation.

According to another aspect of the disclosure, a multi-text data processing apparatus for fine tuning is provided, comprising an acquisition module and a splicing module. The acquisition module is used for acquiring a prompt word and a plurality of text data, wherein the text data are ordered according to a preset mode or randomly ordered. The splicing module is used for splicing the prompt word and the text data, adding a preset symbol to the prompt word during splicing, adding the preset symbol at the tail of the text data, and adding the preset symbol between at least two adjacent text data, the data obtained after splicing being used to fine tune a large model to generate a text search model.

According to another aspect of the present disclosure, a large model fine tuning apparatus is provided, comprising an input module for inputting a preprocessed prompt word and a plurality of text data into a large model, wherein the preprocessing is performed according to the multi-text data processing method for fine tuning described above, and wherein the attention mechanism of the large model is set to conform to a causal attention mechanism and a single text attends only to itself and the prompt word when performing the attention calculation.
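
As an editorial illustration (not part of the patent text), the splicing described in the aspects above can be sketched as follows in Python, assuming the preset symbol is a single custom special token, written "[SEP]" here; the token name and the string-level interface are assumptions:

    # Editorial sketch of the splicing step; "[SEP]" stands in for the
    # custom special token ("preset symbol") and is an assumption.
    SEP = "[SEP]"

    def splice(prompt: str, texts: list[str]) -> str:
        """Build: prompt [SEP] text_1 [SEP] text_2 [SEP] ... text_m [SEP]

        The preset symbol follows the prompt word, separates adjacent
        text data, and closes the tail of the last text data.
        """
        parts = [prompt, SEP]
        for text in texts:
            parts.extend([text, SEP])
        return " ".join(parts)

    # One prompt word spliced with three candidate texts:
    print(splice("what is attention pooling", ["doc A", "doc B", "doc C"]))
    # -> what is attention pooling [SEP] doc A [SEP] doc B [SEP] doc C [SEP]

The spliced sequence is what the fine tuning method above feeds to the large model in a single pass, so the prompt word is encoded once for all texts.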
According to another aspect of the present disclosure, an electronic device is provided, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-text data processing method for fine tuning or the fine tuning method described above.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, storing computer instructions for causing a computer to perform the multi-text data processing method for fine tuning or the fine tuning method described above.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the multi-text data processing method for fine tuning or the fine tuning method described above.

The disclosure provides a new data splicing mode that allows multi-text fusion processing. With it, a new spliced structure of a prompt word and multiple text data can be constructed and used to fine tune a large model into a text retrieval model. Based on this spliced structure, the features of a plurality of prompt-word and text pairs can be extracted in a single model inference, without repeatedly running feature extraction for the same prompt word, thereby saving computing resources.
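
As a further editorial illustration of the attention pooling and scoring described in claims 6-9: the sketch below assumes a learned per-token weighting layer, a linear relevance head, and PyTorch tensors standing in for the large model's hidden-layer features; all of these are assumptions, not disclosed implementation details:

    # Editorial sketch of attention-pooling fusion and relevance scoring.
    import torch
    import torch.nn as nn

    class AttentionPooling(nn.Module):
        """Weighted fusion of prompt-token, text-token and special-token
        hidden-layer features into one fusion feature per (prompt, text)
        pair, followed by an (assumed) linear relevance head."""

        def __init__(self, hidden: int):
            super().__init__()
            self.token_weight = nn.Linear(hidden, 1)  # per-token weight (assumed form)
            self.score = nn.Linear(hidden, 1)         # relevance head (assumed form)

        def fuse(self, feats: torch.Tensor) -> torch.Tensor:
            # feats: (n_tokens, hidden) = prompt tokens + the special token
            # after the prompt + one text's tokens + that text's special token.
            w = torch.softmax(self.token_weight(feats).squeeze(-1), dim=0)  # normalize
            return w @ feats  # weighted summation -> (hidden,)

        def forward(self, prompt_feats, text_feats_list):
            # The prompt features are computed once and reused across all
            # texts (claim 8); only the text features change per pair.
            scores = []
            for text_feats in text_feats_list:
                fused = self.fuse(torch.cat([prompt_feats, text_feats], dim=0))
                scores.append(self.score(fused))
            return torch.cat(scores)  # one relevance score per text

    # Random tensors stand in for hidden-layer features from the large model.
    pool = AttentionPooling(hidden=16)
    prompt_feats = torch.randn(5, 16)                      # prompt + its [SEP]
    text_feats = [torch.randn(8, 16), torch.randn(6, 16)]  # each text + its [SEP]
    print(pool(prompt_feats, text_feats))                  # relevance scores

Per claim 9, fine tuning would then apply a ranking loss that compares the ordering induced by these scores with the labeled ordering, stopping once the two meet the preset requirement.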