CN-122021897-A - Dual-model collaborative inference method and electronic device
Abstract
The application discloses a dual-model collaborative inference method and an electronic device. The method comprises: performing inference with a target model based on a first token vector, wherein a first hidden-layer feature and a second hidden-layer feature are output during the inference of the target model, and the association range of the semantic information represented by the first hidden-layer feature is smaller than that of the semantic information represented by the second hidden-layer feature; and performing a preset number of inference iterations with a target draft model based on a second token vector to output a plurality of candidate tokens at one time, wherein the second token vector comprises at least the first hidden-layer feature, the second hidden-layer feature, and at least part of the first token vector. The target model verifies in parallel the plurality of candidate tokens output at one time by the target draft model.
Inventors
- PENG WEIYU
Assignees
- Lenovo (Beijing) Co., Ltd. (联想(北京)有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260126
Claims (10)
- 1. A dual-model collaborative inference method, the method comprising: performing inference with a target model based on a first token vector, wherein a first hidden-layer feature and a second hidden-layer feature are output during the inference of the target model, and the association range of the semantic information represented by the first hidden-layer feature is smaller than that of the semantic information represented by the second hidden-layer feature; and performing a preset number of inference iterations with a target draft model based on a second token vector to output a plurality of candidate tokens at one time, wherein the second token vector comprises at least the first hidden-layer feature, the second hidden-layer feature, and at least part of the first token vector, and the target model verifies in parallel the plurality of candidate tokens output at one time by the target draft model.
- 2. The method of claim 1, wherein the target model further comprises an encoding layer and a language model head layer; before the inference is performed with the target model based on the first token vector, the method further comprises: encoding an input token with the encoding layer to obtain the first token vector; and the method further comprises: performing probability prediction on the second hidden-layer feature with the language model head layer, and outputting a first candidate token.
- 3. The method of claim 1, wherein the target model comprises a plurality of hidden layers; and outputting the first hidden-layer feature and the second hidden-layer feature during the inference of the target model comprises: processing the first token vector with the first hidden layer of the plurality of hidden layers to obtain the first hidden-layer feature; and processing, with the last hidden layer of the plurality of hidden layers, the hidden-layer feature output by the preceding hidden layer to obtain the second hidden-layer feature.
- 4. The method of claim 3, wherein the target draft model comprises a first draft hidden layer and a second draft hidden layer, and the second draft hidden layer reuses the last hidden layer of the plurality of hidden layers.
- 5. The method of claim 4, wherein the target model is obtained by training an initial model on a first group of training texts; the target draft model is obtained by training an initial draft model on a second group of training texts; and the second draft hidden layer in the initial draft model is obtained by freezing the weight parameters of the last hidden layer in the target model.
- 6. The method of claim 1, wherein performing the preset number of inference iterations with the target draft model based on the second token vector to output the plurality of candidate tokens at one time comprises: taking the first hidden-layer feature, the second hidden-layer feature, at least part of a first encoding feature corresponding to the first token vector, and the first candidate token as an initial second token vector, and performing inference with the target draft model to output a corresponding third hidden-layer feature and a plurality of second candidate tokens; updating the initial second token vector with the third hidden-layer feature and the second encoding feature corresponding to each of the plurality of second candidate tokens, to determine a plurality of updated second token vectors; performing parallel inference with the target draft model based on the plurality of updated second token vectors, and outputting a plurality of corresponding third hidden-layer features and a plurality of third candidate tokens; and iterating the steps of determining a plurality of updated second token vectors and performing parallel inference with the target draft model until the preset number of iterations is reached, to obtain the plurality of candidate tokens.
- 7. The method of claim 6, wherein the target draft model further comprises a fully connected layer and a language model head layer; and performing parallel inference with the target draft model based on the plurality of updated second token vectors, and outputting the plurality of corresponding third hidden-layer features and the plurality of third candidate tokens, comprises: fusing the plurality of updated second token vectors in parallel with the fully connected layer of the target draft model to obtain a plurality of corresponding fused feature vectors; obtaining the plurality of corresponding third hidden-layer features based on the first draft hidden layer, the second draft hidden layer, and the plurality of fused feature vectors; and performing parallel inference on the plurality of third hidden-layer features with the language model head layer to obtain the plurality of third candidate tokens.
- 8. The method of claim 7, wherein each third hidden-layer feature comprises a shallow hidden-layer feature and a deep hidden-layer feature; and obtaining the plurality of corresponding third hidden-layer features based on the first draft hidden layer, the second draft hidden layer, and the plurality of fused feature vectors comprises: processing the fused feature vector with the first draft hidden layer to obtain the shallow hidden-layer feature; and processing the shallow hidden-layer feature with the second draft hidden layer to obtain the deep hidden-layer feature.
- 9. An electronic device, comprising: at least one memory configured to store a target model and a target draft model; and at least one processor configured to: perform inference with the target model based on a first token vector, wherein a first hidden-layer feature and a second hidden-layer feature are output during the inference of the target model, and the association range of the semantic information represented by the first hidden-layer feature is smaller than that of the semantic information represented by the second hidden-layer feature; and perform a preset number of inference iterations with the target draft model based on a second token vector to output a plurality of candidate tokens at one time, wherein the second token vector comprises at least the first hidden-layer feature, the second hidden-layer feature, and at least part of the first token vector, and the target model verifies in parallel the plurality of candidate tokens output at one time by the target draft model.
- 10. The electronic device of claim 9, wherein the target model comprises a plurality of hidden layers; the target draft model comprises a first draft hidden layer and a second draft hidden layer; and the second draft hidden layer reuses the last hidden layer of the plurality of hidden layers.
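The drafting flow recited in claims 2 to 8 can be illustrated with a small, purely hypothetical sketch. It is not the applicant's implementation: the dimensions, the random weights, the tanh "hidden layers", and every name (`target_forward`, `draft_step`, `draft_tree`, `FC`, `DRAFT_L1`, `DRAFT_L2`) are assumptions chosen for illustration. For brevity the sketch also feeds only the embedding of the most recent candidate token into the fusion step and omits the "at least part of the first encoding feature" term of claim 6.

```python
import math
import random

DIM, VOCAB = 4, 6            # toy sizes, assumed for illustration only
rng = random.Random(0)       # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def layer(m, v):
    # One toy "hidden layer": a linear map followed by tanh.
    return [math.tanh(x) for x in matvec(m, v)]


def top_k(logits, k):
    # Indices of the k largest logits, best first.
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]


EMBED = rand_matrix(VOCAB, DIM)                        # encoding layer
TARGET_LAYERS = [rand_matrix(DIM, DIM) for _ in range(3)]
LM_HEAD = rand_matrix(VOCAB, DIM)                      # language model head
FC = rand_matrix(DIM, 3 * DIM)                         # draft fusion layer
DRAFT_L1 = rand_matrix(DIM, DIM)                       # first draft hidden layer
DRAFT_L2 = TARGET_LAYERS[-1]                           # reuses target's last layer


def target_forward(token_id):
    """Return the first-layer feature, last-layer feature, and first candidate."""
    h = EMBED[token_id]
    feats = []
    for m in TARGET_LAYERS:
        h = layer(m, h)
        feats.append(h)
    first_feat, second_feat = feats[0], feats[-1]
    first_candidate = top_k(matvec(LM_HEAD, second_feat), 1)[0]
    return first_feat, second_feat, first_candidate


def draft_step(shallow, deep, token_id, k):
    """Fuse the inputs, run both draft hidden layers, and propose k tokens."""
    fused = matvec(FC, shallow + deep + EMBED[token_id])
    new_shallow = layer(DRAFT_L1, fused)
    new_deep = layer(DRAFT_L2, new_shallow)
    return new_shallow, new_deep, top_k(matvec(LM_HEAD, new_deep), k)


def draft_tree(token_id, steps, k):
    """Expand candidate paths for a preset number of draft iterations."""
    shallow, deep, first = target_forward(token_id)
    frontier = [(shallow, deep, [first])]
    for _ in range(steps):
        next_frontier = []
        # Branches are independent; a real system would batch them in parallel.
        for s, d, path in frontier:
            ns, nd, cands = draft_step(s, d, path[-1], k)
            next_frontier.extend((ns, nd, path + [c]) for c in cands)
        frontier = next_frontier
    return [path for _, _, path in frontier]
```

Calling `draft_tree(token_id=1, steps=2, k=2)` yields four candidate paths, each beginning with the first candidate token produced by the target model's language model head. Binding `DRAFT_L2` to the same object as the target's last layer mirrors the reuse described in claims 4 and 10; freezing it during training (claim 5) is outside the scope of this sketch.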
Description
Dual-model collaborative inference method and electronic device
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a dual-model collaborative inference method and an electronic device.
Background
In the field of natural language processing, the EAGLET model is an inference acceleration framework built on a large language model (LLM). EAGLET uses a draft model to predict a plurality of tokens autoregressively and feeds them to a target model in parallel for verification. However, the prediction accuracy of the draft model is limited, so the tokens it predicts are of low quality, which increases the verification burden on the target model.
Disclosure of Invention
The application mainly provides a dual-model collaborative inference method and an electronic device.
An embodiment of the application provides a dual-model collaborative inference method, comprising: performing inference with a target model based on a first token vector, wherein a first hidden-layer feature and a second hidden-layer feature are output during the inference of the target model, and the association range of the semantic information represented by the first hidden-layer feature is smaller than that of the semantic information represented by the second hidden-layer feature; and performing a preset number of inference iterations with a target draft model based on a second token vector to output a plurality of candidate tokens at one time, wherein the second token vector comprises at least the first hidden-layer feature, the second hidden-layer feature, and at least part of the first token vector, and the target model verifies in parallel the plurality of candidate tokens output at one time by the target draft model.
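The verification step can be illustrated with a minimal sketch. The application states only that the target model verifies the candidate tokens in parallel; the acceptance rule used here (keep the longest candidate prefix that matches the target model's own greedy choice, then append one token from the target model) is a common speculative-decoding convention and is an assumption, as are the names `target_next_token` and `verify` and the toy arithmetic standing in for a real model.

```python
def target_next_token(prefix):
    # Hypothetical stand-in for the target model's greedy next-token choice.
    # A real target model would score every drafted position in one forward
    # pass; this toy function only makes the example deterministic.
    return (sum(prefix) * 31 + len(prefix)) % 10


def verify(prefix, candidate_paths):
    """Return the tokens accepted in one verification round."""
    best = []
    for path in candidate_paths:
        accepted = []
        for tok in path:
            # Stop at the first drafted token the target model disagrees with.
            if target_next_token(prefix + accepted) != tok:
                break
            accepted.append(tok)
        if len(accepted) > len(best):
            best = accepted
    # The target model's own prediction after the accepted prefix is kept,
    # so every round emits at least one token.
    return best + [target_next_token(prefix + best)]
```

With `prefix = [3, 1]` and candidate paths `[[6, 3, 2], [6, 4, 7], [5, 3, 7]]`, the first path agrees with the toy target model for two tokens, so `verify` returns those two tokens plus one token from the target model. Higher-quality drafts lengthen the accepted prefix, which is the burden on the target model that the Background describes.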
An embodiment of the application provides an electronic device comprising at least one memory and at least one processor. The memory is configured to store a target model and a target draft model. The processor is configured to: perform inference with the target model based on a first token vector, wherein a first hidden-layer feature and a second hidden-layer feature are output during the inference of the target model, and the association range of the semantic information represented by the first hidden-layer feature is smaller than that of the semantic information represented by the second hidden-layer feature; and perform a preset number of inference iterations with the target draft model based on a second token vector to output a plurality of candidate tokens at one time, wherein the second token vector comprises at least the first hidden-layer feature, the second hidden-layer feature, and at least part of the first token vector, and the target model verifies in parallel the plurality of candidate tokens output at one time by the target draft model.
Drawings
FIG. 1 is a schematic diagram of an inference flow provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a dual-model collaborative inference method according to an embodiment of the present application;
FIG. 3 is a second schematic flow chart of a dual-model collaborative inference method according to an embodiment of the present application;
FIG. 4 is a third schematic flow chart of a dual-model collaborative inference method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a training process of an initial draft model according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a dual-model collaborative inference method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a target draft model outputting candidate tokens according to an embodiment of the present application;
FIG. 8 is a schematic flow chart of a dual-model collaborative inference method according to an embodiment of the present application;
FIG. 9 is a schematic flow chart of a dual-model collaborative inference method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of the structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application is further elaborated below with reference to the accompanying drawings and embodiments. The described embodiments should not be construed as limiting the application, and all other embodiments obtained by one skilled in the art without inventive effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments, but it is