CN-116303927-B - Network model training method, event extraction method, device and storage medium
Abstract
The application discloses a network model training method, an event extraction method, a device and a storage medium. The model training method comprises: obtaining first sample data from a training set corresponding to a target task and second sample data from a training sample corresponding to an auxiliary task; training a first network model with the first sample data and a second network model with the second sample data; performing gradient updates on a plurality of shared network layers between the first network model and the second network model, on the unshared network layers of the first network model according to the target task, and on the unshared network layers of the second network model according to the auxiliary task, wherein the shared network layers and the unshared network layers are determined in advance using gradient similarity; and taking the first network model as the final network model. In this way, the first network model can learn knowledge from the second network model, and the final network model is obtained through training.
Inventors
- XU RUIFENG
- LI JIANGNAN
- GAO JUN
- LIANG BIN
Assignees
- 哈尔滨工业大学(深圳)
Dates
- Publication Date
- 20260505
- Application Date
- 20230106
Claims (6)
- 1. A method of event extraction, the method comprising: acquiring a text corresponding to an event extraction task; inputting the text into a network model to obtain an event corresponding to the text, wherein the network model comprises a first network model and a second network model, and the network model is a final network model obtained through training by the following method: acquiring first sample data from a training set corresponding to a target task, and acquiring second sample data from a training sample corresponding to an auxiliary task; acquiring a first loss value corresponding to the first network model, and acquiring a second loss value corresponding to the second network model; determining a first gradient corresponding to each network layer in the first network model according to the first loss value, and determining a second gradient corresponding to each network layer in the second network model according to the second loss value; determining gradient similarity between the network layers of the first network model and the second network model according to the first gradient and the second gradient; if the gradient similarity is greater than or equal to a threshold value, determining that the network layer corresponding to the gradient similarity is a shared network layer; if the gradient similarity is smaller than the threshold value, determining that the network layer corresponding to the gradient similarity is an unshared network layer; performing gradient updates on a plurality of shared network layers between the first network model and the second network model according to a shared gradient, performing gradient updates on an unshared network layer of the first network model according to the target task, and performing gradient updates on an unshared network layer of the second network model according to the auxiliary task, wherein the shared network layer and the unshared network layer are determined in advance by using gradient similarity; and taking the first network model as the final network model.
- 2. The method of claim 1, wherein there are a plurality of first sample data and a plurality of second sample data; the determining a first gradient corresponding to each network layer in the first network model according to the first loss value includes: determining a first gradient corresponding to each network layer in the first network model according to each first sample data; and accumulating all the first gradients to obtain a first accumulated gradient; the determining a second gradient corresponding to each network layer in the second network model according to the second loss value includes: determining a second gradient corresponding to each network layer in the second network model according to each second sample data; and accumulating all the second gradients to obtain a second accumulated gradient.
- 3. The method of claim 2, wherein the determining gradient similarity between the network layers of the first network model and the second network model according to the first gradient and the second gradient comprises: determining the gradient similarity between the network layers of the first network model and the second network model according to the first accumulated gradient and the second accumulated gradient.
- 4. The method of claim 1, wherein the first network model and the second network model share an embedded layer.
- 5. An electronic device comprising a memory for storing a computer program and a processor for executing the computer program to implement the method of any of claims 1-4.
- 6. A computer readable storage medium for storing a computer program for implementing the method according to any one of claims 1-4 when executed by a processor.
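The layer partitioning recited in claims 1 to 3 can be illustrated with a minimal sketch. It is not the patented implementation: the claims do not say how gradient similarity is measured, so cosine similarity is an assumption here, and the helper names (`accumulate`, `cosine`, `partition_layers`), the layer names, the toy gradient values and the threshold of 0.5 are all hypothetical.

```python
import math

def accumulate(per_sample_grads):
    """Sum per-sample gradients layer by layer (the 'accumulated gradient' of claim 2)."""
    total = {}
    for grads in per_sample_grads:
        for layer, g in grads.items():
            acc = total.setdefault(layer, [0.0] * len(g))
            for i, value in enumerate(g):
                acc[i] += value
    return total

def cosine(u, v):
    """Cosine similarity; ASSUMED metric, the claims only say 'gradient similarity'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def partition_layers(first_acc, second_acc, threshold):
    """Layers with similarity >= threshold are shared; the rest are unshared."""
    shared, unshared = [], []
    for layer in first_acc:
        sim = cosine(first_acc[layer], second_acc[layer])
        (shared if sim >= threshold else unshared).append(layer)
    return shared, unshared

# Toy per-sample, per-layer gradients (flattened); the values are illustrative only.
first_grads = [
    {"encoder": [1.0, 0.0], "head": [0.5, 0.5]},
    {"encoder": [1.0, 1.0], "head": [0.5, -0.5]},
]
second_grads = [
    {"encoder": [2.0, 1.0], "head": [-1.0, 0.0]},
]

first_acc = accumulate(first_grads)
second_acc = accumulate(second_grads)
shared, unshared = partition_layers(first_acc, second_acc, threshold=0.5)
print("shared:", shared, "unshared:", unshared)
```

In this toy case the accumulated `encoder` gradients of the two models point in the same direction, so that layer is marked as shared, while the `head` gradients point in opposite directions and that layer stays unshared. The sketch assumes both models expose the same layer names so that layers can be compared pairwise.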
Description
Network model training method, event extraction method, device and storage medium
Technical Field
The application relates to the technical field of event extraction, and in particular to a network model training method, an event extraction method, a device and a storage medium.
Background
At present, knowledge transfer usually proceeds by first training an auxiliary network model on an auxiliary task and then transferring its knowledge to a target network model. However, the auxiliary task differs from the target task that the target network model has to perform, so the trained target network model still suffers from low accuracy.
Disclosure of Invention
The application provides a network model training method, an event extraction method, a device and a storage medium. A first network model learns the knowledge of a second network model, and the trained first network model is taken as the final network model. The final network model can be used for event extraction, and model performance is not affected when data resources are scarce.
To solve the above technical problem, the application provides a network model training method based on gradient similarity, which involves a first network model and a second network model. The method comprises: acquiring first sample data from a training set corresponding to a target task and second sample data from a training sample corresponding to an auxiliary task; training the first network model with the first sample data and the second network model with the second sample data; performing gradient updates on a plurality of shared network layers between the first network model and the second network model, on the unshared network layers of the first network model according to the target task, and on the unshared network layers of the second network model according to the auxiliary task, wherein the shared network layers and the unshared network layers are determined in advance using gradient similarity; and taking the first network model as the final network model. Training the first network model with the first sample data and the second network model with the second sample data comprises: obtaining a first loss value corresponding to the first network model and a second loss value corresponding to the second network model; determining a first gradient corresponding to each network layer in the first network model according to the first loss value, and a second gradient corresponding to each network layer in the second network model according to the second loss value; determining the gradient similarity between the network layers of the first network model and the second network model according to the first gradient and the second gradient; and determining the shared network layers and the unshared network layers between the first network model and the second network model according to the gradient similarity.
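The update rule described above can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented by the applicant: the text does not define how the shared gradient is formed, so the element-wise mean of the two task gradients is assumed, plain gradient descent with a hypothetical learning rate `lr` stands in for whatever optimizer is actually used, and the function name `update_step` and all toy values are invented for the example.

```python
def update_step(first_params, second_params, first_grads, second_grads, shared, lr):
    """One gradient-descent step over two models with shared and unshared layers.

    Shared layers receive a single shared gradient (ASSUMED here to be the
    element-wise mean of both task gradients) and stay identical in both models.
    Unshared layers are updated only with their own task's gradient.
    """
    for layer in first_params:
        if layer in shared:
            g = [(a + b) / 2 for a, b in zip(first_grads[layer], second_grads[layer])]
            updated = [p - lr * gi for p, gi in zip(first_params[layer], g)]
            first_params[layer] = updated
            second_params[layer] = list(updated)
        else:
            first_params[layer] = [
                p - lr * gi for p, gi in zip(first_params[layer], first_grads[layer])
            ]
            second_params[layer] = [
                p - lr * gi for p, gi in zip(second_params[layer], second_grads[layer])
            ]

# Toy parameters and gradients; layer names and values are illustrative only.
first_params = {"encoder": [1.0, 1.0], "head": [0.0]}
second_params = {"encoder": [1.0, 1.0], "head": [0.0]}
first_grads = {"encoder": [2.0, 0.0], "head": [1.0]}    # from the target-task loss
second_grads = {"encoder": [0.0, 2.0], "head": [-1.0]}  # from the auxiliary-task loss

update_step(first_params, second_params, first_grads, second_grads,
            shared={"encoder"}, lr=0.5)
print(first_params, second_params)
```

After the step, the shared `encoder` layer holds the same values in both models, while each `head` layer has moved only along its own task's gradient. Once training ends, only `first_params` would be kept, matching the statement that the first network model is taken as the final network model.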
Determining the shared network layers and the unshared network layers according to the gradient similarity comprises: if the gradient similarity is greater than or equal to a threshold value, determining the network layer corresponding to the gradient similarity as a shared network layer; and if the gradient similarity is smaller than the threshold value, determining the network layer corresponding to the gradient similarity as an unshared network layer. The first sample data and the second sample data are each plural in number. Determining the first gradient corresponding to each network layer in the first network model according to the first loss values comprises accumulating all the first gradients to obtain a first accumulated gradient. Determining the second gradient corresponding to each network layer in the second network model according to the second loss values comprises accumulating all the second gradients to obtain a second accumulated gradient. Determining the gradient similarity between the network layers of the first network model and the second network model according to the first gradient and the second gradient comprises determining the gradient similarity according to the first accumulated gradient and the second accumulated gradient. The first network model and the second network model share an embedded layer. The target task comprises an event extraction task, and the auxiliary tasks comprise at least a relation extraction task and a named entity recognition task. To solve the above technical problem, the application provides another technical solution, an event extraction method, which comprises the steps of obtaining a text corresponding to an event ext