CN-121997925-A - Automatic recognition method and system for accounting terms based on neural network
Abstract
The invention discloses an automatic recognition method and system for accounting terms based on a neural network. The method constructs a scene-based accounting term corpus and introduces a scene perception mechanism, which lets the model identify how terms differ across accounting scenes and addresses the low cross-scene recognition accuracy of traditional methods. An Encoder-Decoder model with multiple attention mechanisms effectively avoids information dilution in long texts and captures the contextual associations of accounting terms within long sentences. A composite loss function combined with a conditional random field (CRF) layer improves term category prediction accuracy and strengthens the learning of dependencies within term label sequences.
Inventors
- DING YI
- XUE LIXING
Assignees
- 南京财经大学 (Nanjing University of Finance and Economics)
Dates
- Publication Date
- 20260508
- Application Date
- 20251112
Claims (10)
- 1. An automatic recognition method for accounting terms based on a neural network, characterized by comprising the following steps: constructing a scene-based accounting term corpus and preprocessing it to generate fusion vectors; establishing a scene-aware bidirectional long short-term memory (BiLSTM) network based on the fusion vectors; using the BiLSTM network as an encoder to convert the fusion vectors into a hidden layer state matrix, calculating the association weight between each time step in the hidden layer state matrix and the scene, as well as the probability weight that each time step belongs to an accounting term, and thereby generating an accounting term feature representation vector for a specific scene; constructing an accounting term identification model, inputting the accounting term feature representation vector into a conditional random field (CRF) layer, and marking term boundaries and categories by learning the dependencies between accounting term labels; and inputting the accounting text to be identified into the accounting term identification model to obtain an identification result.
- 2. The automatic recognition method for accounting terms based on a neural network according to claim 1, wherein constructing the scene-based accounting term corpus comprises: collecting accounting text data from financial statement, audit report, tax declaration file, and accounting practice guide scenes, and labeling the accounting terms in the text; and establishing a term-scene mapping table that records the common expression variants of each term in different scenes.
- 3. The automatic recognition method for accounting terms based on a neural network according to claim 1, characterized in that preprocessing to generate the fusion vectors comprises the following steps: performing word segmentation on the labeled text and, for out-of-vocabulary words in the accounting domain, performing supplementary recognition by counting the co-occurrence frequencies and contextual features of words in the text; removing stop words and special symbols from the text, applying uniform case conversion to English accounting terms, and normalizing synonymous terms; and converting the preprocessed text data into word vectors with a Word2Vec model, with the vector dimension set to n × d, encoding the scene category information into scene vectors, and concatenating each word vector with its corresponding scene vector to generate word-scene fusion vectors.
- 4. The automatic recognition method for accounting terms based on a neural network according to claim 3, wherein using the bidirectional long short-term memory network as the encoder comprises: taking the word-scene fusion vectors as the sequential input of the bidirectional long short-term memory network, converting the word-scene fusion vectors of the text into a hidden layer state matrix, outputting an overall sentence representation vector, and performing a primary encoding of the text's semantic features and scene features.
- 5. The automatic recognition method for accounting terms based on a neural network according to claim 4, wherein the bidirectional long short-term memory network specifically adds a scene input gate, a scene forget gate, and a scene output gate to the input gate, forget gate, and output gate of the BiLSTM, and these scene gates control how much of the scene vector is imported into each gate, according to the following formulas: scene input gate: ; scene forget gate: ; scene output gate: ; ; ; ; ; hidden layer state: ; wherein the inputs at each step are the word-scene fusion vector at time step t, the scene vector S, and the previous hidden state; the output is the hidden state at the current time step; the gates use the sigmoid function and element-wise multiplication; and the remaining symbols denote weight matrices and bias terms.
- 6. The automatic recognition method for accounting terms based on a neural network according to claim 1, wherein generating the accounting term feature representation vector for the specific scene, by calculating the association weight between each time step in the hidden layer state matrix and the scene and the probability weight that each time step belongs to an accounting term, comprises: providing a scene attention module and a term attention module, wherein the scene attention module calculates, based on the overall sentence representation vector and the scene vector, the association weight between each time step in the hidden layer state matrix and the scene, and the term attention module calculates, based on accounting term dictionary features, the probability weight that each time step in the hidden layer state matrix belongs to an accounting term; determining scene attention weights and term attention weights from the scene attention module and the term attention module, and fusing the two sets of weights; and computing a weighted average of the hidden layer state matrix with the fused weights to generate the accounting term feature representation vector for the specific scene.
- 7. The automatic recognition method for accounting terms based on a neural network according to claim 6, wherein determining the scene attention weights and term attention weights from the scene attention module and the term attention module, and fusing them, comprises: scene attention weight: ; term attention weight: ; fusion weight: , wherein the fusion weight combines the scene attention weight and the term attention weight through a weight coefficient; and, based on the fusion weights, computing a weighted average over the hidden layer state matrix to generate the accounting term feature representation vector for the specific scene: , wherein the attention computations use two attention weight matrices and two attention bias terms, and T is the term feature vector.
- 8. The automatic recognition method for accounting terms based on a neural network according to claim 1, wherein inputting the accounting term feature representation vector into the conditional random field layer and labeling term boundaries and categories by learning the dependencies between accounting term labels comprises: inputting the term feature representation vector into the CRF layer, which labels term boundaries and categories by learning the dependencies among accounting term labels; and adding a fully connected layer before the CRF layer that performs dimension mapping and a nonlinear transformation on the term feature representation vector, improving the model's ability to recognize complex terms.
- 9. The automatic recognition method for accounting terms based on a neural network according to claim 8, wherein the loss function of the CRF layer is a composite loss function combining a cross-entropy loss function with the log-likelihood loss function of the CRF layer, according to the formula: , wherein a loss weight coefficient balances the two terms, the cross-entropy term optimizes the term category prediction error, and the CRF log-likelihood term optimizes the term label sequence dependency error.
- 10. An automatic recognition system for accounting terms based on a neural network, characterized by comprising: a preprocessing module, configured to construct a scene-based accounting term corpus and preprocess it to generate fusion vectors; a model construction module, configured to establish a scene-aware bidirectional long short-term memory network based on the fusion vectors; a conversion module, configured to use the bidirectional long short-term memory network as an encoder to convert the fusion vectors into a hidden layer state matrix; a generation module, configured to generate an accounting term feature representation vector for a specific scene by calculating the association weight between each time step in the hidden layer state matrix and the scene, and the probability weight that each time step belongs to an accounting term; a labeling module, configured to construct an accounting term identification model, input the accounting term feature representation vector into the conditional random field layer, and label term boundaries and categories by learning the dependencies among accounting term labels; and an identification module, configured to input the accounting text to be identified into the accounting term identification model to obtain an identification result.
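The term-scene mapping table of claim 2, used together with the synonym normalization and case unification steps of claim 3, can be pictured as a nested dictionary that is inverted into a (scene, variant) lookup. This is a minimal sketch only: the scene keys, canonical terms, and variants are hypothetical examples, not entries from the patent's corpus.

```python
# Hypothetical term-scene mapping table: canonical term -> scene -> expression variants.
# Terms, variants and scene keys are illustrative, not taken from the patent's corpus.
TERM_SCENE_MAP = {
    "accounts receivable": {
        "financial_statement": ["accounts receivable", "trade receivables"],
        "audit_report": ["receivables balance"],
    },
    "cost of goods sold": {
        "financial_statement": ["cost of goods sold", "cost of sales"],
        "accounting_guide": ["COGS"],
    },
}


def build_variant_index(term_scene_map):
    """Invert the table into a (scene, lowercased variant) -> canonical term lookup."""
    index = {}
    for canonical, scenes in term_scene_map.items():
        for scene, variants in scenes.items():
            for variant in variants:
                index[(scene, variant.lower())] = canonical
    return index


def normalize_term(surface, scene, index):
    """Return the canonical term for a surface form seen in a scene, or None if unknown."""
    return index.get((scene, surface.lower()))


index = build_variant_index(TERM_SCENE_MAP)
print(normalize_term("Cost of Sales", "financial_statement", index))  # cost of goods sold
print(normalize_term("cogs", "audit_report", index))                  # None
```

Keying the lookup on the scene as well as the surface form lets the same string resolve differently, or not at all, depending on where it appears, which is the cross-scene distinction the table is meant to record.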
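Claim 3 recovers accounting words that the segmenter missed by counting co-occurrence frequencies, then concatenates word vectors with scene vectors. The sketch below assumes pointwise mutual information (PMI) over adjacent tokens as the co-occurrence statistic, a one-hot scene encoding, and random vectors in place of a trained Word2Vec model; none of these choices is specified by the patent, and the thresholds `min_count` and `min_pmi` are hypothetical.

```python
import math
from collections import Counter

import numpy as np

SCENES = ["financial_statement", "audit_report", "tax_declaration", "accounting_guide"]


def candidate_compounds(sentences, min_count=2, min_pmi=1.0):
    """Propose adjacent token pairs that co-occur often enough to be one domain word.

    Scores each bigram by PMI; an assumed stand-in for the co-occurrence statistics of claim 3.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    found = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        pmi = math.log((count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        if pmi >= min_pmi:
            found[a + " " + b] = pmi
    return found


def scene_vector(scene):
    """One-hot scene encoding (an assumption; a learned scene embedding would also fit)."""
    vec = np.zeros(len(SCENES))
    vec[SCENES.index(scene)] = 1.0
    return vec


def fuse(word_vectors, scene):
    """Concatenate every d-dim word vector with the scene vector: (n, d) -> (n, d + |SCENES|)."""
    s = scene_vector(scene)
    return np.hstack([word_vectors, np.tile(s, (word_vectors.shape[0], 1))])


corpus = [
    ["accounts", "receivable", "increased"],
    ["accounts", "receivable", "decreased"],
    ["revenue", "increased"],
]
print(candidate_compounds(corpus))  # only "accounts receivable" passes (PMI = ln 6.4, about 1.86)

# Random vectors stand in for Word2Vec output (n = 3 tokens, d = 8) to keep the sketch dependency-free.
word_vectors = np.random.default_rng(0).normal(size=(3, 8))
fused = fuse(word_vectors, "audit_report")
print(fused.shape)  # (3, 12)
```

In a real pipeline the word vectors would come from a trained embedding model and the accepted compounds would be fed back to the segmenter before vectorization.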
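Claim 5 states that scene input, forget, and output gates regulate how much of the scene vector S enters each standard BiLSTM gate, but the gate formulas are not reproduced in this text. The NumPy sketch below therefore assumes one plausible form: for each gate g, a scene gate r_g = σ(V_g[x_t; h_{t-1}; S] + a_g) scales the scene term inside g = σ(W_g[x_t; h_{t-1}] + r_g ⊙ U_g S + b_g). The parameter names (`W_`, `U_`, `V_`, `a_`, `b_`) and the use of the final forward and backward states as the overall sentence vector of claim 4 are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(d_in, d_scene, d_h, rng):
    """Random parameters for one direction; names and shapes are hypothetical."""
    p = {}
    for g in ("i", "f", "o"):
        p["W_" + g] = rng.normal(0, 0.1, (d_h, d_in + d_h))             # standard gate
        p["U_" + g] = rng.normal(0, 0.1, (d_h, d_scene))                # scene term inside the gate
        p["V_" + g] = rng.normal(0, 0.1, (d_h, d_in + d_h + d_scene))   # scene gate
        p["b_" + g] = np.zeros(d_h)
        p["a_" + g] = np.zeros(d_h)
    p["W_c"] = rng.normal(0, 0.1, (d_h, d_in + d_h))
    p["b_c"] = np.zeros(d_h)
    return p


def scene_lstm_step(x_t, h_prev, c_prev, s, p):
    """One step of the assumed scene-gated LSTM cell."""
    xh = np.concatenate([x_t, h_prev])
    xhs = np.concatenate([x_t, h_prev, s])
    gates = {}
    for g in ("i", "f", "o"):
        r = sigmoid(p["V_" + g] @ xhs + p["a_" + g])                    # scene gate: share of S admitted
        gates[g] = sigmoid(p["W_" + g] @ xh + r * (p["U_" + g] @ s) + p["b_" + g])
    c_tilde = np.tanh(p["W_c"] @ xh + p["b_c"])
    c_t = gates["f"] * c_prev + gates["i"] * c_tilde                    # element-wise products
    h_t = gates["o"] * np.tanh(c_t)
    return h_t, c_t


def run_direction(X, s, p, d_h):
    h, c = np.zeros(d_h), np.zeros(d_h)
    states = []
    for x_t in X:
        h, c = scene_lstm_step(x_t, h, c, s, p)
        states.append(h)
    return np.stack(states)


def scene_bilstm(X, s, p_fwd, p_bwd, d_h):
    """Return the hidden state matrix H (n, 2*d_h) and a sentence vector (2*d_h,)."""
    H_f = run_direction(X, s, p_fwd, d_h)
    H_b = run_direction(X[::-1], s, p_bwd, d_h)[::-1]   # realign backward states with tokens
    H = np.concatenate([H_f, H_b], axis=1)
    sentence_vec = np.concatenate([H_f[-1], H_b[0]])    # final state of each direction
    return H, sentence_vec


rng = np.random.default_rng(0)
n, d_in, d_scene, d_h = 5, 12, 4, 8
X = rng.normal(size=(n, d_in))       # word-scene fusion vectors, one row per token
S = np.eye(d_scene)[1]               # one-hot scene vector
H, sentence_vec = scene_bilstm(X, S, init_params(d_in, d_scene, d_h, rng),
                               init_params(d_in, d_scene, d_h, rng), d_h)
print(H.shape, sentence_vec.shape)   # (5, 16) (16,)
```

Because the scene enters both through the fused input x_t and through the gated term r_g ⊙ U_g S, the cell can learn to rely on scene information more heavily in some contexts than in others.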
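Claims 6 and 7 combine a scene attention weight, computed from the overall sentence vector and the scene vector, with a term attention weight derived from dictionary features, and then average the hidden states with the fused weights. The concrete formulas are missing here, so this sketch assumes a bilinear softmax score for scene attention, a sigmoid term probability renormalized to sum to one, and a fusion coefficient `lam`; `dict_feat` is a hypothetical per-token dictionary-match indicator.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fused_attention(H, sentence_vec, s, dict_feat, W_s, b_s, w_t, b_t, lam=0.5):
    """Fuse scene attention and term attention, then average the hidden states (assumed forms).

    H: (n, d) hidden state matrix; sentence_vec: (d,); s: (k,) scene vector;
    dict_feat: (n, m) dictionary features per time step.
    Returns the fused weights alpha (n,) and the term feature vector T (d,).
    """
    query = W_s @ np.concatenate([sentence_vec, s]) + b_s     # scene-conditioned query, (d,)
    alpha_scene = softmax(H @ query)                          # association of each step with the scene
    p_term = sigmoid(np.hstack([H, dict_feat]) @ w_t + b_t)   # P(step t belongs to a term)
    alpha_term = p_term / p_term.sum()                        # renormalize to a distribution
    alpha = lam * alpha_scene + (1.0 - lam) * alpha_term      # weight fusion
    T = alpha @ H                                             # weighted average over time steps
    return alpha, T


rng = np.random.default_rng(1)
n, d, k = 5, 16, 4
H = np.tanh(rng.normal(size=(n, d)))
sentence_vec = np.concatenate([H[-1, : d // 2], H[0, d // 2 :]])  # as in a BiLSTM encoder
s = np.eye(k)[0]
dict_feat = np.array([[0.0], [1.0], [1.0], [0.0], [0.0]])         # tokens 1-2 match the dictionary
W_s, b_s = rng.normal(0, 0.1, (d, d + k)), np.zeros(d)
w_t, b_t = rng.normal(0, 0.1, d + 1), -1.0
alpha, T = fused_attention(H, sentence_vec, s, dict_feat, W_s, b_s, w_t, b_t, lam=0.5)
print(alpha.round(3), T.shape)
```

Renormalizing the term probabilities keeps the fused weights a proper distribution for any `lam` in [0, 1], so T remains a weighted average of the rows of H.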
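Claim 8 places a fully connected layer in front of the CRF layer. Because a CRF labels a sequence, the sketch assumes that per-token features are available and that the fully connected layer is a single tanh hidden layer mapping them to label scores; the BIO tag set with one `TERM` category is hypothetical. The best tag sequence is then decoded with the Viterbi algorithm.

```python
import numpy as np

LABELS = ["O", "B-TERM", "I-TERM"]  # hypothetical BIO tag set with one term category


def fc_emissions(F, W1, b1, W2, b2):
    """Fully connected layer: dimension mapping plus tanh nonlinearity -> label scores (n, K)."""
    return np.tanh(F @ W1 + b1) @ W2 + b2


def viterbi(emissions, transitions):
    """Best tag path under a linear-chain CRF.

    emissions: (n, K) per-token label scores; transitions[i, j]: score of tag i followed by tag j.
    """
    n, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, K), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + transitions + emissions[t][None, :]   # (previous tag, current tag)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


rng = np.random.default_rng(0)
F = rng.normal(size=(3, 16))                     # per-token features (n = 3, d = 16)
scores = fc_emissions(F, rng.normal(0, 0.1, (16, 32)), np.zeros(32),
                      rng.normal(0, 0.1, (32, 3)), np.zeros(3))
print(scores.shape)                              # (3, 3)

# Hand-set scores make the decoding easy to follow.
emissions = np.array([[2.0, 0.0, 0.0],
                      [0.0, 0.5, 1.0],
                      [0.0, 0.0, 2.0]])
transitions = np.zeros((3, 3))
transitions[0, 2] = -100.0                       # forbid O -> I-TERM
print([LABELS[i] for i in viterbi(emissions, transitions)])  # ['O', 'B-TERM', 'I-TERM']
```

Taking the per-token maximum of these emissions would give `O, I-TERM, I-TERM`, an invalid sequence; the transition penalty lets the CRF choose `B-TERM` instead, which is the kind of label dependency the claim relies on.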
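Claim 9 combines a cross-entropy loss with the CRF log-likelihood loss through a loss weight coefficient, but the formula itself is not reproduced. Assuming the common convex form L = λ·L_CE + (1 − λ)·L_CRF, where L_CRF is the negative log-likelihood of the gold tag path computed with the forward algorithm, a NumPy sketch is shown below; the coefficient name `lam` is hypothetical.

```python
import numpy as np


def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def cross_entropy(emissions, tags):
    """Mean token-level cross-entropy of softmax(emissions) against the gold tags."""
    log_probs = emissions - logsumexp(emissions, axis=1)[:, None]
    return -log_probs[np.arange(len(tags)), tags].mean()


def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of the gold path under a linear-chain CRF (forward algorithm)."""
    gold = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    alpha = emissions[0]
    for t in range(1, len(tags)):
        # alpha[j] = log-sum of scores of all partial paths ending in tag j at step t
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha, axis=0) - gold


def composite_loss(emissions, transitions, tags, lam=0.5):
    """Assumed form: lam * cross-entropy + (1 - lam) * CRF negative log-likelihood."""
    return lam * cross_entropy(emissions, tags) + (1.0 - lam) * crf_nll(emissions, transitions, tags)


rng = np.random.default_rng(2)
E = rng.normal(size=(4, 3))          # emission scores: 4 tokens, 3 tags
trans = rng.normal(size=(3, 3))      # transition scores
tags = [0, 1, 2, 0]                  # gold tag indices
print(composite_loss(E, trans, tags, lam=0.3))
```

With all transition scores set to zero the CRF factorizes over tokens, so its negative log-likelihood equals the summed token-level cross-entropy; this makes a convenient sanity check for the forward recursion.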
Description
Automatic recognition method and system for accounting terms based on neural network
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an automatic recognition method and system for accounting terms based on a neural network.
Background
Accounting information plays a vital role in enterprise decision-making, economic analysis, and related work. Accounting terms are the basic constituent units of accounting information, and accurately identifying them in text is the foundation for accounting information processing, financial data analysis, and similar tasks. At present, accounting term recognition relies mainly on manual identification and rule-based methods. Rule-based methods require large numbers of manually formulated rules, adapt poorly to complex and varied accounting texts, and deliver unsatisfactory recognition results. Neural network techniques have achieved notable results in natural language processing, and applying them to accounting term recognition is expected to improve both its accuracy and its efficiency. However, existing neural-network-based term recognition methods still perform inadequately on technical terms in the accounting domain, because accounting terms are highly specialized and semantically complex.
In addition, most existing accounting term recognition methods target terms of a single type or in a single context and cannot perceive how terms differ across accounting scenes (such as financial statements, audit reports, and tax declarations), which lowers accuracy in cross-scene recognition. Meanwhile, models are prone to information dilution when processing long texts and struggle to capture the contextual information associated with accounting terms in long sentences. Therefore, providing an automatic recognition method and system for accounting terms based on a neural network that overcomes these shortcomings of the prior art is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides an automatic recognition method and system for accounting terms based on a neural network that addresses the low efficiency, poor accuracy, poor scene adaptability, and insufficient long-text processing capability of prior-art accounting term recognition. By introducing a scene perception mechanism and a multiple-attention model, it improves the recognition accuracy and generalization of accounting terms across different scenes and in long texts. The invention adopts the following technical scheme:
An automatic recognition method for accounting terms based on a neural network comprises the following steps: constructing a scene-based accounting term corpus and preprocessing it to generate fusion vectors; establishing a scene-aware bidirectional long short-term memory network based on the fusion vectors; using the bidirectional long short-term memory network as an encoder to convert the fusion vectors into a hidden layer state matrix, and calculating the association weight between each time step in the hidden layer state matrix and the scene, as well as the probability weight that each time step belongs to an accounting term, so as to
finally generate the accounting term feature representation vector for a specific scene; constructing an accounting term identification model, inputting the accounting term feature representation vector into a conditional random field layer, and marking term boundaries and categories by learning the dependencies between accounting term labels; and inputting the accounting text to be identified into the accounting term identification model to obtain an identification result. Optionally, constructing the scene-based accounting term corpus includes: collecting accounting text data from financial statement, audit report, tax declaration file, and accounting practice guide scenes, and labeling the accounting terms in the text; and establishing a term-scene mapping table that records the common expression variants of each term in different scenes. Optionally, preprocessing to generate the fusion vectors includes: performing word segmentation on the labeled text and, for out-of-vocabulary words in the accounting domain, performing supplementary recognition by counting the co-occurrence frequencies and contextual features of words in the text; removing stop words and special symbols from the text, applying uniform case conversion to English accounting terms, and normalizing synonymous terms; and converting the preprocessed text data into Wor