CN-112507074-B - Numerical reasoning method and device in machine reading comprehension
Abstract
Embodiments of this specification provide a numerical reasoning method and device for machine reading comprehension. The method comprises: obtaining a current question and a current text; determining each entity and each number contained in the current question and the current text, together with the type of each number; constructing a relation network graph in which number nodes of the same type are neighbors of one another, and entity nodes and number nodes that have a preset relation are neighbors; determining a first question representation vector corresponding to the current question and an initial representation vector of each node in the relation network graph; and iterating over the nodes of the relation network graph a preset number of times, starting from the initial representation vector of each node, to obtain an updated representation vector of each node. This can improve the ability of numerical reasoning in machine reading comprehension to handle complex questions.
Inventors
- CHEN KUNLONG
- XU WEIDI
- CHENG XINGYI
- ZOU XIAOCHUAN
- WANG FENG
- WANG TAIFENG
- SONG LE
- CHU WAI
Assignees
- Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20200731
Claims (20)
- 1. A method of numerical reasoning in machine reading comprehension, the method comprising: acquiring a current question and a current text, wherein the current text describes the conditions of the current question; determining each entity and each number included in the current question and the current text, and the type corresponding to each number; constructing a relation network graph, wherein the relation network graph comprises entity nodes corresponding to the entities and number nodes corresponding to the numbers, and wherein number nodes of the same type, and entity nodes and number nodes having a preset relation, are made neighbors through connecting edges; inputting the current question and the current text into a language model, and obtaining, through the language model, a first semantic representation vector corresponding to each semantic element position in the current question and the current text; determining, according to the first semantic representation vectors, a first question representation vector corresponding to the current question and an initial representation vector of each node in the relation network graph; performing a predetermined number of iterations on the nodes in the relation network graph based on the initial representation vector of each node, wherein each iteration comprises, for each node, performing neighbor node aggregation based on the first question representation vector and using an attention mechanism to obtain an updated representation vector of the node, so that the updated representation vectors reflect the importance of different nodes, wherein nodes related to the question have higher importance than other nodes; and determining a numerical reasoning answer according to the updated representation vector of each node after the predetermined number of iterations.
- 2. The method of claim 1, wherein the types include at least one of: amount, time, percentage.
- 3. The method of claim 1, wherein the entity comprises at least one of: name of person, place, name of article.
- 4. The method of claim 1, wherein determining the first question representation vector corresponding to the current question according to the first semantic representation vectors comprises: mean-pooling the first semantic representation vectors corresponding to the semantic element positions in the current question to obtain the first question representation vector corresponding to the current question.
- 5. The method of claim 1, wherein determining the initial representation vector of each node in the relation network graph according to the first semantic representation vectors comprises: for any node in the relation network graph, determining a plurality of semantic element positions in the current question and the current text that match the content of the node, and mean-pooling the plurality of first semantic representation vectors corresponding to the plurality of semantic element positions to determine the initial representation vector of the node.
- 6. The method of claim 1, wherein each iteration comprises: determining, based on the first question representation vector, a question-driven vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number; determining an intermediate vector of each node based on the initial representation vector of the node, the current representation vector of the node, and the question-driven vector; transforming the intermediate vector of each node by a query matrix, a key matrix and a value matrix, respectively, to obtain a query vector, a key vector and a value vector corresponding to the node; performing a similarity calculation on the query vector corresponding to a first node and the key vector corresponding to a second node to obtain an attention score from the second node to the first node, wherein the first node and the second node are any two nodes that are neighbors of each other in the relation network graph; and, taking any node as a target node, computing a weighted sum of the value vectors of its neighbors according to the attention scores from those neighbors to the target node, and determining the updated representation vector of the target node based on the result of the weighted sum.
- 7. The method of claim 6, wherein determining, based on the first question representation vector, the question-driven vector corresponding to the current iteration number by using the neural network corresponding to the current iteration number comprises: passing the first question representation vector through a first fully connected layer to obtain a first feature vector; applying an activation function to the first feature vector to obtain a second feature vector; and passing the second feature vector through a second fully connected layer corresponding to the current iteration number to obtain the question-driven vector corresponding to the current iteration number.
- 8. The method of claim 6, wherein determining the intermediate vector of each node based on the initial representation vector, the current representation vector and the question-driven vector comprises: concatenating the initial representation vector of each node with the current representation vector of the node to obtain a first concatenated vector corresponding to the node; and converting the first concatenated vector of each node to a preset dimension and then multiplying it element-wise with the question-driven vector to obtain the intermediate vector of the node.
- 9. The method of claim 8, wherein converting the first concatenated vector to the preset dimension comprises: passing the first concatenated vector through a third fully connected layer to convert it to the preset dimension, wherein the preset dimension is the same as the dimension of the question-driven vector.
- 10. The method of claim 1, wherein determining the numerical reasoning answer according to the updated representation vector of each node after the predetermined number of iterations comprises: for any node in the relation network graph, determining a plurality of semantic element positions in the current question and the current text that match the content of the node, and acquiring the plurality of first semantic representation vectors corresponding to the plurality of semantic element positions; updating the acquired first semantic representation vectors according to the updated representation vector of the node after the predetermined number of iterations, to determine second semantic representation vectors corresponding to the plurality of semantic element positions, respectively; determining a first comprehensive representation vector corresponding to the current question and the current text according to the second semantic representation vectors corresponding to the plurality of semantic element positions in the current question and the current text and the first semantic representation vectors corresponding to the other semantic element positions; determining, according to the first comprehensive representation vector, an answer type corresponding to the numerical reasoning answer by using a first classification model; and determining the numerical reasoning answer by using a second classification model according to at least the answer type and the first comprehensive representation vector.
- 11. The method of claim 10, wherein the answer types include at least one of: answer extraction, counting questions, and arithmetic expression questions.
- 12. The method of claim 10, wherein the answer type is answer extraction, and determining the numerical reasoning answer by using the second classification model according to at least the answer type and the first comprehensive representation vector comprises: determining a second question representation vector corresponding to the current question according to the second semantic representation vectors corresponding to a plurality of semantic element positions in the current question and the first semantic representation vectors corresponding to the other semantic element positions; multiplying the first comprehensive representation vector element-wise with the second question representation vector to obtain a first cross representation vector; and concatenating the first comprehensive representation vector with the first cross representation vector and inputting the result into the second classification model to obtain the numerical reasoning answer.
- 13. The method of claim 12, wherein the second classification model is configured to predict an answer start position and an answer end position among the semantic element positions, so that the numerical reasoning answer is obtained from the answer start position and the answer end position.
- 14. The method of claim 10, wherein the answer type is a counting question, and the second classification model is used for predicting a number from 0 to 9 to obtain the numerical reasoning answer.
- 15. The method of claim 10, wherein the answer type is an arithmetic expression question, and the second classification model is used for predicting a sign for each number in the current question and the current text, the sign being one of a plus sign, a minus sign and 0, and the numerical reasoning answer is obtained by an operation on the numbers and their signs.
- 16. A numerical reasoning apparatus in machine reading comprehension, the apparatus comprising: an acquisition unit, configured to acquire a current question and a current text, wherein the current text describes the conditions of the current question; a first determining unit, configured to determine each entity and each number included in the current question and the current text acquired by the acquisition unit, and the type corresponding to each number; a construction unit, configured to construct a relation network graph, wherein the relation network graph comprises entity nodes corresponding to the entities determined by the first determining unit and number nodes corresponding to the numbers, and wherein number nodes of the same type, and entity nodes and number nodes having a preset relation, are made neighbors through connecting edges; a first characterization unit, configured to input the current question and the current text acquired by the acquisition unit into a language model, and to obtain, through the language model, a first semantic representation vector corresponding to each semantic element position in the current question and the current text; a second characterization unit, configured to determine, according to the first semantic representation vectors obtained by the first characterization unit, a first question representation vector corresponding to the current question and an initial representation vector of each node in the relation network graph; an iteration unit, configured to perform a predetermined number of iterations on the nodes in the relation network graph based on the initial representation vector of each node obtained by the second characterization unit, wherein each iteration comprises, for each node, performing neighbor node aggregation based on the first question representation vector obtained by the second characterization unit and using an attention mechanism to obtain an updated representation vector of the node, so that the updated representation vectors reflect the importance of different nodes, wherein nodes related to the question have higher importance than other nodes; and a second determining unit, configured to determine a numerical reasoning answer according to the updated representation vectors of the nodes, obtained by the iteration unit, after the predetermined number of iterations.
- 17. The apparatus of claim 16, wherein the type comprises at least one of: amount, time, percentage.
- 18. The apparatus of claim 16, wherein the entity comprises at least one of: name of person, place, name of article.
- 19. The apparatus of claim 16, wherein the second characterization unit is specifically configured to mean-pool the first semantic representation vectors corresponding to the semantic element positions in the current question to obtain the first question representation vector corresponding to the current question.
- 20. The apparatus of claim 16, wherein the second characterization unit is specifically configured to determine, for any node in the relation network graph, a plurality of semantic element positions in the current question and the current text that match the content of the node, and to mean-pool the plurality of first semantic representation vectors corresponding to the plurality of semantic element positions to determine the initial representation vector of the node.
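The per-iteration update recited in claims 6 to 9 can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the weights are random stand-ins for trained parameters, biases are omitted, and the ReLU activation, the scaled dot product used as the similarity calculation, the softmax normalization of attention scores, and the choice to leave isolated nodes unchanged are all assumptions that the claims do not specify. The names `linear`, `make_params` and `iterate_once` and the node labels are hypothetical.

```python
import math
import random

def linear(W, x):
    """Multiply weight matrix W (a list of rows) by vector x; bias omitted for brevity."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_params(dim, hidden, steps, rng):
    """Random stand-ins for trained weights (hypothetical; real weights are learned)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "fc1": mat(hidden, dim),                            # first fully connected layer
        "fc2": [mat(hidden, hidden) for _ in range(steps)], # one second layer per iteration
        "fc3": mat(hidden, 2 * dim),                        # third layer: concat -> preset dimension
        "Wq": mat(dim, hidden), "Wk": mat(dim, hidden), "Wv": mat(dim, hidden),
    }

def iterate_once(question_vec, initial, current, neighbors, params, step):
    """One question-driven neighbor aggregation step over all nodes."""
    # Question-driven vector: FC -> activation (ReLU assumed) -> per-iteration FC.
    hidden = [max(0.0, v) for v in linear(params["fc1"], question_vec)]
    drive = linear(params["fc2"][step], hidden)

    # Intermediate vector: project concat(initial, current), then multiply element-wise.
    mid = {}
    for node in initial:
        projected = linear(params["fc3"], initial[node] + current[node])
        mid[node] = [a * b for a, b in zip(projected, drive)]

    query = {n: linear(params["Wq"], v) for n, v in mid.items()}
    key = {n: linear(params["Wk"], v) for n, v in mid.items()}
    value = {n: linear(params["Wv"], v) for n, v in mid.items()}

    updated = {}
    for node in initial:
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = list(current[node])  # assumption: isolated nodes are unchanged
            continue
        scale = math.sqrt(len(query[node]))
        scores = [sum(a * b for a, b in zip(query[node], key[m])) / scale for m in nbrs]
        weights = softmax(scores)  # assumption: scores are normalized over the neighbors
        updated[node] = [
            sum(w * value[m][i] for w, m in zip(weights, nbrs))
            for i in range(len(value[node]))
        ]
    return updated

rng = random.Random(0)
DIM, HIDDEN, STEPS = 4, 6, 2  # arbitrary toy sizes
params = make_params(DIM, HIDDEN, STEPS, rng)
nodes = ["Alice", "3 dollars", "5 dollars", "2019"]
initial = {n: [rng.uniform(-1, 1) for _ in range(DIM)] for n in nodes}
neighbors = {
    "Alice": ["3 dollars"],
    "3 dollars": ["Alice", "5 dollars"],
    "5 dollars": ["3 dollars"],
}
question_vec = [rng.uniform(-1, 1) for _ in range(DIM)]

current = initial
for step in range(STEPS):
    current = iterate_once(question_vec, initial, current, neighbors, params, step)
```

Because the question-driven vector multiplies every node's intermediate vector before the query, key and value transforms, the same graph can yield different attention scores for different questions, which is the mechanism by which question-related nodes can receive more weight.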
Description
Numerical reasoning method and device in machine reading comprehension
This application is a divisional application of the invention application filed on July 31, 2020, with application number 202010759810.0 and entitled "Numerical reasoning method and device in machine reading comprehension".
Technical Field
One or more embodiments of this specification relate to the field of computers, and more particularly to a method and device for numerical reasoning in machine reading comprehension.
Background
Machine reading comprehension is a natural language processing task in which a question is typically presented together with a text that describes the conditions of the question, and the answer to the question can be obtained from the text. Numerical reasoning is an important capability in machine reading comprehension, and generally includes reasoning modes such as addition, subtraction, ranking and statistics. For a question that involves numerical reasoning, how to infer the correct answer from the text is a problem of current interest.
In the prior art, numerical reasoning in machine reading comprehension often fails to obtain the correct answer when faced with a complex question. An improved scheme is therefore desired that increases the ability of numerical reasoning in machine reading comprehension to handle complex questions.
Disclosure of Invention
One or more embodiments of this specification describe a method and device for numerical reasoning in machine reading comprehension that can improve the ability of numerical reasoning in machine reading comprehension to handle complex questions.
In a first aspect, a method for numerical reasoning in machine reading comprehension is provided, the method comprising: acquiring a current question and a current text, wherein the current text describes the conditions of the current question; determining each entity and each number included in the current question and the current text, and the type corresponding to each number; constructing a relation network graph, wherein the relation network graph comprises entity nodes corresponding to the entities and number nodes corresponding to the numbers, and wherein number nodes of the same type, and entity nodes and number nodes having a preset relation, are made neighbors through connecting edges; inputting the current question and the current text into a language model, and obtaining, through the language model, a first semantic representation vector corresponding to each semantic element position in the current question and the current text; determining, according to the first semantic representation vectors, a first question representation vector corresponding to the current question and an initial representation vector of each node in the relation network graph; performing a predetermined number of iterations on the nodes in the relation network graph based on the initial representation vector of each node, wherein each iteration comprises, for each node, performing neighbor node aggregation based on the first question representation vector and using an attention mechanism to obtain an updated representation vector of the node; and determining a numerical reasoning answer according to the updated representation vector of each node after the predetermined number of iterations.
In one possible implementation, the types include at least one of: amount, time, percentage.
In one possible implementation, the entity comprises at least one of: name of person, place, name of article.
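The graph construction step described above can be sketched as follows, as an illustration under stated assumptions rather than the patented implementation. The text does not say here what the "preset relation" between an entity and a number is, so the sketch takes the related pairs as caller-supplied input. The function names `build_relation_graph` and `mean_pool`, and the toy entities and numbers, are hypothetical. Mean pooling is included as one way to derive a node's initial vector from the vectors at its matching positions.

```python
from collections import defaultdict
from itertools import combinations

def build_relation_graph(numbers, entities, related_pairs):
    """Build an undirected neighbor map.

    numbers:       dict mapping each number node to its type, e.g. "amount"
    entities:      iterable of entity nodes
    related_pairs: iterable of (entity, number) pairs that satisfy the preset
                   relation (how that relation is detected is left to the caller)
    """
    neighbors = defaultdict(set)
    for node in list(numbers) + list(entities):
        neighbors[node]  # register every node, even one that ends up isolated

    # Number nodes of the same type are neighbors of one another.
    by_type = defaultdict(list)
    for number, number_type in numbers.items():
        by_type[number_type].append(number)
    for group in by_type.values():
        for a, b in combinations(group, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Entity nodes and number nodes with the preset relation are neighbors.
    for entity, number in related_pairs:
        neighbors[entity].add(number)
        neighbors[number].add(entity)

    return {node: sorted(adjacent) for node, adjacent in neighbors.items()}

def mean_pool(vectors):
    """Average equal-length vectors component by component."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

numbers = {"3 dollars": "amount", "5 dollars": "amount", "2019": "time"}
entities = ["Alice", "Bob"]
related_pairs = [("Alice", "3 dollars"), ("Bob", "5 dollars")]
graph = build_relation_graph(numbers, entities, related_pairs)

# Initial vector of a node whose content matches two semantic element positions.
initial_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

In this toy input the two amounts become neighbors through their shared type, each is also linked to its related entity, and the lone time value has no neighbors.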
In one possible implementation, determining the first question representation vector corresponding to the current question according to the first semantic representation vectors comprises: mean-pooling the first semantic representation vectors corresponding to the semantic element positions in the current question to obtain the first question representation vector corresponding to the current question.
In one possible implementation, determining the initial representation vector of each node in the relation network graph according to the first semantic representation vectors comprises: for any node in the relation network graph, determining a plurality of semantic element positions in the current question and the current text that match the content of the node, and mean-pooling the plurality of first semantic representation vectors corresponding to the plurality of semantic element positions to determine the initial representation vector of the node.
In one possible implementation, each iteration comprises: determining, based on the first question representation vector, a question-driven vector corresponding to the current iteration number by using a neural network corresponding to the current iteration number; determining an intermediate vector of each node base