CN-122019697-A - Context reference resolution method and system based on dialogue structure awareness
Abstract
The application discloses a context reference resolution method and system based on dialogue structure awareness, relating to the field of natural language processing. The method comprises acquiring a multi-turn dialogue corpus and storing dialogue entities separately in a shared memory pool and in private memory pools, so that entities valid only within an individual speaker's knowledge are clearly distinguished from globally shared entities, eliminating reference ambiguity caused by confusing the knowledge scopes of different speakers. During reference resolution, an interpretable cognitive compatibility scoring system is established by fusing multiple layers of structural signals, including a dialogue-structure-driven recursive saliency update mechanism, semantic role consistency judgment, and turn-association decay modeling, which forms an entity selection mechanism jointly constrained by multi-dimensional information. By introducing a dialogue-structure-aware dual-domain entity memory modeling mechanism, the accuracy and robustness of pronoun antecedent resolution are effectively improved in complex scenarios such as long dialogues, cross-speaker dialogues, and multi-turn interaction.
Inventors
- LUO YUHUI
- MEI YANG
- ZHAO YUXIN
- YU YINGXIA
- FENG FAN
- LI ZELIAN
- SHEN BAOZHU
Assignees
- 北京工成商通科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251211
Claims (9)
- 1. A context reference resolution method based on dialogue structure awareness, the method comprising: acquiring a multi-turn dialogue corpus, wherein the multi-turn dialogue corpus comprises pronouns, speaker identities, and a dialogue turn sequence; constructing a shared memory pool and a plurality of private memory pools based on the multi-turn dialogue corpus, wherein each private memory pool stores private entities known only to the corresponding speaker, and the shared memory pool stores shared entities known to all speakers; when a pronoun to be resolved is detected in the current turn's utterance, extracting a candidate entity set from the private memory pool of the current speaker and the shared memory pool; extracting entity saliency, semantic role consistency, and turn relevance based on dialogue structure relations, and calculating a cognitive compatibility score between each candidate entity and the pronoun to be resolved; and selecting, according to the cognitive compatibility scores, the candidate entity with the highest score as the referent of the pronoun to be resolved.
- 2. The method of claim 1, wherein the multi-turn dialogue corpus carries multi-dimensional semantic tags comprising entity attribute tags and dialogue behavior tags; the entity attribute tag defines the shared attribute or private attribute of each entity in the dialogue, so as to indicate whether the entity is information known to all speakers; and the dialogue behavior tag describes the semantic structural relation of each speaking turn and comprises any one of the behavior types of questioning, answering, confirming, or stating.
- 3. The method of claim 2, wherein constructing the shared memory pool and the plurality of private memory pools based on the multi-turn dialogue corpus comprises: extracting the entities in each turn of the dialogue corpus, and generating a corresponding entity semantic vector based on the semantic features of each entity in its context; identifying the entity type of each entity based on the entity attribute tags, the entity type being either a shared entity known to all speakers or a private entity known only to the corresponding speaker; and constructing the private memory pool corresponding to each speaker based on the entity semantic vectors and the corresponding dialogue behavior tags of the private entities, and constructing the shared memory pool based on the entity semantic vectors and the corresponding dialogue behavior tags of the shared entities.
- 4. The method of claim 3, wherein constructing the private memory pool corresponding to the speaker based on the entity semantic vector and the corresponding dialogue behavior tag of each private entity, and constructing the shared memory pool based on the entity semantic vector and the corresponding dialogue behavior tag of each shared entity, comprises: for each private entity, updating the entity memory vector and the saliency score of the private entity for the speaker to which it belongs, based on the entity semantic vector and the corresponding dialogue behavior tag of the private entity; wherein the entity memory vector of the private entity is recursively updated by a formula [formula not reproduced in this text], whose symbols denote, in order: the turn number of the dialogue corpus; the speaker index; the entity index of the private entity identified in the corpus; the memory vectors of that speaker's private entity in two turns of the dialogue corpus [turn indices not reproduced]; the gating coefficient of the private entity for the turn; and the entity semantic vector of the private entity for the turn; the saliency score of the private entity is recursively updated by a formula [formula not reproduced in this text], whose symbols denote, in order: a preset attenuation coefficient; a behavior-dependent activation coefficient, whose value is determined by the dialogue behavior tag of the corresponding turn of the dialogue corpus; the vector transpose operation; the Sigmoid activation function; the saliency scores of that speaker's private entity in two turns of the dialogue corpus [turn indices not reproduced]; the encoding vector of the turn's dialogue corpus; and an indicator for the private entity, which is 1 when the private entity is extracted from the turn's dialogue corpus and 0 otherwise; for each shared entity, updating the entity memory vector and the saliency score of the shared entity based on the entity semantic vector and the corresponding dialogue behavior tag of the shared entity; wherein the entity memory vector of the shared entity is recursively updated by a formula [formula not reproduced in this text], whose symbols denote, in order: the entity index of the shared entity identified in the corpus; the memory vectors of the shared entity in two turns of the dialogue corpus [turn indices not reproduced]; the gating coefficient of the shared entity for the turn; and the entity semantic vector of the shared entity for the turn; and the saliency score of the shared entity is recursively updated by a formula [formula not reproduced in this text], whose symbols denote, in order: the saliency scores of the shared entity in two turns of the dialogue corpus [turn indices not reproduced]; and an indicator for the shared entity, which is 1 when the shared entity is extracted from the turn's dialogue corpus and 0 otherwise.
- 5. The method of claim 4, wherein the gating coefficients are dynamically generated by a learnable gating network according to a formula [formula not reproduced in this text], whose symbols denote the vector concatenation operation and, respectively, the learnable parameter matrix and the bias term of the gating network.
- 6. The method of claim 4, wherein different types of dialog behavior tags are each preconfigured with a corresponding behavior-related activation coefficient for giving different weights to entities involved in different dialog behavior types when calculating the saliency score.
- 7. The method of claim 4, wherein extracting the candidate entity set from the private memory pool of the current speaker and the shared memory pool comprises: performing time window filtering, saliency filtering, semantic type matching, and dialogue structure filtering on the entities in the private memory pool and the shared memory pool respectively, wherein the time window filtering selects entities close in time to the current turn's context, the saliency filtering retains entities whose saliency score is higher than a preset threshold, the semantic type matching retains entities of the same type as the semantic type of the pronoun to be resolved, and the dialogue structure filtering limits the entity association range according to the dialogue behavior tag of the current turn; and merging and de-duplicating the private entities and shared entities that pass the filtering conditions to obtain the candidate entity set.
- 8. The method of claim 4, wherein extracting entity saliency, semantic role consistency, and turn relevance based on dialogue structure relations, and calculating the cognitive compatibility score between each candidate entity and the pronoun to be resolved, comprises: determining the utterance semantic matching degree of the candidate entity by a formula [formula not reproduced in this text], whose symbols denote, in order: the index of the candidate entity; the turn number of the current dialogue turn; the utterance semantic matching degree between the candidate entity and the current dialogue turn; the encoding vector of the current turn's dialogue sentence; and the semantic vector representation of the candidate entity in the current turn's context; determining the grammatical role matching degree of the candidate entity by a formula [formula not reproduced in this text], whose symbols denote, in order: the grammatical role matching degree between the candidate entity and the current turn, which reflects their consistency in syntactic structure; the grammatical role vector of the current turn's sentence, which represents the grammatical function of the pronoun in the current sentence; and the grammatical role vector of the candidate entity, which represents the grammatical role characteristics of the entity when it was mentioned; determining the turn distance weight of the candidate entity by a formula [formula not reproduced in this text], whose symbols denote, in order: the turn distance weight between the candidate entity and the current turn, which represents the influence on the reference probability of the temporal distance at which the candidate entity was mentioned; the turn attenuation coefficient, which controls how quickly the turn interval attenuates the matching score; the number of the turn in which the candidate entity was last mentioned; and an exponential function, which converts the distance difference into an attenuation weight in the interval [0, 1]; and obtaining the saliency score of the candidate entity at the current turn, fusing it with the utterance semantic matching degree, the grammatical role matching degree, and the turn distance weight of the candidate entity, and calculating the cognitive compatibility score between the candidate entity and the pronoun to be resolved by a formula [formula not reproduced in this text], whose symbols denote, in order: the cognitive compatibility score between the candidate entity and the current turn; the saliency score of the candidate entity at the current turn; the scoring weight coefficients; and a bias term.
- 9. A context reference resolution system based on dialogue structure awareness, the system comprising: a corpus acquisition unit, configured to acquire a multi-turn dialogue corpus, wherein the multi-turn dialogue corpus comprises pronouns, speaker identities, and a dialogue turn sequence; a memory pool construction unit, configured to construct a shared memory pool and a plurality of private memory pools based on the multi-turn dialogue corpus, wherein each private memory pool stores private entities known only to the corresponding speaker, and the shared memory pool stores shared entities known to all speakers; a candidate entity extraction unit, configured to extract a candidate entity set from the private memory pool of the current speaker and the shared memory pool when a pronoun to be resolved is detected in the current turn's utterance; a cognitive compatibility scoring unit, configured to extract entity saliency, semantic role consistency, and turn relevance based on dialogue structure relations and to calculate a cognitive compatibility score between each candidate entity and the pronoun to be resolved; and a reference object selection unit, configured to select, according to the cognitive compatibility scores, the candidate entity with the highest score as the referent of the pronoun to be resolved.
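The recursive updates of claims 4 and 5 and the score fusion of claim 8 can be illustrated with a minimal Python sketch. The claim text names the terms of each formula but does not reproduce the formulas, so every functional form below is an assumption chosen only to be consistent with those term descriptions: a gated interpolation for the memory vector, a decayed accumulation for the saliency score, a sigmoid over concatenated inputs for the gate, an exponential decay for the turn distance, and a linear fusion for the final score. All function names, the `ACTIVATION` table, and its values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gate(prev_memory, entity_vec, weights, bias):
    # Assumed gating network: sigmoid over the concatenation [m_{t-1}; e_t].
    return sigmoid(dot(weights, list(prev_memory) + list(entity_vec)) + bias)

def update_memory(prev_memory, entity_vec, g):
    # Assumed gated interpolation: m_t = (1 - g) * m_{t-1} + g * e_t.
    return [(1.0 - g) * m + g * e for m, e in zip(prev_memory, entity_vec)]

def update_saliency(prev_saliency, memory, turn_vec, decay, activation, mentioned):
    # Assumed form: s_t = decay * s_{t-1} + activation * sigmoid(h_t . m_t) * indicator.
    indicator = 1.0 if mentioned else 0.0
    return decay * prev_saliency + activation * sigmoid(dot(turn_vec, memory)) * indicator

def turn_distance_weight(current_turn, last_mentioned_turn, decay_rate):
    # Exponential decay of the turn gap; stays in (0, 1] for a non-negative gap.
    return math.exp(-decay_rate * (current_turn - last_mentioned_turn))

def compatibility_score(saliency, semantic_match, role_match, distance_weight, weights, bias):
    # Assumed linear fusion of the four signals named in claim 8.
    w_sal, w_sem, w_role, w_dist = weights
    return (w_sal * saliency + w_sem * semantic_match
            + w_role * role_match + w_dist * distance_weight + bias)

# Hypothetical per-behavior activation coefficients (claim 6 only says they are preconfigured).
ACTIVATION = {"question": 1.0, "answer": 0.8, "confirm": 0.6, "statement": 0.5}

# One update step for an entity mentioned in a "question" turn.
memory, saliency = [0.0, 0.0], 0.0
entity_vec, turn_vec = [1.0, 0.0], [1.0, 0.0]
g = gate(memory, entity_vec, weights=[0.0, 0.0, 0.0, 0.0], bias=0.0)
memory = update_memory(memory, entity_vec, g)
saliency = update_saliency(saliency, memory, turn_vec, decay=0.9,
                           activation=ACTIVATION["question"], mentioned=True)
score = compatibility_score(saliency, 0.7, 1.0,
                            turn_distance_weight(3, 2, 0.5), (1.0, 1.0, 1.0, 1.0), 0.0)
```

In this sketch an entity that is not mentioned in a turn keeps its memory vector only if the gate is near zero, and its saliency simply decays; whether the claimed formulas behave the same way cannot be confirmed from this text.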
Description
Context reference resolution method and system based on dialogue structure awareness
Technical Field
The application relates to the technical field of natural language processing, and in particular to a context reference resolution method and system based on dialogue structure awareness.
Background
With the widespread use of conversational artificial intelligence in customer service, virtual assistants, and multimodal interactive systems, one of the core challenges in multi-turn dialogue understanding is long-range reference resolution. In interactions spanning multiple turns, participants repeatedly use pronouns such as "it", "he", and "there" to refer to previously mentioned entities or events. If the dialogue system fails to identify the referents of these pronouns correctly, misunderstanding may result. For example, A says: "Zhang San, Li Si and I went to the mall yesterday." B asks: "What did he buy?" Whom "he" refers to in B's question must be determined from the context.
Current dialogue systems simplify the task by converting a multi-turn dialogue into a single-turn task, expanding the text of each sentence to fill in the antecedent of each pronoun, so that the context information is flattened. This can relieve some pronoun ambiguity, but it attends only to the explicit text structure and ignores the differences in information among dialogue participants. When a dialogue involves shared memory or private knowledge, such systems cannot capture implicit semantics or shared background information, so semantic ambiguity or misjudgment easily arises, which harms the accuracy and consistency of overall dialogue understanding.
Disclosure of Invention
The application provides a context reference resolution method, system, storage medium, computer program product, and electronic device based on dialogue structure awareness, which at least address the problem in the prior art that pronoun reference is confused in multi-turn dialogue because different contexts and knowledge sources cannot be effectively distinguished.
In a first aspect, an embodiment of the application provides a context reference resolution method based on dialogue structure awareness, comprising: acquiring a multi-turn dialogue corpus, wherein the multi-turn dialogue corpus comprises pronouns, speaker identities, and a dialogue turn sequence; constructing a shared memory pool and a plurality of private memory pools based on the multi-turn dialogue corpus, wherein each private memory pool stores private entities known only to the corresponding speaker, and the shared memory pool stores shared entities known to all speakers; when a pronoun to be resolved is detected in the current turn's utterance, extracting a candidate entity set from the private memory pool of the current speaker and the shared memory pool; extracting entity saliency, semantic role consistency, and turn relevance based on dialogue structure relations, and calculating a cognitive compatibility score between each candidate entity and the pronoun to be resolved, wherein the cognitive compatibility score represents the probability that the candidate entity is the reference target of the pronoun to be resolved; and selecting, according to the cognitive compatibility scores, the candidate entity with the highest score as the referent of the pronoun to be resolved.
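The dual memory pools and the selection step of the first aspect can be pictured with a minimal Python sketch. It assumes that entities have already been extracted and tagged as shared or private, models only the time-window, saliency, and semantic-type filters (dialogue structure filtering is omitted), and uses saliency alone as a stand-in for the cognitive compatibility score. The names `Entity`, `DialogueMemory`, `candidates`, and `resolve`, the default thresholds, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    semantic_type: str   # e.g. "person", "place"
    saliency: float
    last_turn: int       # turn in which the entity was last mentioned

@dataclass
class DialogueMemory:
    shared: dict = field(default_factory=dict)    # name -> Entity, known to all speakers
    private: dict = field(default_factory=dict)   # speaker -> {name -> Entity}

    def add(self, entity, speaker=None):
        # Store in the shared pool, or in `speaker`'s private pool when a speaker is given.
        pool = self.shared if speaker is None else self.private.setdefault(speaker, {})
        pool[entity.name] = entity

    def candidates(self, speaker, pronoun_type, current_turn, window=5, min_saliency=0.1):
        # Merge the shared pool with the current speaker's private pool only;
        # keying by name de-duplicates entities present in both.
        merged = {**self.shared, **self.private.get(speaker, {})}
        return [e for e in merged.values()
                if current_turn - e.last_turn <= window      # time window filter
                and e.saliency > min_saliency                # saliency filter
                and e.semantic_type == pronoun_type]         # semantic type match

def resolve(memory, speaker, pronoun_type, current_turn, score):
    # Pick the highest-scoring candidate, or None when no candidate survives filtering.
    return max(memory.candidates(speaker, pronoun_type, current_turn),
               key=score, default=None)

memory = DialogueMemory()
memory.add(Entity("Zhang San", "person", 0.9, 1))
memory.add(Entity("the mall", "place", 0.6, 1))
memory.add(Entity("Wang Wu", "person", 0.4, 1), speaker="A")   # known only to speaker A

# Speaker B resolves a person pronoun in turn 2; A's private entity is not visible to B.
best = resolve(memory, "B", "person", 2, score=lambda e: e.saliency)
print(best.name)  # Zhang San
```

Restricting the candidate set to the shared pool plus the current speaker's own private pool is what keeps an entity known only to A from being offered as a referent for B's pronoun.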
In a second aspect, an embodiment of the application provides a context reference resolution system based on dialogue structure awareness, comprising a corpus acquisition unit, a memory pool construction unit, a candidate entity extraction unit, a cognitive compatibility scoring unit, and a reference object selection unit. The corpus acquisition unit is configured to acquire a multi-turn dialogue corpus, wherein the multi-turn dialogue corpus comprises pronouns, speaker identities, and a dialogue turn sequence; the memory pool construction unit is configured to construct a shared memory pool and a plurality of private memory pools based on the multi-turn dialogue corpus, wherein each private memory pool stores private entities known only to the corresponding speaker, and the shared memory pool stores shared entities known to all speakers; the candidate entity extraction unit is configured to extract a candidate entity set from the private memory pool of the current speaker and the shared memory pool when a pronoun to be resolved is detected in the current turn's utterance; the cognitive compatibility scoring unit is configured to extract entity saliency, semantic role consistency, and turn relevance based on dialogue structure relations and to calculate a cognitive compatibility score between each candidate entity and the pronoun to be resolved, wherein the cognitive compatibility score represents the probability that the candidate entity is the reference target of the pronoun to be resolved; and the reference score of the candidate entity is used for