CN-121980037-A - Temporal knowledge graph conflict detection based on time constraints and multi-task learning


Abstract

The invention discloses a temporal knowledge graph conflict detection method based on time constraints and multi-task learning, and belongs to the field of knowledge graph construction. The method comprises: preprocessing the temporal knowledge graph data and constructing predicate constraint rules; learning time-aware entity embeddings, which comprise static entity embeddings and dynamic time embeddings; constructing time-aware relation embeddings; and finally identifying whether quadruples in the temporal knowledge graph conflict. By introducing time constraints and a multi-task learning framework, the method learns entity and relation embeddings with stronger time awareness and quadruple embeddings that are more accurate and more discriminative. The time-aware quadruple knowledge representation learning task and the quadruple conflict classification task are trained jointly, which improves the performance of conflict detection and enhances robustness to noisy data.

Inventors

ZHANG CHUNXIA
Xue xinyue
WANG YIZHOU
Song Haipei
GU ANCHENG

Assignees

Beijing Institute of Technology (北京理工大学)

Dates

Publication Date: 20260505
Application Date: 20251031
Priority Date: 20250506

Claims (4)

1. A temporal knowledge graph conflict detection method based on time constraints and multi-task learning, characterized by comprising three modules:
(1) the first module is a predicate constraint filtering module, which constructs predicate constraint rules according to domain knowledge and time constraint conditions, the predicate constraint rules comprising predicate relation mutual exclusion rules, a time non-overlapping rule and a predicate temporal priority rule;
(2) the second module is a time-aware quadruple knowledge representation module, which performs feature learning and comprises a time-aware entity embedding learning layer and a relation embedding learning layer;
(3) the third module is a quadruple conflict classification module, which is a joint optimization layer for the quadruple conflict classification task and the quadruple knowledge representation learning task; the model is optimized with a weighted loss function so that the detection model learns quadruple representations accurately, which improves its ability to recognize noisy data;
the temporal knowledge graph conflict detection method based on time constraints and multi-task learning comprises the following steps:
Step 1, preprocessing the temporal knowledge graph data and constructing predicate constraint rules;
Step 1.1, preprocessing the temporal knowledge graph data set; the temporal knowledge graph is defined as a set of quadruples, as shown in formula (1):
$$\mathcal{G} = \{(s, p, o, [t_b, t_e]) \mid s, o \in \mathcal{E},\ p \in \mathcal{R},\ t_b, t_e \in \mathcal{T}\} \quad (1)$$
where $\mathcal{E}$ denotes the set of entities, $\mathcal{R}$ denotes the set of relations, $\mathcal{T}$ denotes the set of timestamps, and $t_b$, $t_e$ denote timestamps;
Step 1.2, constructing predicate constraint rules according to the constraint relations between domain knowledge and predicate relations; in the following rules, the two quadruples compared are $(s_i, p_i, o_i, [\tau_{ib}, \tau_{ie}])$ and $(s_z, p_z, o_z, [\tau_{zb}, \tau_{ze}])$;
Step 1.2A, constructing the predicate relation mutual exclusion rules;
predicate relation mutual exclusion rule R1 is defined as: if two quadruples have the same head entity, predicate relation and timestamp information but two different tail entities, the two quadruples violate the predicate relation mutual exclusion rule, as shown in formula (2):
$$s_i = s_z \wedge p_i = p_z \wedge [\tau_{ib}, \tau_{ie}] = [\tau_{zb}, \tau_{ze}] \wedge o_i \neq o_z \quad (2)$$
where $\wedge$ denotes logical conjunction;
predicate relation mutual exclusion rule R2 is defined as: if two quadruples have the same tail entity, predicate relation and timestamp information but two different head entities, the two quadruples violate the predicate relation mutual exclusion rule, as shown in formula (3):
$$o_i = o_z \wedge p_i = p_z \wedge [\tau_{ib}, \tau_{ie}] = [\tau_{zb}, \tau_{ze}] \wedge s_i \neq s_z \quad (3)$$
where $\wedge$ denotes logical conjunction;
predicate relation mutual exclusion rule R3 is defined as: if two quadruples have the same head entity, tail entity and timestamp information but two different predicate relations, the two quadruples violate the predicate relation mutual exclusion rule, as shown in formula (4):
$$s_i = s_z \wedge o_i = o_z \wedge [\tau_{ib}, \tau_{ie}] = [\tau_{zb}, \tau_{ze}] \wedge p_i \neq p_z \quad (4)$$
where $\wedge$ denotes logical conjunction;
predicate relation mutual exclusion rule R4 is defined as: if two quadruples have the same head entity, predicate relation and tail entity but different timestamps, the two quadruples violate the predicate relation mutual exclusion rule, as shown in formula (5):
$$s_i = s_z \wedge p_i = p_z \wedge o_i = o_z \wedge (\tau_{ib} \neq \tau_{zb} \vee \tau_{ie} \neq \tau_{ze}) \quad (5)$$
where $\wedge$ denotes logical conjunction and $\vee$ denotes logical disjunction;
Step 1.2B, constructing the time non-overlapping constraint rule;
the time non-overlapping constraint rule R5 is defined as: the time intervals of any two fact quadruples with the same head entity and predicate relation are non-overlapping, and one of the following six cases is satisfied; in every case, $p_i$ and $p_z$ are the same predicate relation and $o_i$ and $o_z$ are different tail entities;
Case 1: the time intervals $[\tau_{ib}, \tau_{ie}]$ and $[\tau_{zb}, \tau_{ze}]$ of the two fact quadruples have no intersection, as shown in formula (6):
$$[\tau_{ib}, \tau_{ie}] \cap [\tau_{zb}, \tau_{ze}] = \varnothing \quad (6)$$
Case 2: the end timestamp $\tau_{ie}$ of one fact quadruple is equal to the start timestamp $\tau_{zb}$ of the other fact quadruple, as shown in formula (7):
$$\tau_{ie} = \tau_{zb} \quad (7)$$
Case 3: the start timestamp $\tau_{ib}$ of one fact quadruple is equal to the start timestamp $\tau_{zb}$ of the other, and the end timestamp $\tau_{ie}$ of the former is earlier than the end timestamp $\tau_{ze}$ of the latter, as shown in formula (8):
$$\tau_{ib} = \tau_{zb} \wedge \tau_{ie} < \tau_{ze} \quad (8)$$
Case 4: the start timestamp $\tau_{ib}$ of one fact quadruple is later than the start timestamp $\tau_{zb}$ of the other, and the end timestamp $\tau_{ie}$ of the former is equal to the end timestamp $\tau_{ze}$ of the latter, as shown in formula (9):
$$\tau_{ib} > \tau_{zb} \wedge \tau_{ie} = \tau_{ze} \quad (9)$$
Case 5: the start timestamp $\tau_{ib}$ of one fact quadruple is earlier than the start timestamp $\tau_{zb}$ of the other, and the end timestamp $\tau_{ie}$ of the former is earlier than the end timestamp $\tau_{ze}$ of the latter, as shown in formula (10):
$$\tau_{ib} < \tau_{zb} \wedge \tau_{ie} < \tau_{ze} \quad (10)$$
Case 6: the start timestamp $\tau_{ib}$ of one fact quadruple is later than the start timestamp $\tau_{zb}$ of the other, and the end timestamp $\tau_{ie}$ of the former is later than the end timestamp $\tau_{ze}$ of the latter, as shown in formula (11):
$$\tau_{ib} > \tau_{zb} \wedge \tau_{ie} > \tau_{ze} \quad (11)$$
Step 1.2C, constructing the predicate temporal priority rule;
the predicate temporal priority rule is defined as: for predicate relations that have a temporal order, one fact quadruple occurs before the other, i.e., the end time of the higher-priority predicate relation must be earlier than or equal to the start time of the other predicate relation; assuming that the end time of predicate relation $p_i$ is earlier than the start time of predicate relation $p_z$, the rule is as shown in formula (12):
$$\tau_{ie} \le \tau_{zb} \quad (12)$$
Step 2, learning time-aware entity embeddings;
Step 3, constructing time-aware relation embeddings;
Step 4, identifying whether quadruples in the temporal knowledge graph conflict.
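The predicate constraint rules of step 1.2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Quad` class, the function names, the integer timestamps and the `precedes` set of ordered predicate pairs are all hypothetical. For rule R5 the sketch assumes that a pair is flagged only when the two intervals strictly overlap, so intervals that merely touch at an endpoint (Case 2) are not flagged; the claim text leaves this reading open.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Quad:
    """A fact quadruple (s, p, o, [tb, te]); timestamps are plain integers here."""
    s: str
    p: str
    o: str
    tb: int  # start timestamp
    te: int  # end timestamp


def mutual_exclusion(a, b):
    """Return the names of rules R1-R4 that the pair (a, b) violates."""
    same_time = (a.tb, a.te) == (b.tb, b.te)
    hits = []
    if a.s == b.s and a.p == b.p and same_time and a.o != b.o:
        hits.append("R1")  # same head, predicate, time; different tails
    if a.o == b.o and a.p == b.p and same_time and a.s != b.s:
        hits.append("R2")  # same tail, predicate, time; different heads
    if a.s == b.s and a.o == b.o and same_time and a.p != b.p:
        hits.append("R3")  # same head, tail, time; different predicates
    if a.s == b.s and a.p == b.p and a.o == b.o and not same_time:
        hits.append("R4")  # same head, predicate, tail; different timestamps
    return hits


def violates_r5(a, b):
    """Assumed reading of R5: same head and predicate, different tails,
    and the two intervals strictly overlap (touching endpoints are allowed)."""
    if not (a.s == b.s and a.p == b.p and a.o != b.o):
        return False
    return a.tb < b.te and b.tb < a.te


def violates_priority(first, second, precedes):
    """Priority rule: if first.p must precede second.p for the same head
    entity, first must end no later than second starts."""
    return (
        first.s == second.s
        and (first.p, second.p) in precedes
        and first.te > second.tb
    )


def find_conflicts(quads, precedes=frozenset()):
    """Check every unordered pair and return (index_a, index_b, rule) triples."""
    found = []
    for (i, a), (j, b) in combinations(enumerate(quads), 2):
        for rule in mutual_exclusion(a, b):
            found.append((i, j, rule))
        if violates_r5(a, b):
            found.append((i, j, "R5"))
        if violates_priority(a, b, precedes) or violates_priority(b, a, precedes):
            found.append((i, j, "priority"))
    return found
```

The pairwise scan is quadratic in the number of quadruples; grouping quadruples by head entity and predicate before comparing would be the natural refinement for a real data set.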
2. The temporal knowledge graph conflict detection method based on time constraints and multi-task learning according to claim 1, wherein step 2 specifically comprises the following: the time-aware entity embedding comprises a static entity embedding and a dynamic time embedding, and the static entity embedding captures entity features that do not evolve over time;
Step 2.1, learning the initial static entity embedding based on the quadruple data set output in step 1; when the static embedding is constructed, the semantic features of an entity may change depending on its position in the quadruple, so two embeddings are learned for each entity v, representing respectively the initial static embedding when v serves as a head entity and the initial static embedding when v serves as a tail entity; the calculation is shown in formula (13); where N is the number of entities, and nn.Embedding(·), parameterized by N and the static embedding dimension, maps inputs to a continuous vector space;
Step 2.2, learning the dynamic time embedding based on the quadruple data set filtered in step 1; the dynamic time embedding consists of a time embedding with "year" as the coding dimension, a time embedding with "month" as the coding dimension and a time embedding with "day" as the coding dimension, and each of the three time embeddings is calculated from a phase embedding, a period embedding and an amplitude embedding;
Step 2.2.1, initializing the phase embedding, period embedding and amplitude embedding; the time information is divided into the three dimensions year, month and day, and the phase embedding, period embedding and amplitude embedding are used to calculate the "year" embedding, "month" embedding and "day" embedding; the phase embedding, period embedding and amplitude embedding are randomly initialized, and the calculation is shown in formula (14), which also introduces the notation for the dynamic time embedding;
Step 2.2.2, calculating the year embedding, month embedding and day embedding from the phase embedding, period embedding and amplitude embedding, as shown in formulas (15), (16) and (17); where each of "year", "month" and "day" has its own amplitude embedding, period embedding and phase embedding; $\tau_{year}$, $\tau_{month}$ and $\tau_{day}$ denote the values of "year", "month" and "day" in the timestamp information; sin(·) is the sine function, used to model periodic features of time such as seasonal events and events at regular intervals; it is smooth and continuous, which keeps parameter updates stable and consistent during backpropagation;
Step 2.2.3, adding the year embedding, month embedding and day embedding to obtain the dynamic time embedding, as shown in formula (18):
$$\mathbf{e}_{\tau} = \mathbf{e}_{year} + \mathbf{e}_{month} + \mathbf{e}_{day} \quad (18)$$
Step 2.3, constructing the time-aware entity embedding; the static entity embedding and the dynamic time embedding are fused dynamically by a gating mechanism to construct the time-aware entity vector;
Step 2.3.1, calculating the attention scores of the joint feature vectors; the head-position and tail-position static entity embeddings constructed in step 2.1 are each concatenated with the dynamic time embedding to construct two joint feature vectors, which are input into a fully connected attention network to obtain their attention scores, as shown in formula (19); where the parameters are the trainable weight matrix and the bias vector of the attention network, tanh(·) is the hyperbolic tangent activation function, and $d_{att}$ is the attention dimension; the operation of formula (19) adds nonlinearity to the joint entity features, so that complex dependencies in the input data can be captured when the conflict detection method learns entity embeddings;
Step 2.3.2, calculating the gating signals; the attention scores output in step 2.3.1 are input into a fully connected network to obtain the two gating signals, as shown in formula (20); where σ(·) is the sigmoid activation function and the parameters are a trainable weight matrix and a trainable bias term; each value of a gating vector controls how much of the feature at the corresponding position flows through: a value close to 1 lets more information pass, and a value close to 0 filters out the feature information at that position;
Step 2.3.3, generating the time-aware entity embedding; after the gating signals are obtained in step 2.3.2, each is multiplied element-wise with the corresponding entity feature vector $e_v$ to generate the time-aware entity embeddings for the head position and the tail position, as shown in formula (21).
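Steps 2.2 and 2.3 can be sketched in plain Python as follows. Formulas (14) to (21) are not reproduced in this text, so the sketch rests on assumptions: each time unit is embedded as `amplitude * sin(frequency * value + phase)` (a common sinusoidal form, with the "period embedding" treated as a frequency), and the gate has the same dimension as the joint feature vector and is multiplied with it. All class, function and parameter names are hypothetical.

```python
import math
import random


def init_vec(dim, rng):
    """Randomly initialise one embedding vector."""
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]


class TimeEmbedding:
    """Dynamic time embedding: sum of year, month and day embeddings."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # One amplitude / frequency / phase vector per time unit (assumed layout).
        self.params = {
            unit: {name: init_vec(dim, rng) for name in ("amp", "freq", "phase")}
            for unit in ("year", "month", "day")
        }

    def unit(self, unit, value):
        """Assumed form of formulas (15)-(17): amp * sin(freq * value + phase)."""
        p = self.params[unit]
        return [
            a * math.sin(w * value + ph)
            for a, w, ph in zip(p["amp"], p["freq"], p["phase"])
        ]

    def __call__(self, year, month, day):
        parts = [
            self.unit("year", year),
            self.unit("month", month),
            self.unit("day", day),
        ]
        # Formula (18): element-wise sum of the three unit embeddings.
        return [sum(values) for values in zip(*parts)]


def linear(weights, bias, x):
    """Fully connected layer: weights @ x + bias, with weights as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_entity_embedding(static, time_emb, w_att, b_att, w_gate, b_gate):
    """Fuse a static entity embedding with the dynamic time embedding."""
    joint = static + time_emb  # concatenation (list +)
    attention = [math.tanh(v) for v in linear(w_att, b_att, joint)]
    gate = [sigmoid(v) for v in linear(w_gate, b_gate, attention)]
    # Each gate value in (0, 1) scales the feature at the same position.
    return [g * x for g, x in zip(gate, joint)]
```

In the method, this fusion would be applied twice per entity, once with the head-position static embedding and once with the tail-position one; in a trained model the weights would be learned rather than supplied by hand.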
3. The temporal knowledge graph conflict detection method based on time constraints and multi-task learning according to claim 1, wherein step 3 specifically comprises the following: the relation embedding is obtained by processing the randomly initialized predicate relation embedding and the timestamp embeddings with a long short-term memory (LSTM) network, and the output hidden state is the time-aware relation embedding; the step specifically comprises the following sub-steps:
Step 3.1, learning the initial relation embedding based on the quadruple data set filtered in step 1; based on a quadruple $(s_i, p_i, o_i, [\tau_{ib}, \tau_{ie}])$ in the quadruple data set output in step 1, for the forward relation $p_i$, i.e., the relation pointing from the head entity $s_i$ to the tail entity $o_i$, the initial relation embedding is learned as shown in formula (22); for the inverse relation, i.e., the relation pointing from the tail entity $o_i$ to the head entity $s_i$, the initial relation embedding is learned by the same method as for the forward relation;
Step 3.2, generating the time embeddings of the time dimensions year, month and day; based on the quadruple data set output in step 1, the timestamp information is divided into the three numerical values "year", "month" and "day", and the numerical time information is converted into vector form by a linear layer to obtain the three time embeddings of year, month and day, as shown in formulas (23), (24) and (25); where $W_l$ is the weight of the linear layer and $b_l$ is the bias vector;
Step 3.3, constructing the time-aware relation embedding; the static forward and inverse relation embeddings are each concatenated with the timestamp embeddings to obtain a relation token sequence, which is input into the LSTM network, and the hidden-layer output is taken as the time-aware relation embedding;
Step 3.3.1, constructing the relation token sequence; the initial relation embedding and the embeddings of the three time dimensions year, month and day are concatenated to obtain the relation token sequence, as shown in formula (26);
Step 3.3.2, learning the temporal relation embedding; the relation token sequence is input into the LSTM network to obtain a temporal relation embedding with time awareness, as shown in formula (27); where $h_t$ and $c_t$ are the hidden state and the cell state at time step t, t = 1, 2, ..., seq, and seq is the number of time steps, i.e., the sequence length; at t = 0, $h_t$ and $c_t$ are initialized to all-zero vectors, and the last output hidden state is the temporal relation embedding.
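The sub-steps of step 3 can be sketched with a small LSTM cell written in plain Python. Formulas (22) to (27) are not reproduced in this text, so this is an illustration under assumptions: the linear layer maps a scalar date value to a vector as `w * value + b`, the token sequence is (relation, year, month, day), and a standard LSTM cell is used. The names `LSTMCell`, `time_aware_relation`, `lin_w` and `lin_b` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affine(weights, bias, x):
    """weights @ x + bias, with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class LSTMCell:
    """Standard LSTM cell; each gate has one weight matrix acting on [x; h]."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * dim for gate in "ifgo"}

    def step(self, x, h, c):
        xh = x + h  # concatenate input and previous hidden state
        i = [sigmoid(v) for v in affine(self.w["i"], self.b["i"], xh)]    # input gate
        f = [sigmoid(v) for v in affine(self.w["f"], self.b["f"], xh)]    # forget gate
        g = [math.tanh(v) for v in affine(self.w["g"], self.b["g"], xh)]  # candidate
        o = [sigmoid(v) for v in affine(self.w["o"], self.b["o"], xh)]    # output gate
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def time_aware_relation(rel_emb, date, lin_w, lin_b, cell):
    """Run the (relation, year, month, day) token sequence through the LSTM
    and return the last hidden state as the time-aware relation embedding."""
    # Assumed linear layer: each scalar date value becomes the vector w * value + b.
    tokens = [rel_emb] + [[w * v + b for w, b in zip(lin_w, lin_b)] for v in date]
    h = [0.0] * cell.dim  # h_0 and c_0 start as all-zero vectors
    c = [0.0] * cell.dim
    for x in tokens:
        h, c = cell.step(x, h, c)
    return h
```

The same function would be called once with the forward relation embedding and once with the inverse relation embedding to obtain both time-aware relation embeddings.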
4. The temporal knowledge graph conflict detection method based on time constraints and multi-task learning according to claim 1, wherein step 4 specifically comprises the following: joint learning is performed on the time-aware quadruple knowledge representation task and the quadruple conflict classification task; the step specifically comprises the following sub-steps:
Step 4.1, obtaining, through step 2 and step 3, the time-aware entity embeddings and relation embeddings of the quadruple $(s_i, p_i, o_i, [\tau_{ib}, \tau_{ie}])$; for each of the entities $s_i$ and $o_i$, two time-aware embedded representations are learned: the embedding of $s_i$ in the head entity position, the embedding of $s_i$ in the tail entity position, the embedding of $o_i$ in the head entity position, and the embedding of $o_i$ in the tail entity position;
Step 4.2, calculating the quadruple score and updating the entity embeddings and relation embeddings; the quadruple embeddings learned by the knowledge representation model are updated by minimizing the error between the score predicted by the time-aware quadruple knowledge representation model and the true quadruple score; the quadruple scoring function is shown in formula (28); where ⟨·⟩ denotes the product of the entity embedding vectors and the relation embedding vector in the quadruple;
Step 4.3, obtaining the binary classification label indicating whether the quadruple conflicts; the quadruple conflict classification module receives the time-aware head entity embedding, relation embedding and tail entity embedding output by the quadruple knowledge representation learning module, and obtains the quadruple label predicted by the detection model through a binary classifier;
Step 4.3.1, constructing the quadruple embedding with time-coded information; the head entity embedding, relation embedding and tail entity embedding are concatenated to obtain the quadruple embedded representation with time-coded information, as shown in formula (29);
Step 4.3.2, obtaining the class probability of the quadruple and outputting the class label indicating whether the quadruple conflicts; the quadruple embedding with time-coded information is input into a fully connected neural network to generate the quadruple class probability, as shown in formula (30); where σ(·) is the sigmoid activation function, $W_2$ and $b_2$ are respectively the weight and the bias of fully connected neural network 2, and the quadruple conflict classification probability $p_{cls} \in [0, 1]$; when $p_{cls} \ge 0.5$ the quadruple label is taken to be 1, otherwise 0, where a label of 1 denotes a non-conflicting quadruple.
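The joint training objective of step 4 can be sketched in plain Python. Formulas (28) to (30) and the weighted loss are not reproduced in this text, so every concrete form below is an assumption made for illustration: a SimplE-style score that averages the forward and inverse three-way products, a single-layer sigmoid classifier over the concatenated embeddings, a squared-error representation loss, a binary cross-entropy classification loss, and a convex combination with a hypothetical `weight` hyperparameter.

```python
import math


def triple_product(a, b, c):
    """<a, b, c>: sum over dimensions of the element-wise product."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def quad_score(head_s, rel, tail_o, head_o, rel_inv, tail_s):
    """Assumed SimplE-style score using both the forward and inverse relation."""
    forward = triple_product(head_s, rel, tail_o)
    inverse = triple_product(head_o, rel_inv, tail_s)
    return 0.5 * (forward + inverse)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conflict_probability(head_s, rel, tail_o, w2, b2):
    """Concatenate the three embeddings and apply one sigmoid output unit."""
    quad_emb = head_s + rel + tail_o
    return sigmoid(sum(w * x for w, x in zip(w2, quad_emb)) + b2)


def predict_label(p_cls):
    """Label 1 (non-conflicting) when p_cls >= 0.5, otherwise 0."""
    return 1 if p_cls >= 0.5 else 0


def joint_loss(score, target_score, p_cls, label, weight=0.5):
    """Weighted sum of a representation loss and a classification loss."""
    eps = 1e-12  # guards against log(0)
    rep_loss = (score - target_score) ** 2
    cls_loss = -(label * math.log(p_cls + eps)
                 + (1 - label) * math.log(1.0 - p_cls + eps))
    return weight * rep_loss + (1.0 - weight) * cls_loss
```

Sharing the same entity and relation embeddings between `quad_score` and `conflict_probability` is what makes the setup multi-task: gradients from both losses would update the same parameters.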

Description

Temporal knowledge graph conflict detection based on time constraints and multi-task learning

Technical Field

The invention relates to a temporal knowledge graph conflict detection method based on time constraints and multi-task learning, and belongs to the fields of knowledge graph construction and natural language processing.

Background

Temporal knowledge graph conflict detection is an important research topic in the construction and application of temporal knowledge graphs. It refers to identifying quadruples in an existing temporal knowledge graph that are inconsistent with the real world. A quadruple has the form (entity, relation, entity, timestamp) and comprises four elements: entity, relation, entity and timestamp.

Temporal knowledge graph error detection methods include error detection methods based on time constraints and error detection methods based on knowledge representation learning. Methods based on time constraints first construct constraint rules and then detect temporally conflicting quadruples according to whether the time constraints are violated. For example, an error detection method for uncertain temporal knowledge graphs based on Markov logic networks first constructs time constraint rules manually and mines rules automatically, then models the uncertain temporal knowledge graph with a Markov logic network, and detects erroneous facts in the temporal knowledge graph using the rules and the data log. As another example, a time constraint mining method based on structural patterns performs temporal knowledge graph error detection as follows. First, the time constraints are divided into quadruple facts and predicate constraints, and the structural patterns of the time constraints are designed. Second, subgraph patterns are generated from the knowledge graph through the structural patterns, and candidate constraints are generated by attaching temporal predicates to the corresponding time information. Finally, constraint quality is calculated by entity, and time conflicts are detected using the high-quality constraints.

Error detection methods based on knowledge representation learning first obtain vector representations of the entities, relations and timestamps in the temporal knowledge graph, and then evaluate whether quadruples are erroneous using a quadruple scoring function. For example, a temporal knowledge graph error detection method based on static embedding first divides the timestamp information into years, months and days, and encodes the time information as a digit sequence to obtain a time sequence. Second, the time sequence and the predicate relation are input into a long short-term memory (LSTM) network, and the output sequence of the LSTM network is learned with a recurrent neural network to obtain a relation embedding that contains time information. Finally, a scoring function is used to evaluate the quality of the quadruple. As another example, a temporal knowledge graph error detection method based on semantic embedding and path embedding first constructs three time constraints to detect time conflicts. Second, the confidence between entities is evaluated based on semantic embedding and path embedding. Finally, the entities in conflicting quadruples are replaced through a tail entity link prediction task to resolve the conflicts.

Existing temporal knowledge graph error detection methods mainly have the following problems. First, existing methods based on knowledge representation learning have difficulty making full use of time constraint information: they bind the timestamp information to the predicate relation and ignore the fact that the semantic features of entities evolve over time. Second, when learning quadruples, conflict detection methods based on knowledge representation learning assume that the quadruples in the data set are correct, and ignore the influence of noisy data on the performance of the conflict detection model.

Disclosure of Invention

The invention aims to solve the problem that existing temporal knowledge graph conflict detection methods perform poorly because time information is insufficiently used and quadruple embeddings are difficult to learn accurately from noisy data sets, and provides a temporal knowledge graph conflict detection method based on time constraints and multi-task learning, which can effectively improve the quality of a temporal knowledge graph. The basic idea of the method comprises the steps of firstly constructing predicate constraint rules according to the constraint information between domain knowledge and predicate relations, and filtering out quadruples inconsistent with the domain knowledge and time, secondly obtaining entity embed