CN-115618006-B - Automatic knowledge graph construction system and working method thereof


Abstract

The invention discloses an automatic knowledge graph construction system and a working method thereof. The system comprises an information acquisition module, a schema generation module, an information extraction module, an information fusion module and a pre-trained language model. The schema generation module and the information extraction module are deep learning models that express natural language text in a prompt-paradigm template. The information acquisition module acquires unstructured text corpora and their corresponding schema information, including but not limited to unlabeled internet text corpora. The pre-trained language model is deployed on a server; it is trained on the text corpora in a self-supervised manner using a Transformer-based encoder-decoder deep learning architecture and, after corresponding fine-tuning, is fed back to the schema generation module and the information extraction module, thereby improving training accuracy.

Inventors

YIN CHAO

Assignees

上海穰川信息技术有限公司

Dates

Publication Date: 20260508
Application Date: 20220826

Claims (7)

1. An automatic knowledge graph construction system, characterized by comprising an information acquisition module, a schema generation module, an information extraction module, an information fusion module and a pre-trained language model, wherein the schema generation module and the information extraction module are deep learning models that express natural language text in the prompt paradigm, and the information acquisition module is used for acquiring unstructured text corpora and their corresponding schema information, including but not limited to unlabeled internet text corpora; the schema generation module consists of an encoder and a decoder identical to those of the pre-trained language model, and is used for generating schema fragments expressed as sequences and corresponding to a designated part of the SELECT; the information extraction module is used for extracting information from natural language corpora and outputting it as structured information, wherein the unstructured text and its corresponding schema information are spliced together and taken as input, the specific data values corresponding to the text and the schema are taken as output, and the structured information is expressed in SQL form; the pre-trained language model is deployed on a server, is trained on the text corpora in a self-supervised manner using a Transformer-based encoder-decoder deep learning architecture, and, after corresponding fine-tuning, is fed back to the schema generation module and the information extraction module, thereby improving training accuracy; the information fusion module is used for verifying the credibility of information when fragments of structured data are fused into the knowledge graph, wherein the structured data comprise partial schema information extracted from a single unstructured text.
2. The automatic knowledge graph construction system according to claim 1, wherein the loss function of the encoder and decoder of the schema generation module is expressed as: [formula not reproduced in the source text]; wherein [symbols not reproduced] are the parameters of the encoder and the decoder, respectively, x is the original text, meta SQL is the meta information expressed in SQL, and schema is the schema information to be output.
3. The automatic knowledge graph construction system according to claim 1, wherein the loss function of the information extraction module is: [formula not reproduced in the source text]; wherein [symbols not reproduced] are the parameters of the encoder and the decoder, respectively, x is the original text, SCHEMASQL is the schema information expressed in SQL, and data is the structured data information to be output.
4. The automatic knowledge graph construction system according to claim 1, wherein the loss function of the pre-trained language model is: [formula not reproduced in the source text]; wherein [symbols not reproduced] are the parameters of the encoder and the decoder, respectively, x is the input text, and [symbol not reproduced] is the text to be predicted: if a masked language model is used, it is the masked text, and if a causal language model is used, it is the next word, the next fragment or the next sentence.
5. The automatic knowledge graph construction system according to claim 1, wherein the information fusion module verifies the value of an attribute field key in the schema as follows: [formula not reproduced in the source text]; wherein [symbol not reproduced] represents the credibility of source j, calculated by the PageRank method; k = 1..N indicates that value has N candidate values; [symbol not reproduced] represents the k-th value; the same value extracted from different sources j corresponds to different weights, the weights are accumulated, and the value with the highest score is taken as the value corresponding to the key.
6. A working method of the automatic knowledge graph construction system according to any one of claims 1 to 5, comprising the following steps: S1, pre-training the language model in the self-supervised manner set out for the pre-trained language model, wherein the unstructured text corpora comprise various unlabeled internet text corpora; S2, collecting unstructured texts and their corresponding schema information as training data for the schema generation module, by using structured knowledge to search for texts that contain the structured values and taking those texts together with the schema of the structured knowledge as the training data; S3, splicing the unstructured text with its corresponding schema information as input, taking the specific data values corresponding to the text and the schema as output, and training the information extraction module; S4, after the schema generation module has been obtained through S2 and the information extraction module through S3, performing complete information extraction on a new document, wherein the schema generation module is first used to obtain the schema information, and the structured data information is then obtained from the text and the schema information; and S5, after the structured data information from any source has been obtained through S4, verifying and fusing the structured data in the knowledge fusion manner set out for the information fusion module, adding the verified original text and structured data to the training data, and continuing training using the training methods of the schema generation module and the information extraction module.
7. The method of claim 6, wherein the training data used by the information extraction module is the same as the training data used by the schema generation module, and step S2 requires storing the original text, the structured schema and the structured data at the same time.
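
The formulas of claims 2 to 5 are not reproduced in this text. As a reading aid only, the block below sketches one conventional form they could take, assuming each loss is the negative log-likelihood of the target sequence under an encoder-decoder model and that the fusion rule is a credibility-weighted vote. The symbols $\theta_e$ and $\theta_d$ (encoder and decoder parameters), $\hat{x}$ (text to be predicted), $w_j$ (credibility of source $j$) and $v_j$ (value extracted from source $j$) are assumed names; the granted claims may use different notation or a different form.

```latex
\begin{align}
\mathcal{L}_{\text{schema}}(\theta_e,\theta_d)
  &= -\log P\bigl(\text{schema} \mid x,\ \text{meta SQL};\ \theta_e,\theta_d\bigr) \\
\mathcal{L}_{\text{extract}}(\theta_e,\theta_d)
  &= -\log P\bigl(\text{data} \mid x,\ \text{SCHEMASQL};\ \theta_e,\theta_d\bigr) \\
\mathcal{L}_{\text{pretrain}}(\theta_e,\theta_d)
  &= -\log P\bigl(\hat{x} \mid x;\ \theta_e,\theta_d\bigr) \\
\text{value}^{*}
  &= \operatorname*{arg\,max}_{\text{value}_k,\ k \in \{1,\dots,N\}}
     \sum_{j} w_j \,\mathbb{1}\bigl[v_j = \text{value}_k\bigr]
\end{align}
```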

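The verification rule of claim 5 can be illustrated with a minimal sketch, assuming the credibility of each source has already been computed (for example by PageRank) and is supplied as a plain dictionary. The function name `fuse_value`, the source names and the weights are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def fuse_value(candidates, credibility):
    """Pick the value of one attribute key by credibility-weighted voting.

    candidates:  list of (source, value) pairs extracted for the same key.
    credibility: dict mapping source -> weight (assumed precomputed,
                 e.g. by PageRank).
    Returns (best_value, scores); raises ValueError if there are no candidates.
    """
    if not candidates:
        raise ValueError("no candidate values to fuse")
    scores = defaultdict(float)
    for source, value in candidates:
        # The same value seen in several sources accumulates their weights.
        # Sources with unknown credibility contribute nothing (an assumption).
        scores[value] += credibility.get(source, 0.0)
    # On a tie, the value that was encountered first wins.
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical example: three sources disagree on a founding year.
credibility = {"site_a": 0.5, "site_b": 0.25, "site_c": 0.125}
candidates = [("site_a", "1998"), ("site_b", "1999"), ("site_c", "1999")]
best, scores = fuse_value(candidates, credibility)
print(best, scores)
```

In this example the single high-credibility source outweighs the two lower-credibility sources, which is the behavior the weighted accumulation is meant to produce; a plain majority vote would have chosen the other value.
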
Description

Automatic knowledge graph construction system and working method thereof

Technical Field

The invention belongs to the technical field of artificial intelligence knowledge graphs, and particularly relates to an automatic knowledge graph construction system and a working method thereof.

Background

A knowledge graph represents data as a graph structure formed by entity (node) - relation (edge) - entity (node) triples. Both nodes and relations can carry attributes, so that complex data structures can be expressed, such as an event described by its time, place, subject and behavior. Because of this graph-structured organization, knowledge graphs can effectively improve the efficiency of relation-oriented data processing, search, statistical analysis and modeling, and are widely applied.

Traditional knowledge graph construction generally comprises two major steps: first, defining the data structure (schema) of entities and relations according to the business; second, manually labeling or aligning unstructured or semi-structured data according to the designated schema, building an information extraction module, and extracting the actual data. Both steps depend on manual work, so the construction cost is high, and domain expansion and automatic updating face great challenges.

The main methods for constructing the knowledge graph include manual definition, use of the knowledge organization structure of encyclopedia pages, and preset patterns such as is-a concept graphs. The main approaches to the information extraction step include methods based on lexical patterns, clustering-based methods, and schema-guided information extraction.

Manual definition and manually specified lexical patterns have difficulty coping with the complex and variable text expressions found in practice. Using encyclopedic knowledge organization can effectively solve the cold-start problem, but the coverage of structured knowledge is very small, and it must be combined with other approaches to mine the information in unstructured text. Schema-guided information extraction is a deep learning method that borrows the formulation of question answering: the expression of the schema serves as the question, the text to be extracted serves as the background information, and the answer to the question is the extracted structured data. However, current schema-guided information extraction relies on a manually given schema. It does not solve the problem of constructing the schema automatically, nor the problem that the background information used for extraction is limited to the current text, so that the knowledge graph already constructed cannot be effectively utilized.

Disclosure of Invention

In order to overcome the defects in the prior art, the invention provides an automatic knowledge graph construction system and a working method thereof.
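The question-answering framing of schema-guided extraction described above, together with the spliced text-plus-schema input recited in claim 1, can be sketched as follows. This only illustrates how one input/target pair might be laid out: the separator `[SEP]`, the helper name `build_extraction_example`, the example sentence and the SQL strings are all hypothetical, since the source does not specify the exact serialization.

```python
def build_extraction_example(text, schema_sql, data_sql, sep=" [SEP] "):
    """Build one (input, target) pair for a sequence-to-sequence extractor.

    The input splices the unstructured text with its schema (expressed in SQL);
    the target is the structured data (also expressed in SQL).
    The separator and the layout are assumptions made for illustration.
    """
    return {
        "input": text.strip() + sep + schema_sql.strip(),
        "target": data_sql.strip(),
    }


# Hypothetical training pair.
example = build_extraction_example(
    text="Acme Corp was founded in 1998 in Shanghai.",
    schema_sql="SELECT name, founded_year, city FROM company",
    data_sql="INSERT INTO company (name, founded_year, city) "
             "VALUES ('Acme Corp', 1998, 'Shanghai')",
)
print(example["input"])
print(example["target"])
```

Here the schema plays the role of the question and the text plays the role of the background passage; an encoder-decoder model would be trained to emit the target string given the input string.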
In order to solve the above technical problems, the invention provides the following technical solution: an automatic knowledge graph construction system comprising an information acquisition module, a schema generation module, an information extraction module, an information fusion module and a pre-trained language model, wherein the schema generation module and the information extraction module are deep learning models that express natural language text in a prompt template, and the information acquisition module is used for acquiring unstructured text corpora and their corresponding schema information, including but not limited to unlabeled internet text corpora; the schema generation module consists of an encoder and a decoder identical to those of the pre-trained language model, and is used for generating schema fragments expressed as sequences and corresponding to a designated part of the SELECT; the information extraction module is used for extracting information from natural language corpora and outputting it as structured information, wherein the unstructured text and its corresponding schema information are spliced together and taken as input, the specific data values corresponding to the text and the schema are taken as output, and the structured information is expressed in SQL form; the pre-trained language model is deployed on a server, is trained on the text corpora in a self-supervised manner using a Transformer-based encoder-decoder deep learning architecture, and, after corresponding fine-tuning, is fed back to the schema generation module and the information extraction module, thereby improving training accuracy; the information fusion module is used for verifying the credibility of information when fragments of structured data are fused into the knowledge graph, where