CN-122021833-A - End-side knowledge graph construction and maintenance method and system based on small language model

Abstract

The invention discloses a method and a system for constructing and maintaining an end-side knowledge graph based on a small language model. The method comprises the following steps: S11, obtaining a text block; S12, extracting an entity information list to be updated from the text block; S13, extracting relation information to be updated according to the text block, the entity information list to be updated, and a prompt; S14, using the small language model to extract an iterated entity information list and iterated relation information between entity pairs according to the text block, the entity information list to be updated, and the relation information to be updated; and S15, updating the entity information list to be updated with the iterated entity information list and the relation information to be updated with the iterated relation information, and repeating S14 until the iteration count equals a preset value. Entity and relation extraction follows a verification-first strategy, which makes it easier for the model to capture deep and implicit relations in high-density text, reduces hallucination by the small model, and improves the accuracy of relation information extraction.
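The control flow of steps S12 to S15 can be sketched as below. This is only an illustration under stated assumptions, not the patented implementation: the function name `iterative_extract` and the three callables that stand in for small-language-model calls are hypothetical, the tuple layouts are placeholders, and the preset value is assumed to be at least 1.

```python
from typing import Callable, List, Tuple

# Placeholder record shapes (assumptions for illustration only).
Entity = Tuple[str, ...]
Relation = Tuple[str, ...]


def iterative_extract(
    text_block: str,
    extract_entities: Callable[[str], List[Entity]],
    extract_relations: Callable[[str, List[Entity]], List[Relation]],
    refine: Callable[
        [str, List[Entity], List[Relation]],
        Tuple[List[Entity], List[Relation]],
    ],
    preset_iterations: int,
) -> Tuple[List[Entity], List[Relation]]:
    """Run one initial extraction, then a fixed number of refinement passes."""
    # S12: first pass over the text block yields the entity list to be updated.
    entities = extract_entities(text_block)
    # S13: relations between entity pairs; the iteration count starts at 0.
    relations = extract_relations(text_block, entities)
    iteration = 0
    while iteration < preset_iterations:
        # S14: the model sees the text plus its previous output and returns
        # an iterated entity list and iterated relation information.
        new_entities, new_relations = refine(text_block, entities, relations)
        iteration += 1
        # S15: the iterated results become the inputs of the next pass, or
        # the final results once the count reaches the preset value.
        entities, relations = new_entities, new_relations
    return entities, relations
```

Passing the model calls in as callables keeps the loop independent of any particular model runtime; in practice each callable would wrap a prompt and a parser for the model output.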

Inventors

  • YAN KE
  • MU LI
  • GAO LEI
  • Yue Dongcun
  • Ge Tianyu

Assignees

  • University of Electronic Science and Technology of China (电子科技大学)

Dates

Publication Date
2026-05-12
Application Date
2026-03-02

Claims (10)

  1. An end-side knowledge graph construction and maintenance method based on a small language model, characterized by comprising the following steps: S11, acquiring at least one text block in the same technical field; S12, extracting an entity information list to be updated from the text block using a small language model, wherein the entity information list to be updated contains entity information for at least one entity, and each piece of entity information comprises at least two elements; S13, extracting relation information to be updated between each entity pair using the small language model, according to the text block, the entity information list to be updated, and a prompt constructed based on the entity information, wherein an entity pair comprises two entities, and setting the iteration count to 0; S14, extracting an iterated entity information list and iterated relation information between entity pairs using the small language model, according to the text block, the entity information list to be updated, and the relation information to be updated, and incrementing the iteration count by one; and S15, in response to the iteration count being less than a preset value, updating the entity information list to be updated with the iterated entity information list and the relation information to be updated with the iterated relation information, then returning to step S14, until the iteration count equals the preset value, thereby obtaining a final entity information list and final relation information between entity pairs.
  2. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 1, wherein the elements in the entity information include a type identifier, an entity name, an entity type, and an entity description; and after obtaining the final entity information list and the final relation information between entity pairs, the method further comprises: extracting the first element of the entity information and determining from it whether the first element is the type identifier of an entity; in response to the first element being the type identifier of an entity, determining the number of elements in the entity information; and deleting the entity information in response to the number of elements in the entity information being greater than 4 or less than 4.
  3. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 1, wherein the elements in the final relation information include a type identifier, a source entity name, a target entity name, a relation description, and keywords of the relation; and after obtaining the final entity information list and the final relation information between entity pairs, the method further comprises: extracting the first element of the final relation information and determining from it whether the first element is the type identifier of relation information; in response to the first element being the type identifier of relation information, determining the number of elements in the final relation information; and deleting the final relation information in response to the number of elements in the final relation information being greater than 5 or less than 5.
  4. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 1, wherein the elements in the entity information include a type identifier, an entity name, an entity type, and an entity description; and after obtaining the final entity information list and the final relation information between entity pairs, the method further comprises the following steps: converting the entity name and the entity type in the entity information to uppercase, and packaging the entity name, the entity type, the entity description, and the number of the text block corresponding to the entity into an entity dictionary; merging the entity dictionaries that share the same entity name to obtain entities to be written, wherein each entity to be written consists of an entity name and the set of all entity dictionaries related to that name, the set containing n entity dictionaries, where n is the number of entity dictionaries with the same entity name; and performing disambiguation on the entity names in the entities to be written.
  5. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 4, wherein performing disambiguation on the entity names in the entities to be written comprises: vectorizing the entity name in the i-th entity to be written to obtain an entity name vector; calculating the similarity of any two entity name vectors; in response to the similarity being greater than a first threshold, merging the two corresponding entities to be written; in response to the similarity being greater than or equal to a second threshold and less than or equal to the first threshold, inputting the two corresponding entities to be written into the small language model to determine whether they are the same entity, wherein the first threshold is greater than the second threshold; and in response to the small language model determining that they are the same entity, merging the two corresponding entities to be written.
  6. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 4, wherein the final relation information includes a type identifier, a source entity name, a target entity name, a relation description, and keywords of the relation; and after obtaining the final entity information list and the final relation information between entity pairs, the method further comprises the following steps: converting the source entity name and the target entity name in the final relation information to uppercase, and then packaging the source entity name, the target entity name, the relation description, the keywords of the relation, and the number of the text block corresponding to the final relation information into a relation information dictionary; and performing undirected normalization on the source entity names and target entity names in the relation information dictionaries, and merging the relation information dictionaries that share the same source entity name and target entity name to obtain relations to be written, wherein each relation to be written consists of a tuple of source entity name and target entity name and the set of all relation information dictionaries related to that tuple, the size of the set being the number of relation information dictionaries with the same source entity name and target entity name.
  7. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 6, wherein after obtaining the relations to be written, the method further comprises: querying the knowledge graph by the entity name in the entity to be written, and extracting the entity type and the entity description corresponding to the entity name in the knowledge graph; counting the frequency of each entity type across the entity to be written and the historical entity information, and taking the entity type with the highest frequency as the entity type to be updated for the entity name; concatenating the entity descriptions in the entity to be written with the entity description corresponding to the entity name in the knowledge graph to obtain a concatenated entity description; determining the length of the concatenated entity description, and in response to the length being greater than a second length threshold, inputting the concatenated entity description into the small language model for summarizing fusion to obtain an entity description to be updated whose length is less than or equal to the second length threshold; and packaging the entity name, the entity type to be updated, and the entity description to be updated, and then writing them into the knowledge graph.
  8. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 6, wherein after obtaining the relations to be written, the method further comprises: for a relation to be written, checking whether an edge from its source entity name to its target entity name exists in the knowledge graph, and if the edge exists, extracting the relation description, the keywords of the relation, and the text block numbers of that edge from the knowledge graph; deduplicating the keywords of the relation in the relation to be written together with the keywords of the relation corresponding to the edge in the knowledge graph, and joining them with a separator to obtain the keywords to be updated; deduplicating the text block numbers in the relation to be written together with the text block numbers corresponding to the edge in the knowledge graph, and joining them with a separator to obtain the text block numbers to be updated; concatenating the relation descriptions in the relation to be written with the relation description corresponding to the edge in the knowledge graph to obtain the relation description to be updated; and packaging the source entity name, the target entity name, the keywords to be updated, the text block numbers to be updated, and the relation description to be updated, and writing them into the knowledge graph.
  9. The method for constructing and maintaining an end-side knowledge graph based on a small language model according to claim 8, wherein after obtaining the relation description to be updated, the method further comprises: determining the length of the relation description to be updated, and in response to the length being greater than a third length threshold, inputting the relation description to be updated into the small language model for summarizing fusion to obtain a relation description to be updated whose length is less than or equal to the third length threshold.
  10. An end-side knowledge graph construction and maintenance system based on a small language model, comprising a memory and a controller that are communicatively connected in sequence, wherein the memory stores a computer program, and the controller is configured to read the computer program and execute the method for constructing and maintaining an end-side knowledge graph based on a small language model according to any one of claims 1 to 9.
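Two of the post-processing rules in the claims lend themselves to a short illustration: the element-count check of claims 2 and 3, and the two-threshold merge decision of claim 5. The sketch below is hedged throughout. The tag strings `"entity"` and `"relationship"`, the function names, the use of cosine similarity, and the `embed` and `same_entity` callables (the latter standing in for the small language model) are all assumptions; the claims do not name a similarity measure or say how to treat records with an unrecognized first element.

```python
import math
from typing import Callable, List, Sequence

# Assumed spellings of the type identifiers; the claims only say one exists.
ENTITY_TAG = "entity"
RELATION_TAG = "relationship"
EXPECTED_ELEMENTS = {ENTITY_TAG: 4, RELATION_TAG: 5}


def drop_malformed(records: List[Sequence[str]]) -> List[Sequence[str]]:
    """Delete records whose element count does not match their type."""
    kept = []
    for record in records:
        tag = record[0] if record else None
        expected = EXPECTED_ELEMENTS.get(tag)
        # Assumption: records with an unrecognized first element pass through
        # unchanged, because the claims do not say how to treat them.
        if expected is None or len(record) == expected:
            kept.append(record)
    return kept


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def should_merge(
    name_a: str,
    name_b: str,
    embed: Callable[[str], Sequence[float]],
    same_entity: Callable[[str, str], bool],
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Two-threshold decision: merge, ask the model, or keep separate."""
    similarity = cosine(embed(name_a), embed(name_b))
    if similarity > first_threshold:
        return True  # clearly the same name: merge without a model call
    if second_threshold <= similarity <= first_threshold:
        # Ambiguous band: defer to the small language model's judgment.
        return same_entity(name_a, name_b)
    return False  # clearly different: keep the entities separate
```

The point of the two thresholds is that the model is consulted only for the ambiguous middle band, which limits the number of model calls on a resource-constrained device. For brevity the sketch passes only the names to `same_entity`, whereas claim 5 inputs the two entities to be written.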

Description

End-side knowledge graph construction and maintenance method and system based on small language model

Technical Field

The invention belongs to the technical field of knowledge graph construction and maintenance, and in particular relates to an end-side knowledge graph construction and maintenance method and system based on a small language model.

Background

In the field of knowledge-graph-enhanced retrieval-augmented generation (KG-RAG), mainstream frameworks such as GraphRAG and LightRAG build high-quality knowledge bases by relying on the joint extraction capability of large language models (LLMs). For deployment on edge-side or end-side devices, however, small language models (SLMs) with fewer than 10B parameters follow instructions poorly and are therefore difficult to adapt directly to complex extraction frameworks. Against this background, the lightweight system MiniRAG was proposed. Its core logic is to trade storage space and an added denoising burden at retrieval time for very high speed and low cost in the construction phase, which shows some application potential in resource-constrained scenarios.

However, MiniRAG has significant shortcomings in the accuracy of knowledge graph construction, which severely limit the robustness and knowledge consistency of the system. In the knowledge graph construction stage, when faced with a long text block of high information density, a single inference pass is limited by the semantic understanding capability of the SLM and often suffers from information omission and hallucination: entity relations that meet the extraction conditions are missed, or incorrect information is added to them.
Disclosure of Invention

The invention provides a method and a system for constructing and maintaining an end-side knowledge graph based on a small language model, aiming to solve the problems of information omission and hallucination in long-context processing in existing methods.

The aim of the invention is achieved by the following technical scheme:

A first aspect of the invention discloses a method for constructing and maintaining an end-side knowledge graph based on a small language model, which comprises the following steps:

S11, acquiring at least one text block in the same technical field;
S12, extracting an entity information list to be updated from the text block using a small language model, wherein the entity information list to be updated contains entity information for at least one entity, and each piece of entity information comprises at least two elements; the entity information list comprises at least two entities to be updated and the entity information corresponding to each entity to be updated;
S13, extracting relation information to be updated between each entity pair using the small language model, according to the text block, the entity information list to be updated, and a prompt constructed based on the entity information, wherein an entity pair comprises two entities, and setting the iteration count to 0;
S14, extracting an iterated entity information list and iterated relation information between entity pairs using the small language model, according to the text block, the entity information list to be updated, and the relation information to be updated, and incrementing the iteration count by one;
S15, in response to the iteration count being less than a preset value, updating the entity information list to be updated with the iterated entity information list and the relation information to be updated with the iterated relation information, then returning to step S14, until the iteration count equals the preset value, thereby obtaining a final entity information list and final relation information between entity pairs.

A second aspect of the invention discloses an end-side knowledge graph construction and maintenance system based on a small language model, which comprises a memory and a controller that are communicatively connected in sequence, wherein the memory stores a computer program, and the controller is used for reading the computer program and executing the end-side knowledge graph construction and maintenance method based on a small language model of the first aspect.

The beneficial effects of the invention are as follows:

The method is better matched to the capability boundary of a small language model. On the same graph index construction task, it significantly improves the SLM's ability to capture detail in entity descriptions, reduces noise in the index graph at retrieval time, and effectively alleviates the SLM's information omission and hallucination problems in long-context processing.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior