CN-121981087-A - Content generation method combining knowledge graph and large language model


Abstract

The invention relates to a content generation method combining a knowledge graph and a large language model. The method comprises: extracting entity, relation and attribute triples from a multi-source corpus based on multi-type target content data, dividing topic clusters, constructing a knowledge graph, and standardizing and packaging knowledge units to obtain a standardized knowledge set; analyzing the user's input requirements to obtain multi-dimensional requirement features, matching these features against the topic clusters to obtain a target cluster, screening candidate knowledge through multi-hop retrieval, and scoring the candidates by weighting three evaluation indexes to obtain an associated knowledge subset; converting the associated knowledge subset into resolvable structured text and tuning the model's generation parameters to obtain initial generated content; and comparing the initial content with the knowledge graph to obtain a comparison result, then optimizing text features and making adaptive adjustments according to the multi-type target content data and audience features to obtain standard content of target quality. Content generation combining the knowledge graph and the large language model is thereby realized.

Inventors

JIN CHONGYING
LI WEI
HENG JING
LIU SHUHUA

Assignees

上海数熙科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260122

Claims (12)

1. A content generation method combining a knowledge graph and a large language model, comprising: S1, based on multi-type target content data, obtaining triples of entities, relations and attributes from a multi-source corpus, dividing topic clusters, constructing a knowledge graph, and performing standardized packaging on knowledge units to obtain a standardized knowledge set; S2, based on the standardized knowledge set, converting content input by a user into corresponding demands, performing semantic analysis to obtain multi-dimensional demand features, performing similarity calculation between the multi-dimensional demand features, used as the matching basis, and the topic clusters to obtain an adapted target topic cluster, performing multi-hop knowledge retrieval in the target topic cluster to obtain candidate knowledge, introducing three-dimensional evaluation indexes of demand correlation, knowledge credibility and type adaptation degree, obtaining corresponding index weights, and performing comprehensive scoring to obtain an associated knowledge subset; S3, converting the associated knowledge subset into a resolvable structured text, injecting format rules and term specifications for the target content type, tuning the generation parameters of the model to suit the content type, and obtaining initial generated content; and S4, comparing the initial generated content with the knowledge graph to obtain a comparison result, and optimizing text features and performing adaptation adjustment according to the multi-type target content data and audience features to obtain standard content of target quality.
2. The method of claim 1, wherein obtaining the triples of entities, relations and attributes of the multi-source corpus comprises: performing word segmentation and semantic annotation on the multi-source corpus corresponding to the multi-type target content data using a hybrid model; identifying the entities in the corpus and mining the semantic relations among the entities by combining distant supervision with manual verification; and obtaining quantitative attributes and descriptive attributes of the entities, thereby obtaining structured triples of entities, relations and attributes.
3. The method of claim 1, wherein dividing the topic clusters and constructing the knowledge graph comprises: calculating the association strength between entities based on the triples by using the Louvain community detection algorithm; dividing the topic clusters according to a preset association strength threshold; constructing a domain knowledge graph by taking the entities in the triples as nodes, the relations as edges, and the supplementary entity attributes as node attributes; and simultaneously constructing a mapping between the topic clusters and the knowledge graph nodes.
4. The method of claim 1, wherein the standardized packaging of the knowledge units comprises: taking node-edge combinations in the knowledge graph as independent knowledge units; extracting a semantic summary of each independent knowledge unit; labeling each unit with type tags of the target content it is suited to; quantifying a credibility score for each unit; and packaging the units in a unified format to obtain the standardized knowledge set.
5. The method of claim 1, wherein converting the content input by the user into the corresponding demands and performing semantic analysis comprises: obtaining, through a large language model, the natural-language requirement input by the user and performing intent recognition; extracting the corresponding structured information; and integrating the structured information to obtain a multidimensional requirement feature vector.
6. The method of claim 1, wherein performing the similarity calculation with the topic clusters comprises: calculating, with a cosine similarity algorithm, the semantic matching degree between the semantic tags of each topic cluster and the multidimensional demand feature vector; and selecting, by means of a preset similarity threshold, the topic cluster whose matching degree satisfies the threshold and which covers the demand.
7. The method of claim 1, wherein the multi-hop knowledge retrieval comprises: performing multi-hop knowledge retrieval using a bidirectional breadth-first search algorithm; traversing the entities, relations and attributes associated with the target; and taking the corresponding knowledge units as candidate knowledge to obtain a retrieval result.
8. The method of claim 1, wherein obtaining the corresponding index weights and performing the comprehensive scoring comprises: weighting the three-dimensional evaluation indexes of demand correlation, knowledge credibility and type adaptation degree by an entropy weight method; and comprehensively scoring the candidate knowledge to obtain a quantized scoring result.
9. The method of claim 1, wherein obtaining the resolvable structured text comprises: organizing the associated knowledge subset according to its corresponding hierarchy; converting the standardized packaging information of the knowledge units into natural-language descriptions that a large language model can understand; and supplementing the semantic associations among the knowledge units to obtain the structured text to be input to the model.
10. The method of claim 1, wherein obtaining the initial generated content comprises: obtaining, based on the format specifications of the multi-type target content data, the format rules and term specifications to be injected into the structured text; tuning the corresponding parameter values of the large language model based on the target content type; and calling the large language model to perform incremental generation on the structured text, thereby obtaining initial generated content that meets the format and term requirements.
11. The method of claim 1, wherein obtaining the comparison result comprises: extracting the entities, relations and attributes of the initial generated content to obtain data to be checked; and comparing the data to be checked with the corresponding node, edge and attribute information in the knowledge graph to generate the comparison result.
12. The method of claim 1, wherein the adaptation adjustment comprises: correcting erroneous information in the initial generated content based on the comparison result; supplementing missing semantic associations and reasoning links with reference to the knowledge graph; optimizing the language of the text according to the style requirements of the multi-type target content data and the audience features; and performing typesetting adaptation according to the conventional format of the target content to obtain standard content of target quality.
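
The bidirectional breadth-first search named in claim 7 can be illustrated with a minimal Python sketch. The claims do not specify the graph representation or a hop limit, so the following are assumptions made only for illustration: the knowledge graph is held as an undirected adjacency dictionary, the search connects one start entity to one target entity, and the names `bidirectional_bfs`, `_expand`, `_join` and `max_hops` are hypothetical. The sketch expands the smaller of the two frontiers one level at a time and stops as soon as the two searches meet.

```python
def _expand(graph, frontier, parents, other_parents):
    """Expand one BFS level; return (next_frontier, meeting_node_or_None)."""
    next_frontier = []
    for node in frontier:
        for nbr in graph.get(node, ()):
            if nbr in parents:
                continue  # already reached from this side
            parents[nbr] = node
            if nbr in other_parents:
                return next_frontier, nbr  # the two searches have met
            next_frontier.append(nbr)
    return next_frontier, None


def _join(meet, parents_fwd, parents_bwd):
    """Stitch the forward and backward parent chains into one path."""
    path = []
    node = meet
    while node is not None:
        path.append(node)
        node = parents_fwd[node]
    path.reverse()
    node = parents_bwd[meet]
    while node is not None:
        path.append(node)
        node = parents_bwd[node]
    return path


def bidirectional_bfs(graph, start, goal, max_hops=4):
    """Return a shortest entity path of at most max_hops edges, or None.

    Assumes `graph` is an undirected adjacency dict (an illustrative choice).
    """
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return [start]
    # The parent maps also serve as the visited sets of each direction.
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = [start], [goal]
    hops = 0
    while frontier_fwd and frontier_bwd and hops < max_hops:
        if len(frontier_fwd) <= len(frontier_bwd):
            frontier_fwd, meet = _expand(graph, frontier_fwd, parents_fwd, parents_bwd)
        else:
            frontier_bwd, meet = _expand(graph, frontier_bwd, parents_bwd, parents_fwd)
        hops += 1
        if meet is not None:
            return _join(meet, parents_fwd, parents_bwd)
    return None


# Toy graph with made-up entity names.
toy_graph = {
    "LLM": ["Transformer", "Prompt"],
    "Transformer": ["LLM", "Attention"],
    "Prompt": ["LLM"],
    "Attention": ["Transformer", "Softmax"],
    "Softmax": ["Attention"],
    "Island": [],
}
print(bidirectional_bfs(toy_graph, "LLM", "Softmax"))
```

In a retrieval setting, the knowledge units attached to the entities and edges along the returned path could then be collected as candidate knowledge; how the claimed method performs that mapping is not stated in the claims.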
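
The entropy weight method named in claim 8 can likewise be sketched in Python. The claims do not give the formula, so this follows the commonly used form of the method under stated assumptions: each candidate carries three indicator values already scaled to the range 0 to 1 with larger meaning better, so no further normalization is applied, and the field names `relevance`, `credibility` and `type_fit` as well as the function names are hypothetical. An indicator whose values differ more across candidates has lower entropy and therefore receives a larger weight.

```python
import math


def entropy_weights(rows):
    """Return one weight per indicator column using the entropy weight method."""
    n, m = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two candidates to compute entropy")
    k = 1.0 / math.log(n)  # scales entropy into the range 0..1
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # an all-zero indicator carries no information
            continue
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:  # treat 0 * ln(0) as 0
                entropy -= k * p * math.log(p)
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum == 0:
        return [1.0 / m] * m  # no indicator discriminates: fall back to equal weights
    return [d / d_sum for d in divergences]


def score_candidates(candidates):
    """Return ([(id, score), ...] sorted by descending score, weights)."""
    rows = [[c["relevance"], c["credibility"], c["type_fit"]] for c in candidates]
    weights = entropy_weights(rows)
    scored = [
        (c["id"], sum(w * v for w, v in zip(weights, row)))
        for c, row in zip(candidates, rows)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored, weights


# Made-up candidate knowledge units for illustration only.
example = [
    {"id": "k1", "relevance": 0.9, "credibility": 0.8, "type_fit": 0.5},
    {"id": "k2", "relevance": 0.2, "credibility": 0.8, "type_fit": 0.5},
    {"id": "k3", "relevance": 0.5, "credibility": 0.8, "type_fit": 0.5},
]
ranking, weights = score_candidates(example)
print(ranking, weights)
```

In this toy input the credibility and type-fit columns are constant, so nearly all of the weight falls on relevance; an associated knowledge subset could then be taken as the top-scoring candidates, with the cut-off left as a design choice the claims do not fix.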

Description

Content generation method combining knowledge graph and large language model

Technical Field

The invention belongs to the technical field of intelligent content generation combining a knowledge graph and a large language model, and particularly relates to a content generation method combining a knowledge graph and a large language model.

Background

With the rapid development of artificial intelligence technology, content generation based on large language models has been widely applied to text creation scenarios of all kinds, greatly improving the efficiency of content production. However, the prior art still has a number of problems in practical application:

Existing methods mostly generate content directly from the user's natural-language requirement, without deep analysis of the requirement or adaptation to the scenario. They cannot accurately extract multi-dimensional requirement information such as the core intent, audience characteristics and length constraints, so the generated content deviates substantially from the actual requirement and is hard to adapt to the specific requirements of different content types (such as the storyboard logic of a video script or the formality of an official document). The knowledge stored in a large language model is fragmented and lacks structural organization, so generated content is prone to problems such as knowledge piling and logical gaps, and it is difficult to produce high-quality text with a clear hierarchy and coherent reasoning.

Existing methods do not apply standardized encapsulation or structured management to domain knowledge. The model must be invoked again for undifferentiated computation every time content is generated, so knowledge cannot be reused efficiently; and when domain knowledge is updated, the model must be retrained, which makes iteration extremely costly.

These problems leave existing content generation technology with shortcomings in professionalism, pertinence and logic, make it difficult to meet the demand for high-quality generation of multi-type target content, and limit the deeper application of the technology in vertical fields.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a content generation method combining a knowledge graph and a large language model.

The aim of the invention can be achieved by the following technical scheme:

S1, based on multi-type target content data, obtaining triples of entities, relations and attributes from a multi-source corpus, dividing topic clusters, constructing a knowledge graph, and performing standardized packaging on knowledge units to obtain a standardized knowledge set;

S2, based on the standardized knowledge set, converting content input by a user into corresponding demands, performing semantic analysis to obtain multi-dimensional demand features, performing similarity calculation between the multi-dimensional demand features, used as the matching basis, and the topic clusters to obtain an adapted target topic cluster, performing multi-hop knowledge retrieval in the target topic cluster to obtain candidate knowledge, introducing three-dimensional evaluation indexes of demand correlation, knowledge credibility and type adaptation degree, obtaining corresponding index weights, and performing comprehensive scoring to obtain an associated knowledge subset;

S3, converting the associated knowledge subset into a resolvable structured text, injecting format rules and term specifications for the target content type, tuning the generation parameters of the model to suit the content type, and obtaining initial generated content; and

S4, comparing the initial generated content with the knowledge graph to obtain a comparison result, and optimizing text features and performing adaptation adjustment according to the multi-type target content data and audience features to obtain standard content of target quality.

Obtaining the triples of entities, relations and attributes comprises: performing word segmentation and semantic annotation on the multi-source corpus corresponding to the multi-type target content data using a hybrid model; identifying the entities in the corpus and mining the semantic relations among the entities by combining distant supervision with manual verification; and obtaining quantitative attributes and descriptive attributes of the entities, thereby obtaining structured triples of entities, relations and attributes.

Dividing the topic clusters and constructing the knowledge graph comprises: calculating the association strength among entities based on the triples by using a