CN-121979966-A - Information processing method, information processing apparatus, computer device, computer readable storage medium, and computer program product


Abstract

The application provides an information processing method, an information processing apparatus, a computer device, a computer readable storage medium, and a computer program product. The method comprises: performing semantic understanding on a first text in a knowledge base and on each paragraph unit in the first text, to obtain a text summary corresponding to the first text, a paragraph summary corresponding to each paragraph unit, and triples of each paragraph unit; constructing a local graph index corresponding to the first text based on the triples of the paragraph units in the first text; clustering the local graph indexes corresponding to the respective texts and dividing the texts into communities according to the clustering result, to generate a global graph index; retrieving from the global graph index based on a user question, to obtain one or more first retrieval results related to the user question; retrieving from the text summaries and paragraph summaries in the one or more first retrieval results, to obtain one or more second retrieval results; and generating a target result corresponding to the user question.

Inventors

LIU HOUKAI
ZHANG JINGBO
PAN YUXUAN
ZHANG YONGXI

Assignees

中移(苏州)软件技术有限公司
中国移动通信集团有限公司

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (10)

1. An information processing method, characterized in that the method comprises: performing semantic understanding on a first text in a knowledge base and on each paragraph unit in the first text, respectively, to obtain a text summary corresponding to the first text and a paragraph summary corresponding to each paragraph unit in the first text, and obtaining triples of each paragraph unit in the first text, wherein each triple comprises a head entity, a relation, and a tail entity, and the first text is any one of a plurality of texts included in the knowledge base; constructing a local graph index corresponding to the first text based on the triples of the paragraph units in the first text, clustering the local graph indexes corresponding to the respective texts, and dividing the texts into communities according to the clustering result to generate a global graph index, wherein nodes in the local graph index correspond to head entities or tail entities in the triples of the paragraph units in the first text, and an edge between two nodes in the local graph index represents an association relation between the two nodes; retrieving from the global graph index based on a user question to obtain one or more first retrieval results related to the user question, wherein each first retrieval result comprises a text summary, paragraph summaries, and triples associated with a second text, and the second text is a text, among the plurality of texts, that is related to the user question; and retrieving from the text summaries and paragraph summaries in the one or more first retrieval results based on the user question to obtain one or more second retrieval results, and generating a target result corresponding to the user question based on the one or more second retrieval results, wherein the second retrieval results comprise the triples and paragraph summaries corresponding to one or more paragraph units in the second text.
2. The method of claim 1, wherein the local graph index stores the text summary of the first text, the paragraph summaries corresponding to the paragraph units in the first text, and a first entity set associated with the first text, the first entity set including the head entity and the tail entity of each triple associated with the first text; each community in the global graph index correspondingly stores a second entity set, wherein the second entity set comprises the first entity set of each first text associated with each of at least some of the nodes in the community; and the retrieving from the global graph index based on the user question to obtain one or more first retrieval results related to the user question comprises: obtaining first entity information and first non-entity information in the user question, calculating the similarity between the first entity information and the second entity set associated with a community, determining one or more second texts related to the user question based on the similarity, and determining one or more first retrieval results associated with the one or more second texts.
3. The method of claim 2, wherein the calculating the similarity between the first entity information and the second entity set associated with a community, and determining one or more second texts related to the user question based on the similarity, comprises: calculating the similarity between the first entity information and the second entity set associated with each community, and determining one or more first communities whose similarity is greater than or equal to a first threshold; calculating the similarity between the first entity information and the entity set associated with each (i-1)-th level node in the first community, determining one or more (i-2)-th level nodes under each (i-1)-th level node whose similarity is greater than or equal to the first threshold, calculating the similarity between the first entity information and the entity sets associated with the one or more (i-2)-th level nodes, and, in the case that i is equal to 3, determining one or more second texts from the one or more texts associated with each first community based on the similarity; wherein i is a positive integer greater than or equal to 3 and less than or equal to N, N is the total number of layers of the first community, when i is equal to 3 the (i-2)-th level node corresponds to a local graph index, and when i is equal to N the entity set associated with the (i-2)-th level node is a second entity set.
4. The method of claim 2, wherein the retrieving from the text summaries and paragraph summaries in the one or more first retrieval results based on the user question to obtain one or more second retrieval results comprises: calculating, by using a relevance model, the relevance of the first entity information and of the first non-entity information, respectively, to the text summaries and/or paragraph summaries in the one or more first retrieval results, and determining one or more second retrieval results from the one or more first retrieval results based on the relevance; wherein the relevance model depends on a first weight corresponding to the first entity information and a second weight corresponding to the first non-entity information, the first weight is determined based on the frequency of occurrence of the first entity information in the average statement unit of the summaries to be matched, and the second weight is determined based on the frequency of occurrence of the first non-entity information in the average statement unit of the summaries to be matched.
5. The method of claim 4, wherein the calculating, by using the relevance model, the relevance of the first entity information and of the first non-entity information, respectively, to the text summaries and/or paragraph summaries in the one or more first retrieval results, and determining one or more second retrieval results from the one or more first retrieval results based on the relevance, comprises: calculating, by using the relevance model, a first relevance between the first entity information and the text summaries associated with the one or more second texts and a second relevance between the first non-entity information and the text summaries associated with the one or more second texts, and determining, from the one or more second texts, one or more third texts for which the sum of the first relevance and the second relevance is greater than or equal to a second threshold; calculating, by using the relevance model, a third relevance between the first entity information and the paragraph summaries associated with the one or more third texts and a fourth relevance between the first non-entity information and the paragraph summaries associated with the one or more third texts, and determining, from the paragraph units included in each third text, one or more paragraph units for which the sum of the third relevance and the fourth relevance is greater than or equal to a third threshold; and obtaining the second retrieval results according to the triples and paragraph summaries corresponding to the one or more paragraph units.
6. The method of claim 1, wherein the performing semantic understanding on the first text in the knowledge base and on each paragraph unit in the first text, respectively, to obtain the text summary corresponding to the first text and the paragraph summary corresponding to each paragraph unit in the first text, and obtaining the triples of each paragraph unit in the first text, comprises: acquiring first template information related to text summaries, and performing semantic understanding processing on the first template information and the first text based on a large language model to obtain the text summary corresponding to the first text; acquiring second template information related to paragraph summaries, and performing semantic understanding processing on the second template information and each paragraph unit in the first text based on a large language model to obtain the paragraph summary corresponding to each paragraph unit in the first text; and acquiring third template information related to triples, and processing the third template information and each paragraph unit in the first text based on a large language model to obtain the triples of each paragraph unit in the first text.
7. An information processing apparatus, characterized by comprising an obtaining unit, a construction unit, and a retrieval unit, wherein: the obtaining unit is configured to perform semantic understanding on a first text in a knowledge base and on each paragraph unit in the first text, respectively, to obtain a text summary corresponding to the first text and a paragraph summary corresponding to each paragraph unit in the first text, and to obtain triples of each paragraph unit in the first text, wherein each triple comprises a head entity, a relation, and a tail entity, and the first text is any one of a plurality of texts included in the knowledge base; the construction unit is configured to construct a local graph index corresponding to the first text based on the triples of the paragraph units in the first text, cluster the local graph indexes corresponding to the respective texts, and divide the texts into communities according to the clustering result to generate a global graph index, wherein nodes in the local graph index correspond to head entities or tail entities in the triples of the paragraph units in the first text, and an edge between two nodes in the local graph index represents an association relation between the two nodes; and the retrieval unit is configured to retrieve from the global graph index based on a user question to obtain one or more first retrieval results related to the user question, wherein each first retrieval result comprises a text summary, paragraph summaries, and triples associated with a second text, and the second text is a text, among the plurality of texts, that is related to the user question; the retrieval unit is further configured to retrieve from the text summaries and paragraph summaries in the one or more first retrieval results based on the user question to obtain one or more second retrieval results, and to generate a target result corresponding to the user question based on the one or more second retrieval results, wherein the second retrieval results comprise the triples and paragraph summaries corresponding to one or more paragraph units in the second text.
8. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when executing the program.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1 to 6.
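
The two-stage filtering recited in claims 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the claimed relevance model: the `relevance` function is a hypothetical stand-in that simply measures the fraction of query terms found in a summary, and the dictionary layout (`summary`, `paragraphs`, `triples`) and the threshold parameters are assumptions made for the example.

```python
def relevance(terms, summary):
    """Hypothetical stand-in for the relevance model (an assumption):
    the fraction of query terms that occur in the summary."""
    if not terms:
        return 0.0
    lowered = summary.lower()
    return sum(term.lower() in lowered for term in terms) / len(terms)


def two_stage_filter(entity_terms, other_terms, texts,
                     text_threshold, paragraph_threshold):
    """Filter candidate texts by text summary, then their paragraphs
    by paragraph summary, keeping (paragraph summary, triples) pairs."""
    results = []
    for text in texts:
        # Stage 1: entity relevance + non-entity relevance on the text summary.
        text_score = (relevance(entity_terms, text["summary"])
                      + relevance(other_terms, text["summary"]))
        if text_score < text_threshold:
            continue  # the whole text is dropped; its paragraphs are not scored
        for paragraph in text["paragraphs"]:
            # Stage 2: the same sum, computed on each paragraph summary.
            paragraph_score = (relevance(entity_terms, paragraph["summary"])
                               + relevance(other_terms, paragraph["summary"]))
            if paragraph_score >= paragraph_threshold:
                results.append((paragraph["summary"], paragraph["triples"]))
    return results
```

Because the text-level check runs first, a paragraph that matches the question is still skipped when its parent text falls below the text-level threshold, which is the coarse-to-fine behaviour the claims describe.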

Description

Information processing method, information processing apparatus, computer device, computer readable storage medium, and computer program product

Technical Field

The present application relates to artificial intelligence technology, and more particularly to an information processing method, an information processing apparatus, a computer device, a computer readable storage medium, and a computer program product.

Background

Retrieval-augmented generation (RAG) markedly improves the accuracy and credibility of a large language model (LLM) in text generation scenarios by integrating the rich knowledge of an external knowledge base. It keeps the knowledge available to the LLM timely and dynamically adaptable during use, and allows cross-domain information to be integrated seamlessly, effectively extending the capabilities of the LLM. However, conventional RAG relies on a strategy of text chunking and query matching. This suits explicit generation tasks centered on "search" but performs poorly on implicit generation tasks centered on "summary extraction", resulting in lower retrieval accuracy.

Disclosure of Invention

The embodiments of the application provide an information processing method, an information processing apparatus, a computer device, a computer readable storage medium, and a computer program product, which can improve retrieval accuracy.
The technical solution of the embodiments of the application is implemented as follows: An embodiment of the application provides an information processing method, which comprises: performing semantic understanding on a first text in a knowledge base and on each paragraph unit in the first text, respectively, to obtain a text summary corresponding to the first text and a paragraph summary corresponding to each paragraph unit in the first text, and obtaining triples of each paragraph unit in the first text, wherein each triple comprises a head entity, a relation, and a tail entity, and the first text is any one of a plurality of texts included in the knowledge base; constructing a local graph index corresponding to the first text based on the triples of the paragraph units in the first text, clustering the local graph indexes corresponding to the respective texts, and dividing the texts into communities according to the clustering result to generate a global graph index, wherein nodes in the local graph index correspond to head entities or tail entities in the triples of the paragraph units in the first text, and an edge between two nodes in the local graph index represents an association relation between the two nodes; retrieving from the global graph index based on a user question to obtain one or more first retrieval results related to the user question, wherein each first retrieval result comprises a text summary, paragraph summaries, and triples associated with a second text, and the second text is a text, among the plurality of texts, that is related to the user question; and retrieving from the text summaries and paragraph summaries in the one or more first retrieval results based on the user question to obtain one or more second retrieval results, and generating a target result corresponding to the user question based on the one or more second retrieval results, wherein the second retrieval results comprise the triples and paragraph summaries corresponding to one or more paragraph units in the second text.
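
As a rough illustration of the index-construction step described above, the sketch below builds a per-text index from paragraph-level triples. The class name `LocalGraphIndex`, the function `build_local_index`, and the input layout (a list of paragraph summary and triple list pairs) are assumptions introduced for this example; the summaries and triples are taken as already extracted, and clustering into communities is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalGraphIndex:
    """Hypothetical container for one text's index (names are assumptions)."""
    text_summary: str
    paragraph_summaries: list[str]
    # Head and tail entities of every triple in the text (the nodes).
    entities: set[str] = field(default_factory=set)
    # (head, tail) -> relations linking the two nodes (the edges).
    edges: dict[tuple[str, str], set[str]] = field(default_factory=dict)


def build_local_index(text_summary, paragraphs):
    """Build an index from (paragraph_summary, triples) pairs, where each
    triple is a (head entity, relation, tail entity) tuple."""
    index = LocalGraphIndex(
        text_summary=text_summary,
        paragraph_summaries=[summary for summary, _ in paragraphs],
    )
    for _, triples in paragraphs:
        for head, relation, tail in triples:
            # Both ends of a triple become nodes of the graph.
            index.entities.update((head, tail))
            # The relation is recorded on the edge between the two nodes.
            index.edges.setdefault((head, tail), set()).add(relation)
    return index
```

Keeping the entity set alongside the summaries means a later retrieval step can compare question entities against the index without walking the edges.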
In the above solution, the local graph index stores the text summary of the first text, the paragraph summaries corresponding to the paragraph units in the first text, and a first entity set associated with the first text, the first entity set comprising the head entity and the tail entity of each triple associated with the first text; each community in the global graph index correspondingly stores a second entity set, the second entity set comprising the first entity set of each first text associated with each of at least some of the nodes in the community; and the retrieving from the global graph index based on the user question to obtain one or more first retrieval results related to the user question comprises: obtaining first entity information and first non-entity information in the user question, calculating the similarity between the first entity information and the second entity set associated with a community, determining one or more second texts related to the user question based on the similarity, and determining one or more first retrieval results associated with the one or more second texts. In the above solution, the calculating the similarity between the first entity information and the second entity set associated with a community, and determining one or more second texts related to the user question based on the similarity, comprises: calculating the similarity between the first entity information and the second entity set associated with each community, and determining one or more first communities whose similarity is greater than or equal to a first threshold; calculating the similarity between the first entity information and the entity sets associated with the (i-1)-th level nodes in