
CN-121833945-B - Automatic text outline generation method and system based on artificial intelligence


Abstract

The invention relates to the technical field of data processing, and in particular to an artificial-intelligence-based method and system for automatically generating a text outline. The method comprises calculating the weight of the temperature parameter in the decoding stage of a PEGASUS model, training the PEGASUS model to obtain a text outline generation model, and processing a document to be processed with the text outline generation model to generate the text outline. Calculating the weight of the temperature parameter comprises converting the preprocessed original corpus into a high-dimensional word vector sequence, extracting the hierarchical tree of the currently generated candidate outline, calculating a hierarchical aggregation coefficient for each hierarchical node in the tree, calculating a semantic fidelity index for each hierarchical node, calculating an iteration stability index based on the semantic fidelity index, and taking the normalized iteration stability index as the weight of the temperature parameter. The method addresses the problem that traditional generative models produce logically disordered outlines when processing long professional documents.

Inventors

  • Ai Pili
  • Qin Chunjia
  • Huang Yu
  • Chen Shihua

Assignees

  • 广东知一数据有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-03-11

Claims (8)

  1. An automatic text outline generation method based on artificial intelligence, characterized by comprising the following steps: calculating the weight of the temperature parameter in the decoding stage of a PEGASUS model, training the PEGASUS model to obtain a text outline generation model, and processing a document to be processed with the text outline generation model to generate a text outline. The method for calculating the weight of the temperature parameter comprises: obtaining an original corpus, preprocessing the original corpus, converting the preprocessed original corpus into a high-dimensional word vector sequence, extracting the hierarchical tree of the currently generated candidate outline, and calculating a hierarchical aggregation coefficient of each hierarchical node in the hierarchical tree, wherein the hierarchical aggregation coefficient is negatively correlated with the vector standard deviation of the peer node set to which the corresponding hierarchical node belongs. The method for calculating the semantic fidelity index of a hierarchical node comprises: calculating the mutual information matching score between the outline item of the hierarchical node and the topic keyword sequence, calculating the product of the logical depth of the hierarchical node and the hierarchical aggregation coefficient, taking the product as the exponent of an exponential function, and taking the ratio of the mutual information matching score to the exponential function as the semantic fidelity index. The method for calculating the iteration stability index comprises: recording the sequence formed by the semantic fidelity indexes generated in the most recent iterations as the outline index sequence, wherein the expression of the iteration stability index is as follows: [formula not preserved in the source text], whose symbols denote, in order: the iteration stability index of a given round of iteration; the mean semantic fidelity index of all hierarchical nodes in the generation result of a round (this quantity appears for three round indices, which are not preserved); the number of data in the outline index sequence; the summation counting variable; and a preset parameter.
  2. The automatic text outline generation method based on artificial intelligence according to claim 1, wherein the hierarchical aggregation coefficient is calculated as follows: the cosine similarity between a hierarchical node vector and its direct parent hierarchical node vector is calculated, and the ratio of the cosine similarity to the vector standard deviation of the peer node set to which the corresponding hierarchical node belongs is taken as the hierarchical aggregation coefficient.
  3. The automatic text outline generation method based on artificial intelligence according to claim 1, wherein the method further comprises: vectorizing the full-text core semantics with an encoder, calculating the modulus of each core semantic vector, and segmenting the moduli of the core semantics to obtain a topic keyword sequence.
  4. The automatic text outline generation method based on artificial intelligence according to claim 3, wherein the moduli of the core semantics are segmented using the Otsu threshold segmentation method, and the sequence formed by the core semantics whose modulus is greater than the segmentation threshold is taken as the topic keyword sequence of the document.
  5. The automatic text outline generation method based on artificial intelligence according to claim 1, wherein processing the document to be processed with the text outline generation model to generate the text outline comprises: processing the document to be processed with the text outline generation model to generate a first-level outline; taking the generated first-level outline as context and generating deeper sub-items using the temperature parameter regulated by the iteration stability index; calculating the iteration stability index in real time after each generation iteration; stopping iteration when the convergence requirement is met; and finally outputting a text outline with a clear hierarchy and consistent semantics.
  6. The automatic text outline generation method based on artificial intelligence according to claim 1, wherein preprocessing the original corpus comprises: removing webpage noise by regular-expression matching, structurally splitting the long text with a segmentation function, identifying and retaining the original chapter numbers and hierarchical titles, and calling an NLP word segmentation tool to perform Chinese word segmentation, part-of-speech tagging and stop-word filtering.
  7. The automatic text outline generation method based on artificial intelligence according to claim 1, wherein the original corpus is obtained by collecting the long text data to be processed through an enterprise internal knowledge base API, using keyword retrieval and URL depth traversal.
  8. An automatic text outline generation system based on artificial intelligence, comprising a processor and a memory, the memory storing computer program instructions which, when executed by the processor, implement the automatic text outline generation method based on artificial intelligence according to any one of claims 1-7.
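
The two node-level quantities recited in claims 1 and 2 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claims do not define the "vector standard deviation" of a peer node set, so the sketch assumes it is the root-mean-square distance of the sibling vectors from their centroid. The small constant eps, all function names, and the treatment of the mutual information matching score as a precomputed input are likewise assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def peer_std(peers):
    """Spread of a sibling set (assumed definition): RMS distance to the centroid."""
    dim = len(peers[0])
    centroid = [sum(p[i] for p in peers) / len(peers) for i in range(dim)]
    mean_sq = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(dim)) for p in peers
    ) / len(peers)
    return math.sqrt(mean_sq)


def aggregation_coefficient(node, parent, peers, eps=1e-8):
    """Claim 2: cosine similarity to the parent divided by the peer-set spread.

    eps is a hypothetical guard against division by zero when all peers coincide.
    """
    return cosine_similarity(node, parent) / (peer_std(peers) + eps)


def semantic_fidelity(mi_score, depth, agg_coeff):
    """Claim 1: matching score divided by exp(logical depth * aggregation coefficient)."""
    return mi_score / math.exp(depth * agg_coeff)


# Toy usage: one node identical to its parent, with two orthogonal siblings.
coeff = aggregation_coefficient([1.0, 0.0], [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
fidelity = semantic_fidelity(mi_score=0.8, depth=2, agg_coeff=coeff)
```

Under these assumptions a larger peer-set spread lowers the aggregation coefficient, and a larger aggregation coefficient or logical depth lowers the semantic fidelity index, which matches the negative correlations stated in the claims.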

Description

Automatic text outline generation method and system based on artificial intelligence

Technical Field

The invention relates to the technical field of data processing, and in particular to an automatic text outline generation method and system based on artificial intelligence.

Background

With the explosive growth of digital information, rapidly extracting the core logic from massive long texts and generating a structured outline has become an urgent need in scientific research, office work, content creation and academic research. The development of artificial intelligence, particularly natural language processing, has made automated text summarization possible. Outline generation requires the system not only to capture the central idea of the text accurately, but also to produce content with high logical consistency, a clear sense of hierarchy and good coverage of key information, so that users can grasp the thread of a document in a short time.

In the prior art, the PEGASUS model is a representative model in the field of generative text summarization. Through its gap-sentence generation task in large-scale pre-training, the algorithm has notable advantages in handling text structure and extracting semantics, and can generate text fragments that are semantically coherent and generalize well. However, because the text outline generation task has a strong logical hierarchy and a high concentration of domain knowledge, applying the original PEGASUS model directly to a long professional document produces outlines whose logical hierarchy is disordered and whose primary and secondary points cannot be accurately distinguished. The reason is that the loss function of the original algorithm focuses mainly on overall semantic similarity and lacks a fine-grained description of the document's structural features, such as chapter association and indentation hierarchy, so the generated outline is difficult to use directly for guiding actual document reconstruction or deep reading.

Disclosure of Invention

The invention provides an automatic text outline generation method and system based on artificial intelligence, which aims to solve the technical problem that, when the PEGASUS model is applied to a long professional document, the generated outline has a disordered logical hierarchy and its primary and secondary points cannot be accurately distinguished.

In a first aspect, the invention provides an automatic text outline generation method based on artificial intelligence, which adopts the following technical scheme:

An automatic text outline generation method based on artificial intelligence comprises the following steps: calculating the weight of the temperature parameter in the decoding stage of a PEGASUS model, training the PEGASUS model to obtain a text outline generation model, and processing a document to be processed with the text outline generation model to generate a text outline.

The method for calculating the weight of the temperature parameter comprises: obtaining an original corpus; preprocessing the original corpus; converting the preprocessed original corpus into a high-dimensional word vector sequence; extracting the hierarchical tree of the currently generated candidate outline; calculating a hierarchical aggregation coefficient of each hierarchical node in the hierarchical tree, wherein the hierarchical aggregation coefficient is negatively correlated with the vector standard deviation of the peer node set to which the corresponding hierarchical node belongs; calculating a semantic fidelity index of the hierarchical node, wherein the semantic fidelity index is negatively correlated with the hierarchical aggregation coefficient; calculating an iteration stability index based on the semantic fidelity index, wherein the iteration stability index is positively correlated with the semantic fidelity index; normalizing the iteration stability index; and taking the normalized iteration stability index as the weight of the temperature parameter.

By introducing the hierarchical aggregation coefficient, the semantic fidelity index and the iteration stability index, the temperature parameter weight in the decoding stage of the PEGASUS model is adjusted dynamically. This effectively addresses the disordered logical hierarchy and unclear primary and secondary points that arise when the traditional PEGASUS model directly processes a long professional document, improves the logical continuity, the rationality of the hierarchical structure and the retention of core semantics in the generated outline, and ensures that the generated outline both follows the macroscopic thread of the document and accurately covers its microscopic details.

Preferably, the calculation method of the hierarchical aggregation coefficient is that the cosine similarity of the hierarch