
CN-121996783-A - Method and system for generating report text in frontier technology fields based on deep research


Abstract

The application provides a method and a system for generating report text in frontier technology fields based on deep research. The method comprises the following steps: Step 1, constructing a multi-level chapter outline according to a research topic; Step 2, writing subtitle-oriented content on the basis of the generated chapter outline; and Step 3, evaluating the credibility of the content generated in Step 2, and establishing semantic associations between the reference documents and the generated content through entity extraction and semantic matching, so as to identify and correct possible hallucinations. The method avoids the shallow coverage and missing key information of traditional literature retrieval, greatly improves the accuracy of authenticity verification, and systematically addresses the core problems of traditional report generation, such as disordered structure, missing information, and content hallucination.

Inventors

  • HU MINGHAO
  • SHI HAN
  • LUO ZHUNCHEN
  • LUO WEI
  • SONG YU
  • ZHOU XIAN
  • MAO BIN
  • TIAN CHANGHAI

Assignees

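  • Military Science Information Research Center, Academy of Military Sciences, Chinese People's Liberation Army (中国人民解放军军事科学院军事科学信息研究中心)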

Dates

Publication Date
20260508
Application Date
20251229

Claims (8)

  1. A method for generating report text in frontier technology fields based on deep research, comprising the following steps: Step 1, constructing a multi-level chapter outline according to a research topic; Step 2, writing subtitle-oriented content on the basis of the generated chapter outline; and Step 3, evaluating the credibility of the content generated in Step 2, and establishing semantic associations between the reference documents and the generated content through entity extraction and semantic matching, so as to identify and correct possible hallucinations.
  2. The method for generating report text in frontier technology fields based on deep research according to claim 1, wherein Step 1 includes: Step 1-1, for a given research topic, first generating a multi-level outline with hierarchical logic, based on a preset general chapter scheme for frontier-technology reports and combined with topic-related information obtained through coarse-grained web retrieval and deep analysis; and Step 1-2, using a general-purpose large language model to decompose the multi-level outline by chapter into independent deep research units, and generating a writing plan for each chapter.
  3. The method for generating report text in frontier technology fields based on deep research according to claim 2, wherein Step 1-1 includes: acquiring raw information related to the research topic using web-crawler technology, thereby performing coarse-grained web retrieval for the topic and outputting the raw retrieval information; and computing the semantic similarity of the retrieval results with a BGE model and ranking the results accordingly.
  4. The method for generating report text in frontier technology fields based on deep research according to claim 1, wherein Step 2 includes: Step 2-1, based on the chapter titles of the multi-level outline, using a large language model to generate search keywords that semantically match each title, thereby defining the topic scope of literature retrieval for the deep research; Step 2-2, obtaining an original reference document set corresponding to the chapter title through multi-source deep web retrieval, screening high-confidence documents against a preset confidence threshold, and storing the high-confidence documents in a cache library; and Step 2-3, performing multiple rounds of deep semantic analysis and iterative reasoning on the high-confidence documents in the cache library, mining the core information that meets the deep research requirements, and judging whether the documents sufficiently support an in-depth treatment of the chapter content; if the support is insufficient, triggering a new retrieval cycle; if the support is sufficient, generating a first draft of the report according to the chapter writing plan, based on the refined core information of the documents.
  5. The method for generating report text in frontier technology fields based on deep research according to claim 4, wherein Step 2-3 includes: extracting relevant literature evidence from the cache library using a hybrid retrieval mechanism, in which keyword matching is performed with the BM25 algorithm and semantic similarity is computed with Sentence-BERT; having a large language model judge whether the retrieved evidence is sufficient to support answering the question; if the literature support is sufficient, generating the paragraph content directly with the large language model, combined with a task-specific instruction template and few-shot prompts; and if the literature support is insufficient, triggering a new literature retrieval cycle.
  6. The method for generating report text in frontier technology fields based on deep research according to claim 1, wherein Step 3 includes: Step 3-1, using entity extraction to extract a reference entity set from the high-confidence documents and a generated-content entity set from the report draft, thereby identifying the core objects for in-depth fact verification; Step 3-2, mapping each entity in the two entity sets to a vector representation using a BGE model, so that differences in the surface form of entities are eliminated in the vector space and semantic matching is accurate enough for deep research; and Step 3-3, computing the cosine similarity between each generated-content entity and the reference entities based on the vector representations, and setting a preset threshold suited to deep research; if there exists a reference entity whose similarity to a generated-content entity exceeds the threshold, judging that the generated-content entity has literature support; if all entities in the generated-content entity set are matched successfully, judging that the first draft contains no hallucination; if unmatched suspicious entities exist, performing secondary verification, in light of the deep research background, on each sentence of the paragraph in which a suspicious entity is located; if a hallucination is confirmed, generating a corrected sentence based on the literature and replacing the original erroneous content; if no hallucination is confirmed, retaining the original content; and finally outputting a truthful and credible report text that meets the requirements of deep research on frontier technologies.
  7. The method for generating report text in frontier technology fields based on deep research according to claim 6, wherein, in Step 3-3, performing secondary verification on each sentence of the paragraph in which an unmatched suspicious entity is located, in light of the deep research background, and, if a hallucination is confirmed, generating a corrected sentence based on the literature and replacing the original erroneous content, specifically includes: for each unmatched suspicious entity, splitting the paragraph sentence by sentence and having a large language model analyze each sentence in depth; the large language model judges whether the generated content can be inferred from the reference documents, which serves as the basis for deciding whether the content is a hallucination; and if a sentence is confirmed to be a hallucination, calling the large language model to rewrite the sentence and replace the original sentence.
  8. A system for generating report text in frontier technology fields based on deep research, implemented on the basis of the method of any one of claims 1 to 7, wherein the system comprises: a framework planning module, used for constructing a multi-level chapter outline according to the research topic; a multi-source deep mining module, used for writing subtitle-oriented content on the basis of the generated chapter outline; and a fact precision checking module, used for evaluating the credibility of the content generated by the multi-source deep mining module, and establishing semantic associations between the reference documents and the generated content through entity extraction and semantic matching, so as to identify and correct possible hallucinations.
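
The entity-matching check of claim 6 (Step 3-3) can be illustrated with a minimal sketch. This is not the applicant's implementation. It assumes a toy character-trigram embedding in place of the BGE model so that the example runs without external dependencies, and the function names and the threshold of 0.8 are hypothetical. It shows only the decision rule stated in the claim: a generated entity counts as supported when some reference entity has cosine similarity greater than the threshold, and every other entity is marked suspicious for secondary verification.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a BGE embedding (assumption): character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_suspicious_entities(generated, reference, threshold=0.8):
    """Return generated entities that no reference entity matches above the threshold.

    The threshold value is hypothetical; the claim only says it is preset.
    """
    ref_vectors = [embed(r) for r in reference]
    suspicious = []
    for entity in generated:
        vec = embed(entity)
        best = max((cosine(vec, r) for r in ref_vectors), default=0.0)
        if best <= threshold:  # supported only when similarity is strictly greater
            suspicious.append(entity)
    return suspicious


draft_entities = ["sentence-bert", "quantum radar"]
reference_entities = ["Sentence-BERT", "BM25"]
print(find_suspicious_entities(draft_entities, reference_entities))
```

With these inputs, "sentence-bert" matches "Sentence-BERT" despite the difference in surface form, and only "quantum radar" is returned. In the claimed method, each sentence of the paragraph containing such an entity would then go to the large language model for secondary verification, a step this sketch does not cover.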

Description

Method and system for generating report text in frontier technology fields based on deep research
Technical Field
The application belongs to the technical field of information retrieval and natural language processing, and particularly relates to a method and a system for generating report text in frontier technology fields based on deep research.
Background
Reports on frontier technology fields are core documents that support technology development planning, evaluation of the transformation of research results, and analysis of industry trends. The efficiency and quality with which they are generated bear directly on innovation-driven development and on the cultivation of core industrial competitiveness. Faced with massive, multi-source scientific and technological data that must be processed promptly, traditional methods rely on fixed-template filling or direct generation. They can complete basic writing tasks, but they lack flexibility, adapt poorly to the dynamically expanding needs of deep research, are prone to logical inconsistency, and cannot meet the requirements of deep research for rigor and suitability of content. In recent years, a number of innovative methods have emerged in the fields of text generation and deep research.
AgentWrite breaks through the limitation of the context window by simulating human in-depth writing and adopting a segmented generation strategy, which supports the creation of long texts. STORM proposes a two-stage plan-and-execute generation framework that effectively enhances the logical coherence and structural integrity of the text and meets the structural requirements of deep research. Other studies embed a planning mechanism into the model architecture to achieve efficient single-pass long-text generation and improve the efficiency of deep research. Co-STORM continuously revises and expands information through multiple rounds of dialogue, optimizing text quality and assisting content iteration in deep research. Meanwhile, for complex deep research tasks, DeepSeek improves reasoning performance through multi-stage training and a small amount of cold-start data, meeting the complex reasoning requirements of deep research. DeepResearch adopts a reinforcement learning fine-tuning strategy, supports autonomous planning, searching, and verification in a dynamic environment, and generates structured, credible research reports, which strengthens the credibility of deep research. DeepResearcher is the first to perform end-to-end reinforcement learning training in a real web environment, enabling the system to cope autonomously with the complexity of the open web, and is suited to deep research tasks that require multi-source fusion and cross-verification. However, existing methods still cope poorly with ambiguous tasks in deep research and cannot accurately retrieve and deeply analyze "relevant" content. The models are prone to "hallucination" and introduce untrue information, which seriously affects the accuracy and credibility of deep research reports and makes it difficult to support high-quality deep research in frontier technology fields.
Disclosure of Invention
The application aims to overcome the defects of the prior art, namely that it cannot accurately retrieve and deeply analyze "relevant" content, that the models are prone to "hallucination" and introduce untrue information, which seriously affects the accuracy and credibility of deep research reports, and that it is difficult to support high-quality deep research in frontier technology fields.
In order to achieve the above object, the present application provides a method for generating report text in frontier technology fields based on deep research, including: Step 1, constructing a multi-level chapter outline according to a research topic; Step 2, writing subtitle-oriented content on the basis of the generated chapter outline; and Step 3, evaluating the credibility of the content generated in Step 2, and establishing semantic associations between the reference documents and the generated content through entity extraction and semantic matching, so as to identify and correct possible hallucinations.
As an improvement of the above method, Step 1 includes: Step 1-1, for a given research topic, first generating a multi-level outline with hierarchical logic, based on a preset general chapter scheme for frontier-technology reports and combined with topic-related information obtained through coarse-grained web retrieval and deep analysis; and Step 1-2, using a general-purpose large language model to decompose the multi-level outline by chapter into independent deep research units, and generating a writing plan for each chapter.
As an improvement of the above method, Step 1-1 includes: acquiring raw information related to the research topic