CN-121997895-A - Document generation method, device, equipment and medium
Abstract
The application discloses a document generation method, device, equipment and medium, and relates to the field of artificial intelligence. The method comprises: obtaining a document theme, wherein the document theme indicates the theme content of a first document to be generated; obtaining a plurality of sample documents and a plurality of data contents; determining at least one sample document from the plurality of sample documents based on the document theme, and determining first data content from the plurality of data contents based on the document theme; performing format conversion on the first data content to obtain first chart content; analyzing the first chart content through a first natural language model and outputting first text content describing the first chart content; and analyzing the document theme, the at least one sample document, the first chart content and the first text content through a second natural language model and outputting the first document. The method can improve both the efficiency of document generation and the quality of the generated document.
Inventors
- JING FANG
- SONG CHENYANG
- SUN SHISHENG
- FAN JING
- ZHANG ZHONGJIE
- ZHANG DUNJIAN
- ZHANG KAIYUAN
- LI ZHONGJIAN
- DONG FANGHONG
- PENG BING
Assignees
- 昆仑数智科技有限责任公司
- 中国石油天然气集团有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20241107
Claims (10)
- 1. A document generation method, the method comprising: acquiring a document theme, wherein the document theme is used for indicating theme content of a first document to be generated; acquiring a plurality of sample documents and a plurality of data contents, wherein the plurality of sample documents meet preset document quality requirements, and the sample documents comprise chart contents and text contents for describing the chart contents; determining at least one sample document from the plurality of sample documents based on the document theme, and determining first data content from the plurality of data contents based on the document theme; performing format conversion on the first data content to obtain first chart content, wherein the first chart content describes the first data content in chart form; analyzing the first chart content through a first natural language model, and outputting first text content for describing the first chart content; and analyzing the document theme, the at least one sample document, the first chart content and the first text content through a second natural language model, and outputting the first document.
- 2. The method of claim 1, wherein analyzing the first chart content through the first natural language model and outputting the first text content for describing the first chart content comprises: obtaining at least one chart text data pair based on the correspondence between the chart content and the text content in the at least one sample document, wherein the i-th chart text data pair comprises the i-th chart content and the text content for describing the i-th chart content, and i is a positive integer; constructing, with the at least one chart text data pair as a text generation template, a text generation prompt word comprising the first chart content and the text generation template, wherein the text generation prompt word is used for indicating that text content corresponding to the first chart content is to be generated based on the text generation template; and analyzing the text generation prompt word and the first chart content through the first natural language model, and outputting the first text content for describing the first chart content.
- 3. The method according to claim 1, wherein each sample document contains a sample document theme and is annotated with a quality score, the quality score describing how well the sample document meets the preset document quality requirements; and determining the at least one sample document from the plurality of sample documents based on the document theme comprises: extracting features of the document theme to obtain a first feature representation; extracting features of the plurality of sample documents to obtain a plurality of sample feature representations respectively corresponding to the plurality of sample documents; calculating the similarity between the first feature representation and each of the sample feature representations to obtain a similarity result; sorting the plurality of sample documents based on the similarity result to obtain a plurality of sorted sample documents; and determining the at least one sample document from the plurality of sorted sample documents based on the quality score.
- 4. The method according to any one of claims 1 to 3, wherein the sample document comprises a sample document title and a sample document catalog; the method further comprising: taking at least one sample document title respectively corresponding to the at least one sample document as a title generation template, and constructing a title generation prompt word comprising the at least one sample document title and the document theme; taking at least one sample document catalog respectively corresponding to the at least one sample document as a catalog generation template, and constructing a catalog generation prompt word comprising the at least one sample document catalog and the document theme; and analyzing the title generation prompt word and the catalog generation prompt word through a third natural language analysis model, and outputting a first document title and a first document catalog.
- 5. The method of claim 4, wherein analyzing the document theme, the at least one sample document, the first chart content and the first text content through the second natural language model and outputting the first document comprises: analyzing the at least one sample document, the document theme, the first chart content, the first text content, the first document title and the first document catalog through the second natural language model, and outputting the first document; wherein the first document comprises the document theme, the first chart content, the first text content, the first document title and the first document catalog.
- 6. The method according to any one of claims 1 to 3, wherein analyzing the document theme, the at least one sample document, the first chart content and the first text content through the second natural language model and outputting the first document comprises: determining a framework structure of the first document based on the document theme, the first chart content and the first text content to obtain integrated content, wherein the integrated content comprises the document theme, the first chart content and the first text content; analyzing the at least one sample document and the integrated content through the second natural language model to generate document content corresponding to the integrated content; and embedding the document content into the integrated content through the second natural language model, and outputting the first document.
- 7. The method according to any one of claims 1 to 3, wherein the method further comprises: scoring the first document based on the preset document quality requirements to obtain a first quality score corresponding to the first document, wherein the first quality score describes how well the first document meets the preset document quality requirements; and, when the first quality score reaches a preset quality score threshold, adding the first document to the plurality of sample documents to obtain an updated plurality of sample documents.
- 8. A document generating apparatus, the apparatus comprising: an acquisition module, configured to acquire a document theme, wherein the document theme is used for indicating the theme content of a first document to be generated; the acquisition module being further configured to acquire a plurality of sample documents and a plurality of data contents, wherein the plurality of sample documents meet preset document quality requirements, and the sample documents contain chart contents and text contents for describing the chart contents; a determining module, configured to determine at least one sample document from the plurality of sample documents based on the document theme, and to determine first data content from the plurality of data contents based on the document theme; a chart generation module, configured to perform format conversion on the first data content to obtain first chart content, wherein the first chart content describes the first data content in chart form; a text content generation module, configured to analyze the first chart content through a first natural language model and output first text content for describing the first chart content; and a document generation module, configured to analyze the document theme, the at least one sample document, the first chart content and the first text content through a second natural language model and output the first document.
- 9. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program that is loaded and executed by the processor to implement the document generation method of any of claims 1 to 7.
- 10. A computer-readable storage medium, characterized in that at least one section of a program is stored in the storage medium, the at least one section of a program being loaded and executed by a processor to implement the document generation method according to any one of claims 1 to 7.
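The sample selection recited in claim 3 (feature extraction, similarity calculation, sorting, then selection by quality score) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the bag-of-words `embed` function stands in for whatever feature extractor is actually used, cosine similarity stands in for the unspecified similarity measure, and the `top_k` and `min_quality` parameters are one hypothetical way to apply the quality score to the sorted list.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class SampleDocument:
    theme: str            # sample document theme
    quality_score: float  # annotated quality score (0..1 scale is an assumption)


def embed(text: str) -> Counter:
    """Toy feature extractor: bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_samples(
    theme: str,
    samples: List[SampleDocument],
    top_k: int = 3,
    min_quality: float = 0.8,
) -> List[SampleDocument]:
    """Sort samples by similarity to the theme, then keep the best-ranked
    ones whose quality score reaches the (hypothetical) threshold."""
    query = embed(theme)
    ranked = sorted(
        samples, key=lambda s: cosine(query, embed(s.theme)), reverse=True
    )
    return [s for s in ranked if s.quality_score >= min_quality][:top_k]
```

Sorting first and filtering second mirrors the order of the steps in the claim; a real system could equally combine the two signals into one score, which the claim wording does not rule out.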
Description
Document generation method, device, equipment and medium
Technical Field
The embodiments of the application relate to the field of artificial intelligence, and in particular to a document generation method, device, equipment and medium.
Background
Text generation is an important application area of artificial intelligence; its goal is to create coherent text content that complies with grammatical and semantic rules. Text content produced by a text generation task can assist a user in authoring document content.
In the related art, a large language model is pre-trained on a large amount of text data so that it learns the language rules and knowledge contained in that data. Sending an instruction to generate a document, together with the related data, to the trained large language model yields more natural and coherent text. The user then modifies the text generated by the large language model and assembles it with other types of data to obtain document content in the required format.
However, large language models remain limited when generating document content that must follow specific industry specifications and document standards. The text content generated by the model usually lacks complex formatting and cannot be used directly as document content, so the user still has to obtain other data content separately and assemble it with the generated text to obtain document content that meets quality expectations.
Disclosure of Invention
The embodiments of the application provide a document generation method, device, equipment and medium, which can improve the efficiency of document generation and the quality of the generated documents.
The technical solution is as follows. In one aspect, a document generation method is provided, the method including: acquiring a document theme, wherein the document theme is used for indicating theme content of a first document to be generated; acquiring a plurality of sample documents and a plurality of data contents, wherein the plurality of sample documents meet preset document quality requirements, and the sample documents contain chart contents and text contents for describing the chart contents; determining at least one sample document from the plurality of sample documents based on the document theme, and determining first data content from the plurality of data contents based on the document theme; performing format conversion on the first data content to obtain first chart content, wherein the first chart content describes the first data content in chart form; analyzing the first chart content through a first natural language model, and outputting first text content for describing the first chart content; and analyzing the document theme, the at least one sample document, the first chart content and the first text content through a second natural language model, and outputting the first document.
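The flow of the method in this aspect can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the application: the two natural language models are represented by hypothetical callables (`first_model`, `second_model`) that map a prompt string to text, theme matching is a naive shared-word test (`shares_word`), and "format conversion" is shown as rendering the data content as a Markdown table, which is just one possible chart form.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: a callable taking a prompt and returning text.
LanguageModel = Callable[[str], str]


def shares_word(theme: str, text: str) -> bool:
    """Naive relevance test: the theme and the text share at least one word."""
    return bool(set(theme.lower().split()) & set(text.lower().split()))


def to_chart(data: Dict[str, float]) -> str:
    """Format conversion: render data content as a Markdown table."""
    rows = ["| Item | Value |", "| --- | --- |"]
    rows += [f"| {name} | {value} |" for name, value in data.items()]
    return "\n".join(rows)


def generate_document(
    theme: str,
    sample_documents: List[str],
    data_contents: Dict[str, Dict[str, float]],
    first_model: LanguageModel,
    second_model: LanguageModel,
) -> str:
    # Determine the sample documents and the first data content from the theme.
    samples = [doc for doc in sample_documents if shares_word(theme, doc)]
    first_data = next(
        (data for name, data in data_contents.items() if shares_word(theme, name)),
        None,
    )
    if first_data is None:
        raise ValueError("no data content matches the document theme")

    # Convert the data to chart form, then have the first model describe it.
    chart = to_chart(first_data)
    chart_text = first_model(f"Describe the following chart:\n{chart}")

    # Hand theme, samples, chart and description to the second model.
    prompt = "\n\n".join(
        [f"Theme: {theme}", "Reference documents:", *samples,
         "Chart:", chart, "Chart description:", chart_text]
    )
    return second_model(prompt)
```

Passing the models in as callables keeps the sketch independent of any particular model provider; stub functions can be substituted to exercise the data flow without calling a real model.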
In another aspect, a document generating apparatus is provided, the apparatus including: an acquisition module, configured to acquire a document theme, wherein the document theme is used for indicating the theme content of a first document to be generated; the acquisition module being further configured to acquire a plurality of sample documents and a plurality of data contents, wherein the plurality of sample documents meet preset document quality requirements, and the sample documents contain chart contents and text contents for describing the chart contents; a determining module, configured to determine at least one sample document from the plurality of sample documents based on the document theme, and to determine first data content from the plurality of data contents based on the document theme; a chart generation module, configured to perform format conversion on the first data content to obtain first chart content, wherein the first chart content describes the first data content in chart form; a text content generation module, configured to analyze the first chart content through a first natural language model and output first text content for describing the first chart content; and a document generation module, configured to analyze the document theme, the at least one sample document, the first chart content and the first text content through a second natural language model and output the first document.
In an alternative embodiment, the text content generation module is further configured to obtain at least one chart text data pair based on the correspondence between chart content and text content in the at least one sample document, where the i-th chart text data pair includes the i-th chart content and the text content for describing the i-th chart content, and i is a positive integer; and to construct a text generation prompt word including the first chart content and the text generation template with the at least one chart text data pair