CN-122021601-A - Document generation method, device, electronic equipment and storage medium
Abstract
The invention relates to the field of computer technology and provides a document generation method, a device, electronic equipment and a storage medium. The method comprises: determining the service scene type of a target document to be generated based on a writing request of a user; determining a reference information type according to the service scene type and acquiring reference data corresponding to the reference information type; integrating the reference data according to the service scene type to obtain a reference information set composed of logically related reference data; and generating the target document based on the reference information set, wherein the reference information type comprises at least one of a document structure template, structured service data, a semantic association text and a compliance standard clause. The invention overcomes the limitation of retrieval that relies solely on keywords or semantic similarity and achieves deep fusion of structured data, unstructured text and industry specifications, thereby addressing the style mismatch, data fragmentation and lack of compliance constraints found in enterprise-level document generation in the related art.
Inventors
- FAN LIANG
- Zheng Daiyang
- HUANG FAN
- WANG QING
- LI JIAN
Assignees
- 科大讯飞股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260413
Claims (10)
- 1. A document generation method, comprising: determining the service scene type of a target document to be generated based on a writing request of a user; determining a reference information type according to the service scene type, and acquiring reference data corresponding to the reference information type, wherein the reference information type comprises at least one of a document structure template, structured service data, a semantic association text and a compliance standard clause; integrating the reference data according to the service scene type to obtain a reference information set composed of logically related reference data; and generating the target document based on the reference information set; the method further comprising, after generating the target document: performing quality evaluation on the target document to obtain an evaluation score; if the evaluation score is lower than a preset threshold, marking the target document as a negative sample and recording a reason label indicating why the evaluation score is lower than the preset threshold; taking a target document whose evaluation score is greater than or equal to the preset threshold as a positive sample; calculating the content feature difference between the positive sample and the negative sample; and adjusting, according to the content feature difference, the suppression weight that the model applies during generation to the feature patterns that cause negative samples.
- 2. The document generation method according to claim 1, wherein determining the reference information type according to the service scene type includes: identifying the constraint requirements that the service scene type places on content accuracy and format standardization; and selecting at least one of the document structure template, the structured service data, the semantic association text and the compliance standard clause as the reference information type according to the constraint requirements.
- 3. The document generation method according to claim 2, wherein selecting at least one of the document structure template, the structured service data, the semantic association text and the compliance standard clause as the reference information type according to the constraint requirements includes: in the case that the constraint requirements indicate a strong rule constraint class, selecting the document structure template, the structured service data and the compliance standard clause as the reference information type; and in the case that the constraint requirements indicate an open authoring class, selecting the document structure template and the semantic association text as the reference information type.
- 4. The document generation method according to any one of claims 1 to 3, wherein integrating the reference data according to the service scene type to obtain a reference information set composed of logically related reference data includes: determining weight parameters for the reference data of each reference information type according to the service scene type; screening the reference data based on the weight parameters to obtain candidate data; and mapping the candidate data to the corresponding structural positions of the document structure template to obtain the reference information set composed of logically related reference data.
- 5. The document generation method according to any one of claims 1 to 3, wherein acquiring the reference data corresponding to the reference information type includes: if the reference information type comprises the document structure template, matching, based on the writing request, the document structure with the highest similarity in a history document library as the corresponding reference data; if the reference information type comprises the structured service data, retrieving numerical indicators from a service system based on the service keywords in the writing request as the corresponding reference data; if the reference information type comprises the semantic association text, converting the writing request into a semantic vector and recalling similar historical text fragments as the corresponding reference data; and if the reference information type comprises the compliance standard clause, associating the corresponding industry specification clauses based on the service scene type as the corresponding reference data.
- 6. The document generation method according to any one of claims 1 to 3, further comprising, before acquiring the reference data corresponding to the reference information type: parsing history documents and identifying at least one of text, tables, images and flowcharts in the documents; structuring the text, the tables, the images and the flowcharts respectively to obtain knowledge segments containing business attribute labels; and filtering sensitive information out of the knowledge segments and storing the filtered knowledge segments in a knowledge base.
- 7. The document generation method according to claim 6, wherein structuring the text, the tables, the images and the flowcharts to obtain knowledge segments containing business attribute labels includes: for the text, identifying the semantic boundaries of the text content and segmenting the text into independent business units that serve as the knowledge segments; for a table, identifying the key fields in the table and converting them into key-value pair records that serve as the knowledge segments; for an image, extracting the key text and visual elements in the image to generate structured description information that serves as the knowledge segments; and for a flowchart, parsing the logical relationships and flow direction of the flow nodes and generating a data structure that represents the node transition logic and serves as the knowledge segments.
- 8. A document generation apparatus, comprising: a determining module, used for determining the service scene type of a target document to be generated based on a writing request of a user; an acquisition module, used for determining a reference information type according to the service scene type and acquiring reference data corresponding to the reference information type, wherein the reference information type comprises at least one of a document structure template, structured service data, a semantic association text and a compliance standard clause; an integration module, used for integrating the reference data according to the service scene type to obtain a reference information set composed of logically related reference data; and a generation module, used for generating the target document based on the reference information set; wherein, after the target document is generated, the following is further performed: performing quality evaluation on the target document to obtain an evaluation score; if the evaluation score is lower than a preset threshold, marking the target document as a negative sample and recording a reason label indicating why the evaluation score is lower than the preset threshold; taking a target document whose evaluation score is greater than or equal to the preset threshold as a positive sample; calculating the content feature difference between the positive sample and the negative sample; and adjusting, according to the content feature difference, the suppression weight that the model applies during generation to the feature patterns that cause negative samples.
- 9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the document generation method of any one of claims 1 to 7 when executing the computer program.
- 10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the document generation method according to any one of claims 1 to 7.
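The integration step of claim 4 (weight the reference data by scene, screen it, then map the survivors onto template positions) can be illustrated with a minimal Python sketch. The claims do not specify weight values, a scoring formula, or data fields, so `SCENE_WEIGHTS`, `min_score`, the `relevance` and `section` fields, and the product-based score are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReferenceItem:
    info_type: str    # e.g. "structured_service_data" (label chosen for this sketch)
    section: str      # template position the item targets (hypothetical field)
    content: str
    relevance: float  # retrieval score in [0, 1] (hypothetical field)


# Hypothetical per-scene weights; the patent gives no concrete values.
SCENE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "bid_document": {
        "structured_service_data": 1.0,
        "compliance_standard_clause": 1.0,
        "semantic_association_text": 0.4,
    },
}


def integrate(
    scene_type: str,
    template_sections: List[str],
    items: List[ReferenceItem],
    min_score: float = 0.5,  # assumed screening cutoff
) -> Dict[str, List[ReferenceItem]]:
    """Screen items by weighted score and map them onto template sections."""
    weights = SCENE_WEIGHTS[scene_type]
    reference_set: Dict[str, List[ReferenceItem]] = {s: [] for s in template_sections}
    for item in items:
        # Assumed scoring rule: scene weight for the type times retrieval relevance.
        score = weights.get(item.info_type, 0.0) * item.relevance
        if score >= min_score and item.section in reference_set:
            reference_set[item.section].append(item)
    return reference_set
```

Keying the result by template section is one way to make the "logical association" explicit: every retained item is tied to the structural position it supports.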
Description
Document generation method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular to a document generation method, a document generation device, electronic equipment and a storage medium.
Background
In enterprise project document writing scenarios, retrieving and reusing high-quality content segments from historical documents has become a common authoring model. At present, retrieval-augmented generation (RAG) is often combined with large language models (LLMs) to assist authoring. However, this approach generally relies only on semantic similarity to recall unstructured text fragments, so the generated content often lacks logical and structural association, is prone to data deviation, or fails to meet the specifications of the particular scenario, and the usability of the generated documents is low.
Disclosure of Invention
The invention provides a document generation method, a document generation device, electronic equipment and a storage medium, which are used for overcoming the above defects in the prior art.
The invention provides a document generation method, which comprises the following steps: determining the service scene type of a target document to be generated based on a writing request of a user; determining a reference information type according to the service scene type, and acquiring reference data corresponding to the reference information type, wherein the reference information type comprises at least one of a document structure template, structured service data, a semantic association text and a compliance standard clause; integrating the reference data according to the service scene type to obtain a reference information set composed of logically related reference data; and generating the target document based on the reference information set.
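The four steps of the method can be read as a simple pipeline. The sketch below, in Python, wires them together with each stage passed in as a callable, since the text does not say how any stage is implemented. The function name `generate_document` and every parameter name are hypothetical.

```python
from typing import Callable, Dict, List

ReferenceData = Dict[str, List[str]]


def generate_document(
    request: str,
    classify_scene: Callable[[str], str],
    select_types: Callable[[str], List[str]],
    retrievers: Dict[str, Callable[[str], List[str]]],
    integrate: Callable[[str, ReferenceData], ReferenceData],
    write: Callable[[str, ReferenceData], str],
) -> str:
    """Run the four claimed steps in order; all stages are caller-supplied stubs."""
    # Step 1: determine the service scene type from the writing request.
    scene = classify_scene(request)
    # Step 2: determine the reference information types, then fetch data per type.
    ref_types = select_types(scene)
    reference_data = {t: retrievers[t](request) for t in ref_types}
    # Step 3: integrate the data into a logically related reference information set.
    reference_set = integrate(scene, reference_data)
    # Step 4: generate the target document from the reference information set.
    return write(request, reference_set)
```

In a real system `write` would presumably wrap a large language model call and `retrievers` would query a template library, a service system, a vector index, and a clause store, but those bindings are not specified here.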
According to the document generation method provided by the invention, determining the reference information type according to the service scene type comprises the following steps: identifying the constraint requirements that the service scene type places on content accuracy and format standardization; and selecting at least one of the document structure template, the structured service data, the semantic association text and the compliance standard clause as the reference information type according to the constraint requirements.
According to the document generation method provided by the invention, selecting at least one of the document structure template, the structured service data, the semantic association text and the compliance standard clause as the reference information type according to the constraint requirements comprises the following steps: in the case that the constraint requirements indicate a strong rule constraint class, selecting the document structure template, the structured service data and the compliance standard clause as the reference information type; and in the case that the constraint requirements indicate an open authoring class, selecting the document structure template and the semantic association text as the reference information type.
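The two-way selection rule above amounts to a lookup from constraint class to a fixed set of reference information types. A minimal Python sketch follows. The class and type names mirror the text, while the scene names in `SCENE_CONSTRAINTS` are invented examples, because the text does not list which scenes fall into which class.

```python
from enum import Enum
from typing import Dict, Tuple


class ConstraintClass(Enum):
    STRONG_RULE = "strong_rule_constraint"
    OPEN_AUTHORING = "open_authoring"


# Hypothetical scene-to-class assignments, for illustration only.
SCENE_CONSTRAINTS: Dict[str, ConstraintClass] = {
    "contract": ConstraintClass.STRONG_RULE,
    "press_release": ConstraintClass.OPEN_AUTHORING,
}

# Reference information types per constraint class, as stated in the text.
REFERENCE_TYPES: Dict[ConstraintClass, Tuple[str, ...]] = {
    ConstraintClass.STRONG_RULE: (
        "document_structure_template",
        "structured_service_data",
        "compliance_standard_clause",
    ),
    ConstraintClass.OPEN_AUTHORING: (
        "document_structure_template",
        "semantic_association_text",
    ),
}


def select_reference_types(scene_type: str) -> Tuple[str, ...]:
    """Map a service scene type to its reference information types."""
    return REFERENCE_TYPES[SCENE_CONSTRAINTS[scene_type]]
```

Note that the document structure template appears in both branches, so only the accompanying data sources change with the constraint class.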
According to the document generation method provided by the invention, integrating the reference data according to the service scene type to obtain a reference information set composed of logically related reference data comprises the following steps: determining weight parameters for the reference data of each reference information type according to the service scene type; screening the reference data based on the weight parameters to obtain candidate data; and mapping the candidate data to the corresponding structural positions of the document structure template to obtain the reference information set composed of logically related reference data.
According to the document generation method provided by the invention, after the target document is generated, the method further comprises the following steps: performing quality evaluation on the target document to obtain an evaluation score; if the evaluation score is lower than a preset threshold, marking the target document as a negative sample and recording a reason label indicating why the evaluation score is lower than the preset threshold; and performing parameter optimization on the model used for generating the target document by using the negative sample and the reason label.
According to the document generation method provided by the invention, performing parameter optimization on the model used for generating the target document by using the negative sample and the reason label comprises the following steps: taking a target document whose evaluation score is greater than or equal to the preset threshold as a positive sample; calculating the content characteristic difference o