CN-121092746-B - Cross-format document page replacement method and system based on dynamic metadata mapping
Abstract
The invention relates to the technical field of computer applications, and in particular to a cross-format document page replacement method and system based on dynamic metadata mapping. The method comprises the following steps: S1, constructing a three-dimensional metadata model and using it to perform multi-dimensional document structure analysis; S2, adopting a dual-channel replacement mechanism in which a physical channel and a logical channel cooperatively carry out the page replacement operation, using the analysis result of step S1, to obtain a document in which page replacement is complete; S3, tracking the page reference relations of that document with a page number updating algorithm, and automatically connecting cross-page elements with a cross-page element automatic connection technique; S4, establishing a three-dimensional verification system that ensures the accuracy of page replacement and the quality of the replaced document at different levels. The method thereby addresses format fidelity, efficient processing, and dynamic element updating together.
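The page number updating in step S3 can be illustrated with a minimal Python sketch of an incremental update, in which only references that point past the replaced page are touched. The names `PageReference`, `update_references`, `old_span`, and `new_span` are hypothetical and do not come from the patent; the sketch assumes that each dynamic element stores a 1-based target page and that the replacement changes the page count by a known amount.

```python
from dataclasses import dataclass


@dataclass
class PageReference:
    """A dynamic element (for example a table-of-contents entry) that points at a page."""
    label: str
    target_page: int  # 1-based page number the element refers to


def update_references(refs, replaced_page, old_span, new_span):
    """Shift only the references that point past the replaced content.

    replaced_page: 1-based number of the first replaced page.
    old_span / new_span: number of pages the content occupied before and
    after replacement (hypothetical inputs for this sketch).
    Returns the updated list and the labels of the references that changed.
    """
    delta = new_span - old_span
    first_affected = replaced_page + old_span  # first page after the old content
    updated, changed = [], []
    for ref in refs:
        if delta != 0 and ref.target_page >= first_affected:
            updated.append(PageReference(ref.label, ref.target_page + delta))
            changed.append(ref.label)
        else:
            updated.append(ref)  # untouched: points at or before the replaced page
    return updated, changed
```

When the replacement keeps the page count unchanged, `delta` is zero and no reference is rewritten, which is the case where an incremental strategy saves the most work compared with recomputing every reference.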
Inventors
- LIU TING
- GUO MINGXING
- JIANG WEI
Assignees
- 同方鼎欣科技股份有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-08-28
Claims (12)
- 1. A cross-format document page replacement method based on dynamic metadata mapping, characterized by comprising the following steps: S1, constructing a three-dimensional metadata model and performing multi-dimensional document structure analysis with it, wherein the three-dimensional metadata model comprises a content topology layer, a dynamic element layer, and a style rule base; S2, adopting a dual-channel replacement mechanism in which a physical channel and a logical channel cooperatively carry out the page replacement operation, using the result of the analysis in step S1, to obtain a document in which page replacement is complete, the page replacement operation through the physical channel and the logical channel specifically being carried out with a style migration algorithm; the style migration algorithm specifically comprises the following steps: S21, extracting style features, including fonts, colors, and typesetting styles, from data acquired from the content topology layer of the target page by using a convolutional neural network (CNN); S22, adjusting the style of the newly replaced content to be consistent with the style of the target page by using a generative adversarial network (GAN); S23, calling a CSS_Optimer module by using a constraint satisfaction algorithm, performing rule verification and format fine-tuning on the generated content according to the CSS style rules defined in the style rule base, and mapping the style features into CSS style instructions conforming to PDF typesetting specifications, so that the final content is fully consistent with the original page in fonts, paragraphs, spacing, and alignment; S3, tracking the page reference relations of the document in which page replacement is complete by using a page number updating algorithm, and automatically connecting cross-page elements by using a cross-page element automatic connection technique; S4, establishing a three-dimensional verification system that ensures the accuracy of page replacement and the quality of the replaced document at different levels, wherein the three-dimensional verification system comprises physical layer verification, logic layer verification, and visual layer verification.
- 2. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein the content topology layer in step S1 adopts a computer logic tree (CTL) to record the absolute coordinates, font properties, and paragraph styles of the text and pictures in the document.
- 3. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein the dynamic element layer in step S1 comprises a locating algorithm for tracking headers, footers, and table-of-contents entries.
- 4. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein the style rule base in step S1 is used for storing document-level style inheritance relations, including style inheritance among the Heading 1, Heading 2, and body text levels, as well as exceptional styles.
- 5. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein in step S3 the page number updating algorithm tracks the page reference relations as follows: after the page replacement operation is completed, an incremental rearrangement strategy is used to determine which pages are affected by the page replacement operation, and the page numbers of those pages are updated.
- 6. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein in step S3 the cross-page element automatic connection technique operates as follows: a breakpoint detection algorithm is applied to the detected cross-page elements, and the replaced content is segmented accordingly so that the elements are connected automatically.
- 7. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein the physical layer verification in step S4 uses the PDF/A standard for compliance checking, including checking that the physical structure and the file format conform to the archival standard.
- 8. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein the logic layer verification in step S4 verifies the integrity of the document content by means of metadata reconstruction: the metadata before and after replacement is compared, the text and pictures are checked for loss or damage, and the logical relationships of the paragraph structure and paragraph styles are checked for correctness.
- 9. The cross-format document page replacement method based on dynamic metadata mapping according to claim 1, wherein the visual layer verification in step S4 is based on pixel-level difference analysis using OpenCV: the pages before and after replacement are compared pixel by pixel, an error range is set, and when the pixel difference between the pages before and after replacement is within the error range, the page replacement is considered visually consistent.
- 10. The cross-format document page replacement system based on dynamic metadata mapping is characterized by comprising a multi-dimensional document structure analysis module, a differential page replacement module, a dynamic element recalculation module and a bidirectional verification module, wherein the multi-dimensional document structure analysis module is used for executing the step S1, the differential page replacement module is used for executing the step S2, the dynamic element recalculation module is used for executing the step S3, and the bidirectional verification module is used for executing the step S4.
- 11. An electronic device comprising one or more processors and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the cross-format document page replacement method based on dynamic metadata mapping of any of claims 1-9.
- 12. A computer readable program medium, characterized in that it stores computer readable instructions, which when executed by a processor, cause a computer to perform the cross-format document page replacement method based on dynamic metadata mapping as claimed in any one of claims 1 to 9.
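The visual layer verification of claim 9 can be sketched as a pixel-by-pixel comparison with an error range. The claim names OpenCV; the sketch below is a dependency-free stand-in that works on grayscale rasters given as nested lists, and the function names, the `per_pixel_tolerance` parameter, and the `max_ratio` threshold are assumptions made for illustration, not values taken from the patent. With OpenCV, `cv2.absdiff` and `cv2.countNonZero` could play the role of the inner loop.

```python
def pixel_difference_ratio(before, after, per_pixel_tolerance=0):
    """Return the fraction of pixels whose grayscale values differ by more
    than per_pixel_tolerance.

    before / after: equally sized 2-D lists of ints in the range 0..255,
    standing in for rendered page images.
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("pages must be rendered at the same size")
    total = sum(len(row) for row in before)
    if total == 0:
        raise ValueError("empty page raster")
    differing = sum(
        1
        for row_b, row_a in zip(before, after)
        for pb, pa in zip(row_b, row_a)
        if abs(pb - pa) > per_pixel_tolerance
    )
    return differing / total


def visually_consistent(before, after, per_pixel_tolerance=8, max_ratio=0.01):
    """Treat the pages as visually consistent when the share of differing
    pixels stays within the error range (max_ratio, an assumed threshold)."""
    return pixel_difference_ratio(before, after, per_pixel_tolerance) <= max_ratio
```

Separating the per-pixel tolerance from the page-level ratio lets small anti-aliasing differences pass while still flagging a page in which a whole region has changed.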
Description
Cross-format document page replacement method and system based on dynamic metadata mapping
Technical Field
The invention relates to the technical field of computer applications, and in particular to a cross-format document page replacement method and system based on dynamic metadata mapping.
Background
In document processing, it is often necessary to replace pages in a document that has already been converted to PDF. The existing techniques have several shortcomings.
Traditional PDF editing software, such as Adobe Acrobat, can perform simple page replacement, but the replacement operates on the physical layer of the page and directly overwrites the original page content. This approach cannot identify and retain complex layout information in the original document, such as precise text typesetting, paragraph styles, and text wrapping around pictures, so the format of the document is seriously distorted after replacement. When a document containing many kinds of elements is processed, styles such as fonts, font sizes, and line spacing are often disrupted, which greatly harms the appearance and professionalism of the document.
Replacement based on text extraction first extracts the text in the PDF through OCR, then edits the text, and finally regenerates the PDF document. However, this approach works poorly on complex layouts, such as multi-column layouts and mixed layouts of charts and text. It cannot accurately restore the layout structure of the original document and cannot automatically update dynamic elements in the PDF, such as page numbers and tables of contents, which must be adjusted manually; the operation is cumbersome and error-prone.
General document conversion tools, such as LibreOffice, often need to re-render the entire document in order to replace pages in a PDF. This full-document reconstruction is very inefficient and time-consuming for documents with many pages, and accurate mapping of styles from Word to PDF is difficult to achieve during cross-format conversion, so the user's requirement for format consistency cannot be met.
Prior art CN106294493B discloses a method for realizing document format conversion, which comprises: loading a Word document; converting the loaded Word document into a web page document; converting the tags in the web page document into native tags; converting the style attributes in the tags into extension tags according to a pre-stored correspondence between style attributes and extension tags, to obtain a Markdown document that retains the style effects corresponding to the style attributes in the web page document; and parsing the Markdown document with a lexical analyzer, according to the correspondence between extension tags and style attributes in a rule sequence, to restore the Markdown document into the web page document. On the one hand, that method introduces no algorithm, so its processing efficiency is low and efficient file format conversion is difficult to achieve; on the other hand, it has no checking mechanism, so the quality of the converted file is hard to guarantee and high-fidelity conversion is difficult to achieve.
In summary, existing techniques cannot meet the requirements of format fidelity, efficient processing, and dynamic element updating when replacing specified pages after a Word document has been converted to PDF.
Therefore, there is a need for a cross-format document page replacement method and system based on dynamic metadata mapping that can satisfy the requirements of format fidelity, efficient processing, and dynamic element updating together.
Disclosure of Invention
The invention solves the technical problems existing in the prior art by providing a cross-format document page replacement method and system based on dynamic metadata mapping.
To achieve the above purpose, the invention adopts the following technical scheme:
A cross-format document page replacement method based on dynamic metadata mapping comprises the following steps:
S1, constructing a three-dimensional metadata model and performing multi-dimensional document structure analysis with it, wherein the three-dimensional metadata model comprises a content topology layer, a dynamic element layer, and a style rule base;
S2, adopting a dual-channel replacement mechanism, and cooperatively completing page replacement operation by combining the results of the analysis in the step S1 through a physical chann