KR-20260067766-A - METHOD AND SYSTEM OF INTELLIGENT DOCUMENT DATA MANAGEMENT BASED ON REAL TIME INCREMENTAL VECTORIZATION

Abstract

The present disclosure relates to an intelligent document data management method and system based on real-time incremental vectorization. A change detection engine detects a changed part of an input document. A selective vectorization processor vectorizes the changed part to generate a new vector. A vector integration manager merges the new vector with a pre-stored document vector for the document to generate an updated document vector. A real-time indexing system indexes the updated document vector into a searchable form through incremental indexing, which adds or modifies an index entry for the new vector. A version management optimizer stores the difference between the stored document vector and the updated document vector.
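
As an illustration only, the flow summarized above (vectorize the changed part, merge it into the stored document vector, update the index entry, and store the difference) could be sketched in Python as follows. The disclosure does not specify the embedding model, the merge rule, or the storage layout, so everything here is an assumption: `toy_embed` is a stand-in for a real model, the document vector is assumed to be the mean of its chunk vectors, and all function and variable names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def toy_embed(text: str, dim: int = 4) -> Vector:
    """Stand-in for a real embedding model (assumption): bucketed character counts."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def mean_pool(vectors: List[Vector]) -> Vector:
    """Assumed document vector: element-wise mean of the chunk vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def merge_changed_chunk(doc_vec: Vector, old_chunk: Vector,
                        new_chunk: Vector, n_chunks: int) -> Vector:
    """Swap one chunk's contribution without re-embedding the other chunks."""
    return [d + (new - old) / n_chunks
            for d, old, new in zip(doc_vec, old_chunk, new_chunk)]


def vector_diff(stored: Vector, updated: Vector) -> Vector:
    """Difference kept by the version store instead of a full copy."""
    return [u - s for s, u in zip(stored, updated)]


def apply_diff(stored: Vector, delta: Vector) -> Vector:
    """Restore the updated vector from the stored vector plus the difference."""
    return [s + d for s, d in zip(stored, delta)]


# Initial state: a two-chunk document, fully embedded once.
chunks = ["alpha", "beta"]
chunk_vecs = [toy_embed(c) for c in chunks]
stored_doc_vec = mean_pool(chunk_vecs)
index: Dict[str, Vector] = {"doc-1": stored_doc_vec}

# The second chunk changes: only that chunk is embedded again.
new_chunk_vec = toy_embed("gamma")
updated_doc_vec = merge_changed_chunk(
    stored_doc_vec, chunk_vecs[1], new_chunk_vec, len(chunks)
)
index["doc-1"] = updated_doc_vec  # incremental update: one index entry touched
delta = vector_diff(stored_doc_vec, updated_doc_vec)
```

Under the mean-pooling assumption, the merged result equals what a full re-vectorization would produce, while `delta` is enough to rebuild the updated vector from the stored one.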

Inventors

  • 양진홍

Assignees

  • 인제대학교 산학협력단

Dates

Publication Date
20260513
Application Date
20241106

Claims (13)

  1. A computing system for intelligent document data management based on real-time incremental vectorization, comprising: a change detection engine configured to detect a changed part of an input document; a selective vectorization processor configured to vectorize the changed part to generate a new vector; and a vector integration manager configured to generate an updated document vector by merging the new vector with a pre-stored document vector for the document.
  2. The computing system of claim 1, further comprising a real-time indexing system configured to index the updated document vector into a searchable form through incremental indexing that adds or modifies an index entry for the new vector.
  3. The computing system of claim 1, further comprising a version management optimizer configured to store the difference between the stored document vector and the updated document vector, so that the updated document vector can be restored using the difference.
  4. The computing system of claim 1, further comprising a large language model (LLM) integration module configured to select a language model based on the document and to transmit an analysis result, derived through context analysis of the document using the language model, to the selective vectorization processor, so that the selective vectorization processor vectorizes the changed part using the analysis result.
  5. The computing system of claim 1, further comprising at least one of: a distributed computing manager configured to break down and manage processing tasks for the document across multiple computing nodes and to aggregate the work results of the computing nodes; a blockchain-based version control system configured to manage the change history of the document; an edge computing interface configured to support processing and updating of the document on an edge device; or an advanced search and analysis engine configured to provide advanced search and analysis capabilities based on the updated document vector.
  6. The computing system of claim 1, wherein the change detection engine includes: a document splitter configured to divide the document into multiple units; a hash generator configured to generate a hash value for each of the units; and a hash comparator configured to identify the changed part by comparing the per-unit hash values stored in advance for the document with the generated hash values.
  7. The computing system of claim 1, wherein the selective vectorization processor includes: a vector generator configured to generate the new vector by vectorizing the changed part based on the surrounding context of the changed part in the document; and a quality verifier configured to verify the quality of the new vector and, if the quality is below a predetermined threshold, to correct the new vector.
  8. The computing system of claim 1, wherein the vector integration manager includes: a vector aligner configured to align the new vector with respect to the stored document vector; a weight calculator configured to calculate a weight for each vector element of the new vector; and a vector merger configured to merge the new vector into the stored document vector based on the weights.
  9. A method of operating a computing system for intelligent document data management based on real-time incremental vectorization, the computing system including a change detection engine, a selective vectorization processor, and a vector integration manager, the method comprising: detecting, by the change detection engine, a changed part of an input document; vectorizing, by the selective vectorization processor, the changed part to generate a new vector; and merging, by the vector integration manager, the new vector into a pre-stored document vector for the document to generate an updated document vector.
  10. The method of claim 9, wherein the computing system further includes a real-time indexing system, the method further comprising indexing, by the real-time indexing system, the updated document vector into a searchable form through incremental indexing that adds or modifies an index entry for the new vector.
  11. The method of claim 9, wherein the computing system further includes a version management optimizer, the method further comprising storing, by the version management optimizer, the difference between the stored document vector and the updated document vector, wherein the difference is used to restore the updated document vector.
  12. The method of claim 9, wherein the computing system further includes a large language model (LLM) integration module, the method further comprising selecting, by the LLM integration module, a language model based on the document and transmitting an analysis result, derived through context analysis of the document using the language model, to the selective vectorization processor, wherein the selective vectorization processor vectorizes the changed part using the analysis result.
  13. The method of claim 9, wherein the computing system further includes at least one of: a distributed computing manager configured to break down and manage processing tasks for the document across multiple computing nodes and to aggregate the work results of the computing nodes; a blockchain-based version control system configured to manage the change history of the document; an edge computing interface configured to support processing and updating of the document on an edge device; or an advanced search and analysis engine configured to provide advanced search and analysis capabilities based on the updated document vector.
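
For illustration, the splitter, hash generator, and hash comparator recited for the change detection engine could be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not name a unit of division or a hash function, so paragraph-level units and SHA-256 are choices made here, and the function names `split_document`, `hash_units`, and `changed_units` are hypothetical.

```python
import hashlib
from typing import List


def split_document(text: str) -> List[str]:
    """Document splitter: paragraphs separated by blank lines (assumed unit)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def hash_units(units: List[str]) -> List[str]:
    """Hash generator: one SHA-256 digest per unit (assumed hash function)."""
    return [hashlib.sha256(u.encode("utf-8")).hexdigest() for u in units]


def changed_units(stored_hashes: List[str], new_hashes: List[str]) -> List[int]:
    """Hash comparator: positions whose hash is new or differs from the stored one."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(stored_hashes) or stored_hashes[i] != h
    ]


old_text = "Intro.\n\nMethods v1.\n\nResults."
new_text = "Intro.\n\nMethods v2.\n\nResults.\n\nAppendix."

stored = hash_units(split_document(old_text))   # kept from the previous version
current = hash_units(split_document(new_text))  # computed for the incoming version
changed = changed_units(stored, current)        # only these units need vectorizing
```

Comparing by position is the simplest choice, but an inserted unit shifts every later unit and marks them all as changed. A content-addressed comparison (looking up each new hash in the set of stored hashes) would avoid that at the cost of losing position information.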

Description

Method and System of Intelligent Document Data Management Based on Real-Time Incremental Vectorization

The present disclosure relates to an intelligent document data management method and system based on real-time incremental vectorization.

Existing technologies for managing document data have several technical limitations, each of which carries an economic cost.

First, even if only a part of a document is changed, the entire document must be re-vectorized. This increases costs through unnecessary consumption of computing resources and delays processing.

Second, there is no efficient mechanism for comparing existing and new vectors. Identifying the changed parts therefore requires additional time and resources and reduces productivity.

Third, it is difficult to selectively update only the changed parts when processing large volumes of documents. This increases the operating costs of large-scale document management systems and limits system scalability.

Fourth, it is difficult to update document and vector databases in real time when documents are modified. The resulting delay in reflecting the latest information degrades decision support and search accuracy.

Fifth, it is difficult to efficiently manage vector representations for multiple versions of a document. Version control becomes complex, which wastes storage space and increases system complexity.

FIG. 1 is a block diagram schematically illustrating a computing system for real-time incremental vectorization-based intelligent document data management according to the present disclosure.
FIG. 2 is a block diagram illustrating the change detection engine of FIG. 1 in detail.
FIG. 3 is a block diagram illustrating the selective vectorization processor of FIG. 1 in detail.
FIG. 4 is a block diagram illustrating the vector integration manager of FIG. 1 in detail.
FIG. 5 is a block diagram illustrating the Real-time Indexing System (RIS) of FIG. 1 in detail.
FIG. 6 is a block diagram illustrating the Version Management Optimizer (VMO) of FIG. 1 in detail.
FIG. 7 is a flowchart illustrating the operation method of a computing system for real-time incremental vectorization-based intelligent document data management according to the present disclosure.
FIG. 8 is a flowchart illustrating the operation method of a computing system for real-time incremental vectorization-based intelligent document data management according to the present disclosure.
FIG. 9 is a structural diagram illustrating the system architecture of a computing system according to the present disclosure.
FIG. 10 is a structural diagram illustrating a cloud architecture for a computing system when the computing system according to the present disclosure is operated in a cloud environment.
FIG. 11 is a structural diagram illustrating a hybrid cloud architecture for a computing system when the computing system according to the present disclosure is operated in an on-premises environment and a cloud environment.

In the following, the present disclosure provides an intelligent document data management method and system based on real-time incremental vectorization. The purpose of the present disclosure is to enable efficient management and real-time updating of large-scale document collections, thereby improving the timeliness, accuracy, and accessibility of information and enhancing the overall performance of document processing and search systems. Specifically, the present disclosure may be implemented to achieve various objectives.

The first objective is to address the inefficiency of document updates. Generally, the entire document must be re-vectorized even if only a part of it is changed. Therefore, the present disclosure aims to maximize processing efficiency by selectively vectorizing only the changed parts.
This significantly reduces processing time and computing resource usage, thereby improving the overall performance of the system.

The second objective is to overcome the difficulties of real-time updates and searching. Generally, it is difficult to reflect document changes in search results in real time. Therefore, the present disclosure utilizes incremental indexing technology to immediately reflect only the changed parts in the index. This ensures the timeliness and accuracy of information by enabling access to the latest information at all times.

The third objective is to reduce the complexity of managing large-scale document collections. Generally, it is difficult to efficiently manage and update large volumes of documents. Therefore, the present disclosure aims to ensure scalability by introducing a modular system structure and distributed processing technology. This allows for flexible adaptation to increases in the number of documents, thereby maintaining system stability and performance.

The fourth aspect is optimizi