CN-122022741-A - Automatic document and database verification method and system based on visual language model

CN 122022741 A

Abstract

The invention discloses a method and a system for automatically verifying documents against a database based on a visual language model, and relates to the technical field of data processing. The method comprises the following steps: performing a preprocessing operation on the document images to be compared and outputting a standardized preprocessed document; calling a visual language model to analyze the preprocessed document and outputting structured data containing layout, category and text content; constructing a key information rule engine that automatically extracts, checks and formats key fields from the structured data to form a first standardized data set; extracting the fields corresponding to the key fields of the first standardized data set from a service database to form a second standardized data set; and automatically comparing the first standardized data set with the second standardized data set for differences, using preset key fields as the primary key, and outputting a difference report. The invention improves the efficiency of batch comparison and provides data quality assurance for key services such as natural resource administrative approval.

Inventors

ZHANG JINSHENG
LUO LEI
LI ZHANJUN
GONG XIAN
WANG LINGRONG
FENG JINGYI
LU JINGHONG
JIANG TAO
YAN JIE
XIANG JUNHUA
ZHANG YU
ZHANG YI

Assignees

The Second Surveying and Mapping Institute of Guizhou Province (贵州省第二测绘院)

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (10)

1. A method for automatically verifying documents against a database based on a visual language model, characterized by comprising the following steps: step S1, performing a preprocessing operation on the document images to be compared, the preprocessing operation comprising adaptive denoising, black-border cropping, skew correction and format unification, and outputting a standardized preprocessed document; step S2, calling a visual language model to analyze the preprocessed document and outputting structured data containing layout, category and text content; step S3, constructing a key information rule engine that automatically extracts, checks and formats key fields from the structured data to form a first standardized data set; step S4, extracting the fields corresponding to the key fields of the first standardized data set from the service database to form a second standardized data set structurally aligned with the first standardized data set; and step S5, automatically comparing the first standardized data set with the second standardized data set for differences, using a preset key field as the primary key, and outputting a difference report.
2. The method according to claim 1, wherein step S1 comprises the following steps: S1.1, denoising the document images to be compared with an adaptive weighted median filtering algorithm that dynamically adjusts the filtering strength according to local statistical characteristics, removing noise while retaining edge details; S1.2, detecting the document boundary with an adaptive threshold algorithm based on gradient statistics, in which the threshold separating the document content area from the background area is determined from the histogram of image gradient magnitudes, a binarization mask is generated, and the content area is then extracted by cropping; S1.3, adopting a multi-scale hierarchical detection strategy in which the skew angle range is quickly estimated at a low-resolution layer and the skew angle is accurately calculated at a high-resolution layer: straight lines in the document are detected by the Hough transform, the document skew angle is calculated from the angles of the detected horizontal lines, and the skew is cancelled by the rotation operation of an affine transformation so that the document returns to horizontal; and S1.4, merging the image sequence that has undergone adaptive denoising, black-border cropping and skew correction and converting it into a standard PDF document for output.
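The coarse-to-fine skew estimation of step S1.3 might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it works on a list of edge-point coordinates rather than on an image, uses only the Python standard library, and the function names `hough_peak`, `search` and `estimate_skew`, the angle range, the step sizes and the accumulator resolutions are all hypothetical choices.

```python
import math
from collections import Counter


def hough_peak(points, angle_deg, rho_res):
    """Largest Hough accumulator cell for lines tilted angle_deg from horizontal."""
    theta = math.radians(angle_deg + 90.0)  # normal direction of such a line
    c, s = math.cos(theta), math.sin(theta)
    # Each point votes for the cell of its distance rho = x*cos(theta) + y*sin(theta).
    votes = Counter(round((x * c + y * s) / rho_res) for x, y in points)
    return max(votes.values())


def search(points, lo, hi, step, rho_res):
    """Return the candidate angle in [lo, hi] with the strongest Hough peak."""
    n = int(round((hi - lo) / step)) + 1
    candidates = [lo + i * step for i in range(n)]
    return max(candidates, key=lambda a: hough_peak(points, a, rho_res))


def estimate_skew(points, max_angle=15.0):
    """Coarse-to-fine skew estimate in degrees (all step sizes are assumed values)."""
    # Coarse layer: 1-degree steps and a 4-pixel accumulator, mimicking low resolution.
    coarse = search(points, -max_angle, max_angle, 1.0, rho_res=4.0)
    # Fine layer: 0.1-degree steps and a 1-pixel accumulator around the coarse result.
    return search(points, coarse - 1.0, coarse + 1.0, 0.1, rho_res=1.0)


# Synthetic edge points on three text baselines tilted by 3.4 degrees.
slope = math.tan(math.radians(3.4))
points = [(float(x), c + x * slope) for c in (20, 60, 100) for x in range(200)]
skew = estimate_skew(points)
print(f"estimated skew: {skew:.1f} degrees")
```

Rotating the page by the negative of the estimated angle would then level it. The sign convention depends on the orientation of the image axes, and the resolution of the estimate is limited by the accumulator cell size.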
3. The method according to claim 2, wherein in step S1.1 the adaptive weighted median filtering algorithm is implemented as follows: for a neighborhood window centered on a pixel (x, y), an adaptive weight coefficient is calculated for each pixel in the window, the adaptive weight coefficient being inversely proportional to the deviation of the pixel value from the window mean; and the pixel values in the window are averaged with the adaptive weight coefficients as weights to obtain the denoised pixel value.
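The weighting scheme of claim 3 can be illustrated with a short pure-Python sketch. It follows the wording of the claim, which describes a weighted average with weights inversely proportional to each pixel's deviation from the window mean. The function name `adaptive_weighted_filter`, the window radius and the constant `eps` (added to avoid division by zero) are assumptions and do not come from the source.

```python
def adaptive_weighted_filter(img, radius=1, eps=1.0):
    """Denoise a grayscale image given as a list of rows of numbers."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighborhood window, clipped at the image border.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            # Weight is inversely proportional to the deviation from the window
            # mean; eps is an assumed constant that keeps the weight finite.
            weights = [1.0 / (abs(v - mean) + eps) for v in window]
            out[y][x] = sum(wt * v for wt, v in zip(weights, window)) / sum(weights)
    return out


# A flat patch with one bright outlier in the center.
noisy = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]
smoothed = adaptive_weighted_filter(noisy)
print(round(smoothed[1][1], 1))
```

Because the outlier deviates most from the window mean, it receives the smallest weight, so the filtered center value lands closer to its neighbors than a plain mean filter would place it.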
4. The method according to claim 3, wherein step S2 comprises: calling the API (application programming interface) of the visual language model and configuring the model operating parameters, including server address, port, model name, operating mode, sampling strategy parameters, maximum number of output tokens, number of processing threads, image resolution and pixel limits; selecting a prompt mode according to the recognition requirement, the prompt modes comprising a full layout analysis mode, a layout-detection-only mode and an OCR-only mode; loading a visual language model optimized for documents, the model being an improvement on a large visual language model architecture and comprising a visual encoder, a feature fusion structure and a language decoder, with the ability to parse complex document structures enhanced through multi-scale feature fusion; and converting the input PDF document image into a visual feature vector and a text feature vector through the forward inference process of the model and fusing them to generate JSON data containing the full structured information of the document, the JSON data containing the layout elements of each page together with the position, category and text content of each layout element.
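The configuration and output handling of claim 4 might look roughly like the sketch below. The source does not specify the API, so everything concrete here is hypothetical: the `VlmConfig` field names and defaults, the mode identifiers in `PROMPT_MODES`, the `/parse` endpoint, and the JSON schema (`pages`, `elements`, `bbox`, `category`, `text`). No request is actually sent; the sketch only assembles a payload and flattens a sample response.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical identifiers for the three prompt modes named in the claim.
PROMPT_MODES = ("layout_all", "layout_only", "ocr_only")


@dataclass
class VlmConfig:
    """Operating parameters named in the claim; every default is a placeholder."""
    host: str = "127.0.0.1"
    port: int = 8000
    model_name: str = "document-vlm"
    run_mode: str = "http"
    temperature: float = 0.1
    top_p: float = 0.9
    max_output_tokens: int = 4096
    num_threads: int = 4
    dpi: int = 200
    max_pixels: int = 4_000_000


def build_request(cfg, prompt_mode, image_path):
    """Assemble a request payload (hypothetical shape) without sending it."""
    if prompt_mode not in PROMPT_MODES:
        raise ValueError(f"unknown prompt mode: {prompt_mode}")
    return {
        "url": f"http://{cfg.host}:{cfg.port}/parse",  # assumed endpoint
        "body": {**asdict(cfg), "prompt_mode": prompt_mode, "image": image_path},
    }


def flatten_layout(raw_json):
    """Turn the model's JSON (assumed schema) into one flat row per layout element."""
    doc = json.loads(raw_json)
    return [
        {"page": page["page"], "category": el["category"],
         "bbox": tuple(el["bbox"]), "text": el.get("text", "")}
        for page in doc["pages"]
        for el in page["elements"]
    ]


# A made-up response with two layout elements on one page.
sample = json.dumps({"pages": [{"page": 1, "elements": [
    {"bbox": [50, 40, 560, 80], "category": "Title", "text": "Land Contract"},
    {"bbox": [50, 120, 560, 150], "category": "Text", "text": "Contractor: Zhang San"},
]}]})
rows = flatten_layout(sample)
request = build_request(VlmConfig(), "ocr_only", "page_001.png")
```

Flattening the per-page layout elements into rows keeps position, category and text together, which is the form the rule engine of step S3 would need as input.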
5. The method according to claim 4, wherein step S3 specifically comprises the following steps: S3.1, constructing a rule engine whose rule base comprises extraction rules, check rules and formatting rules; S3.2, extracting key fields from the structured data with a context-aware extraction strategy that combines regular expression matching, semantic context and spatial layout information; S3.3, performing business logic checks on the extracted key information, including date logic checks, numerical range checks, format consistency checks and cross-field checks; and S3.4, unifying the formats of the checked fields, including removing leading zeros from numeric fields, converting date fields to a uniform format and normalizing whitespace characters in string fields, and outputting the first standardized data set.
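A rule base with extraction, check and formatting rules, as in steps S3.1 to S3.4, can be sketched in a few lines. This is only an illustration: the field names, the English labels in the regular expressions, the accepted date formats and the area range are assumptions, and the sketch covers only the regular-expression part of the extraction strategy, not semantic context or spatial layout.

```python
import re
from datetime import datetime

# Extraction rules: one regular expression per key field (labels are assumed).
EXTRACT = {
    "contractor": r"Contractor:\s*(.+)",
    "id_number": r"ID number:\s*([0-9Xx ]+)",
    "start_date": r"Start date:\s*([\d./-]+)",
    "end_date": r"End date:\s*([\d./-]+)",
    "area": r"Area:\s*([\d.]+)",
}


def to_iso(text):
    """Convert a date in one of a few assumed formats to YYYY-MM-DD."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


# Formatting rules: whitespace normalization, date unification, leading zeros.
FORMAT = {
    "contractor": lambda s: " ".join(s.split()),
    "id_number": lambda s: s.replace(" ", "").upper(),
    "start_date": to_iso,
    "end_date": to_iso,
    "area": lambda s: re.sub(r"^0+(?=\d)", "", s),
}


def extract(text):
    """Apply every extraction rule and keep the fields that matched."""
    found = {}
    for name, pattern in EXTRACT.items():
        match = re.search(pattern, text)
        if match:
            found[name] = match.group(1).strip()
    return found


def check(rec):
    """Return a list of rule violations (an empty list means the record passed)."""
    errors = []
    if not re.fullmatch(r"\d{17}[\dXx]", rec.get("id_number", "").replace(" ", "")):
        errors.append("id_number: expected 18 characters")          # format check
    if not 0 < float(rec.get("area", "0")) < 10000:
        errors.append("area: out of range")                         # range check
    if to_iso(rec["start_date"]) >= to_iso(rec["end_date"]):
        errors.append("dates: start must precede end")              # cross-field check
    return errors


def standardize(rec):
    """Apply the formatting rule of each field."""
    return {name: FORMAT[name](value) for name, value in rec.items()}


text = ("Contractor:  Zhang   San\nID number: 52010219800101123x\n"
        "Start date: 2020/01/01\nEnd date: 2050.12.31\nArea: 012.50 mu")
record = extract(text)
errors = check(record)
first_dataset_row = standardize(record)
```

Keeping the three rule types in separate tables means a new field can be supported by adding one entry to each table, without touching the engine functions.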
6. The method according to claim 5, wherein step S4 specifically comprises the following steps: S4.1, establishing a connection with the service database, executing a multi-table join query according to a predefined field mapping relation, and extracting the database fields corresponding to the key fields of the first standardized data set; S4.2, aligning the extracted database fields by means of a field mapping table to ensure that they correspond semantically to the fields of the first standardized data set; and S4.3, cleaning and formatting the extracted database data, including unifying date formats, aligning numerical precision and converting string encodings, so that the data have the same data types and format specifications as the first standardized data set, and outputting the second standardized data set.
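Steps S4.1 to S4.3 can be illustrated with an in-memory SQLite database from the Python standard library. The two-table schema (`person`, `contract`), the column names, the `FIELD_MAP` table and the two-decimal precision are all hypothetical, since the source does not describe the actual service database. String encoding conversion is omitted from the sketch.

```python
import sqlite3

# Field mapping table: data-set field -> qualified database column (assumed schema).
FIELD_MAP = {
    "contract_no": "c.contract_no",
    "contractor": "p.name",
    "id_number": "p.id_number",
    "end_date": "c.end_date",
    "area": "c.area_mu",
}


def make_demo_db():
    """Create a tiny in-memory database standing in for the service database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person(person_id INTEGER PRIMARY KEY, name TEXT, id_number TEXT);
        CREATE TABLE contract(contract_no TEXT, person_id INTEGER,
                              end_date TEXT, area_mu REAL);
        INSERT INTO person VALUES (1, 'Zhang  San', '52010219800101123x');
        INSERT INTO contract VALUES ('C-001', 1, '2050/12/31', 12.5);
    """)
    return conn


def clean(row):
    """Bring database values to the same format as the first data set."""
    row["contractor"] = " ".join(str(row["contractor"]).split())
    row["id_number"] = str(row["id_number"]).upper()
    row["end_date"] = str(row["end_date"]).replace("/", "-")  # unify date format
    row["area"] = f"{float(row['area']):.2f}"                  # align precision
    return row


def load_second_dataset(conn):
    """Run the multi-table join and rename columns via the field mapping table."""
    columns = ", ".join(f"{col} AS {field}" for field, col in FIELD_MAP.items())
    sql = (f"SELECT {columns} FROM contract c "
           "JOIN person p ON p.person_id = c.person_id")
    conn.row_factory = sqlite3.Row
    return [clean(dict(row)) for row in conn.execute(sql)]


second_dataset = load_second_dataset(make_demo_db())
```

Aliasing each column to its data-set field name inside the query means the rows already carry the first data set's field names when they reach the cleaning step.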
7. The method according to claim 6, wherein step S5 specifically comprises the following steps: S5.1, using a preset key field as the primary key, identifying the set of records common to both data sets, the set of records unique to the first data set and the set of records unique to the second data set; S5.2, for each record in the common record set, detecting field-level differences with an improved Myers difference algorithm and calculating the degree of difference of each field; S5.3, for each record, calculating an overall consistency score from the field weights and the field degrees of difference; and S5.4, generating a visual report that contains a record-level comparison overview, field-level difference details and the consistency score distribution, with the differing fields highlighted.
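The record partitioning of step S5.1 and the weighted scoring of step S5.3 might be sketched as below. The source gives no formula for the consistency score, so the sketch assumes one minus the weighted mean of the field differences. The field-level comparison uses a plain equality test (`exact_diff`) as a stand-in for the improved Myers algorithm, which the source does not detail. All function and field names are hypothetical.

```python
def partition(first, second, key):
    """Index both data sets by the primary key and split the keys into three sets."""
    a = {rec[key]: rec for rec in first}
    b = {rec[key]: rec for rec in second}
    common = sorted(a.keys() & b.keys())
    only_first = sorted(a.keys() - b.keys())
    only_second = sorted(b.keys() - a.keys())
    return a, b, common, only_first, only_second


def exact_diff(x, y):
    """Stand-in field difference: 0.0 if equal, 1.0 otherwise."""
    return 0.0 if x == y else 1.0


def compare(first, second, key, weights, field_diff=exact_diff):
    """Build a plain-dict difference report for two standardized data sets."""
    a, b, common, only_first, only_second = partition(first, second, key)
    total_weight = sum(weights.values())
    records = {}
    for k in common:
        diffs = {f: field_diff(a[k].get(f), b[k].get(f)) for f in weights}
        # Assumed scoring formula: 1 minus the weighted mean of field differences.
        score = 1.0 - sum(weights[f] * d for f, d in diffs.items()) / total_weight
        records[k] = {"score": score,
                      "diff_fields": [f for f, d in diffs.items() if d > 0]}
    return {"only_first": only_first, "only_second": only_second, "records": records}


first = [{"contract_no": "C-001", "contractor": "Zhang San", "area": "12.50"},
         {"contract_no": "C-002", "contractor": "Li Si", "area": "3.20"}]
second = [{"contract_no": "C-001", "contractor": "Zhang San", "area": "12.60"},
          {"contract_no": "C-003", "contractor": "Wang Wu", "area": "8.00"}]
report = compare(first, second, "contract_no", {"contractor": 2.0, "area": 1.0})
```

The `diff_fields` list per record is what a report generator would use to decide which cells to highlight.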
8. The method according to claim 7, wherein in step S5.2 the degree of difference of a field is calculated as follows: for a string field, the degree of difference is calculated based on the longest common subsequence; for a numeric field, the values are compared against a preset tolerance threshold, the two values being regarded as fully consistent when the ratio of their absolute difference to the larger value is less than or equal to the tolerance threshold, and the degree of difference otherwise being calculated proportionally.
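The two field-level measures of claim 8 can be sketched as follows. The claim does not give exact formulas, so several choices here are assumptions: the string difference is normalized as one minus the longest-common-subsequence length divided by the longer string's length, the "larger value" is taken as the larger absolute value, and the proportional difference is the relative difference capped at 1. The function names and the default tolerance are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def string_diff(a, b):
    """Assumed normalization: 0.0 for identical strings, 1.0 for nothing in common."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


def numeric_diff(x, y, tol=0.001):
    """0.0 within the relative tolerance, otherwise the relative difference (max 1.0)."""
    larger = max(abs(x), abs(y))
    if larger == 0:
        return 0.0
    ratio = abs(x - y) / larger
    return 0.0 if ratio <= tol else min(1.0, ratio)


print(string_diff("Zhang San", "Zhang Sam"))
print(numeric_diff(100.0, 100.05))
print(numeric_diff(100.0, 90.0))
```

Under these assumptions both measures return a value between 0 and 1, so they can be combined directly with field weights into a record-level score.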
9. A system for automatically verifying documents against a database based on a visual language model, for implementing the method according to any one of claims 1 to 8, comprising: a preprocessing module for performing adaptive weighted median filtering denoising, adaptive-threshold black-border detection and cropping, multi-scale Hough transform skew correction and format unification on the document images to be compared, and outputting a standardized preprocessed document; a visual language model recognition module for calling the API of the visual language model, selecting a prompt mode according to the recognition requirement, and converting the input PDF document image into JSON data containing the positions, categories and text contents of the layout elements through forward inference of the model; a rule engine module for constructing a key information rule engine that automatically extracts, checks and formats key fields from the structured data to form a first standardized data set; a database docking module for extracting the fields corresponding to the key fields of the first standardized data set from the service database and performing field alignment and mapping as well as data cleaning and formatting to form a second standardized data set structurally aligned with the first standardized data set; and a difference comparison module for comparing the first standardized data set with the second standardized data set for differences using a preset key field as the primary key, detecting field-level differences with an improved Myers difference algorithm, calculating consistency scores, and outputting a visual report in which the differences are highlighted.
10. The system according to claim 9, wherein the visual language model recognition module comprises: a visual encoder, based on the Vision Transformer architecture, for dividing an input image into image patches and mapping them to a visual feature sequence, with multi-scale feature fusion realized through a feature fusion structure; and a language decoder, based on a large language model architecture, for taking the visual feature sequence as input and generating structured data containing layout information and text content in an autoregressive manner; wherein the visual encoder and the language decoder are joined through an alignment module, and the visual feature vector and the text feature vector are obtained and fused by a feature embedding fusion method, thereby realizing multimodal large model inference.

Description

Automatic document and database verification method and system based on visual language model

Technical Field

The invention relates to the technical field of data processing, and in particular to a method and a system for automatically verifying documents against a database based on a visual language model.

Background

In the administrative approval and management of natural resources, comparing scanned copies of signed documents, such as rural land contracting and management contracts, against the reference data in a database is an important risk-control step with legal and regulatory implications. For example, it must be verified whether the contractor name, ID card number, contract period, names of the parcel's four boundaries, parcel number, parcel code and parcel area in the contract are consistent with the corresponding attributes in the registration result database, so as to avoid the legal liability risks and rights disputes caused by inconsistent data. The prior art, however, has the following defects:

1. Insufficient recognition accuracy. Conventional OCR is not accurate enough to automate the verification of contracts against databases. It generally decomposes the workflow into independent modules such as text detection, text recognition and post-processing; these modules are loosely coupled and prone to error accumulation. With complex backgrounds, distortion, low-quality scans or mixed-layout documents, conventional OCR performance degrades rapidly, and it struggles to understand layout structure, recover reading order, recognize table structure and capture the semantic hierarchy of a document.

2. Inefficient key information extraction. Conventional OCR output is mostly unstructured text, so key information such as contract numbers, ID card numbers and contract validity periods has to be picked out manually. Extracting the information from a single contract takes more than 10 minutes on average, which makes batch processing extremely inefficient, and manual extraction easily introduces subjective errors.

3. Non-standardized data conversion. OCR results are incompatible with database table structures and must be exported to Excel or CSV before being compared manually; problems such as mismatched field types and inconsistent data formats lead to frequent manual transcription errors.

4. One-dimensional data comparison. Existing tools support only full-text or line-level comparison and cannot perform targeted semantic comparison of key contract fields; locating differences relies on manual line-by-line checking, which is inefficient and prone to missing key differences.

Therefore, the prior art does not yet offer a full-link automated solution that integrates high-precision structured recognition, automated key information extraction and standardized, targeted database comparison, and new automation techniques need to be combined to make up for these defects.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, a method and a system for automatically verifying documents against a database based on a visual language model.
To achieve the above purpose, the invention adopts the following technical solution: a method for automatically verifying documents against a database based on a visual language model, comprising the following steps: step S1, performing a preprocessing operation on the document images to be compared, the preprocessing operation comprising adaptive denoising, black-border cropping, skew correction and format unification, and outputting a standardized preprocessed document; step S2, calling a visual language model to analyze the preprocessed document and outputting structured data containing layout, category and text content; step S3, constructing a key information rule engine that automatically extracts, checks and formats key fields from the structured data to form a first standardized data set; step S4, extracting the fields corresponding to the key fields of the first standardized data set from the service database to form a second standardized data set structurally aligned with the first standardized data set; and step S5, automatically comparing the first standardized data set with the second standardized data set for differences, using a preset key field as the primary key, and outputting a difference report. Further, step S1 specifically comprises the following steps: S1.1, denoising the document images to be compared with an adaptive weighted median filtering algorithm that dynamically adjusts the filtering strength according to local statistical characteristics, removing noise while retaining edge details; S1.2, detecting the document boundary with an adaptive threshold algorithm based on gradi