CN-122024264-A - Document data entry method, electronic device, storage medium, and program product


Abstract

The application discloses a document data entry method, an electronic device, a storage medium, and a program product, relating to the technical field of document recognition. The method comprises: acquiring a document to be recognized, the document comprising at least one page to be recognized; recognizing the document by optical character recognition (OCR) to obtain a first recognition result and a target page, the target page being a page to be recognized that contains non-text elements; recognizing the target page with a target large language model to obtain a second recognition result; and entering the first recognition result and the second recognition result into a preset target form according to the similarity between the form fields of the target form and the two recognition results, to obtain an entered target form. Through the cooperation of OCR and a large language model, text and non-text information are recognized synchronously and efficiently, and, combined with the similarity-based matching and entry mechanism, the accuracy and degree of automation of document recognition and document data entry are improved.

Inventors

QIN LONG
Su Quanyu
LIN XIANDONG
CHEN SHIBING
BAO MIN
WANG WEI

Assignees

一临云(宁波)科技有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-14

Claims (10)

1. A document data entry method, comprising: acquiring a document to be recognized, wherein the document to be recognized comprises at least one page to be recognized; recognizing the document to be recognized through optical character recognition to obtain a first recognition result and a target page, wherein the target page is a page to be recognized that contains non-text elements; recognizing the target page through a target large language model to obtain a second recognition result; and entering the first recognition result and the second recognition result into a preset target form according to the similarity between form fields of the target form and the first recognition result and the second recognition result, so as to obtain an entered target form.
2. The document data entry method of claim 1, further comprising, prior to the step of recognizing the target page through a target large language model to obtain a second recognition result: determining a keyword list of the target page according to the first recognition result corresponding to the target page; and determining the target large language model according to the keyword list and a preset mapping relation between keywords and large language models.
3. The document data entry method of claim 1, wherein the non-text elements comprise a histogram, the target large language model comprises a visual large language model, and the step of recognizing the target page through a target large language model to obtain a second recognition result comprises: determining a target histogram from the target page according to the form fields of the target form through the visual large language model; recognizing the columnar structures of the target histogram through the visual large language model, and extracting label information of each columnar structure; extracting numerical information of each columnar structure from a preset surrounding area of that columnar structure through the visual large language model; and associating the label information and the numerical information of each columnar structure to obtain the second recognition result.
4. The document data entry method of claim 1, wherein the non-text elements comprise a table, and the step of recognizing the target page through a target large language model to obtain a second recognition result comprises: determining a target table from the target page according to the form fields of the target form through the target large language model, and recognizing the arrangement structure of the target table; concatenating, according to the arrangement structure, the data of each row/column in the first recognition result corresponding to the target table into a character string, to obtain table string data corresponding to each row/column; and parsing each item of table string data through the target large language model to determine the name and the result value of the data item in each row/column, to obtain the second recognition result.
5. The document data entry method of claim 1, wherein the step of entering the first recognition result and the second recognition result into the target form according to the similarity between the form fields of the preset target form and the first recognition result and the second recognition result to obtain the entered target form comprises: for any form field, calculating the similarity between the form field and each data field in the first recognition result and the second recognition result; and, where data fields whose similarity is greater than a preset similarity threshold exist, determining a target field from the data fields with the highest similarity, and mapping the value of the target field to the value of the form field of the target form, to obtain the entered target form.
6. The document data entry method of claim 1, further comprising, after the step of obtaining the second recognition result: determining the semantic entropy of the second recognition result through the target large language model; judging, through the target large language model, whether the second recognition result conforms to preset data logic; and, where the semantic entropy is greater than a preset entropy threshold or the second recognition result does not conform to the data logic, adjusting the second recognition result according to the document to be recognized.
7. The document data entry method of claim 1, further comprising, after the step of obtaining the entered target form: intercepting a database write request for the entered target form through a preset real-time checking interface; verifying the target form associated with the database write request against preset data verification rules in an electronic data capture system; writing the entered target form into a preset database where verification passes; and outputting alarm information where verification fails.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the document data entry method of any one of claims 1 to 7.
9. A storage medium, characterized in that the storage medium is a computer-readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the document data entry method according to any one of claims 1 to 7.
10. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the steps of the document data entry method of any one of claims 1 to 7.
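As an illustration of the similarity-based entry step recited in claim 5, the following is a minimal sketch only. The claims do not name a similarity measure or a threshold value; the character-level ratio from Python's `difflib`, the threshold of 0.6, and the names `similarity` and `fill_form` are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in string similarity in [0, 1] (assumption: the source names no metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fill_form(form_fields, recognized, threshold=0.6):
    """Enter recognized values into form fields by best similarity above a threshold.

    form_fields: list of form field names of the target form.
    recognized:  dict of data field name -> value, merged from the first (OCR)
                 and second (large language model) recognition results.
    threshold:   hypothetical preset similarity threshold.
    """
    filled = {}
    for field in form_fields:
        # Score every recognized data field against this form field.
        scored = [(similarity(field, name), name) for name in recognized]
        if not scored:
            filled[field] = None
            continue
        best_score, best_name = max(scored)
        # A value is entered only when the best match exceeds the threshold.
        filled[field] = recognized[best_name] if best_score > threshold else None
    return filled


# Illustrative data only; these values are not taken from the source.
first_result = {"Patient Name": "Zhang San", "Visit Date": "2026-01-14"}
second_result = {"Hemoglobin (g/L)": "132"}
merged = {**first_result, **second_result}
form = fill_form(["patient name", "hemoglobin", "blood pressure"], merged)
```

In this sketch a form field with no sufficiently similar data field is left empty (`None`) rather than filled with a weak match, which mirrors the threshold condition in the claim.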

Description

Document data entry method, electronic device, storage medium, and program product

Technical Field

The present application relates to the technical field of document recognition, and in particular to a document data entry method, an electronic device, a storage medium, and a program product.

Background

With the continuous development of computer technology, image processing technology, and artificial intelligence technology, the technical means of clinical research data acquisition have been continuously updated and iterated, moving from early manual entry toward automation and intelligence in order to meet growing clinical research data processing demands.

At present, the OCR (Optical Character Recognition) technology commonly adopted in clinical research data acquisition systems generally comprises three basic steps: image preprocessing, character recognition, and manual data entry. However, this approach depends strongly on document format, and its recognition quality is easily affected by factors such as document structure, font style, table layout, and background interference. When faced with non-standardized or unstructured documents such as laboratory test sheets and examination reports, it is difficult to accurately segment and recognize the table entries and their associated units of measurement, which increases the error rate of document data entry.

The foregoing is provided merely to facilitate understanding of the technical solutions of the present application and is not an admission that the foregoing is prior art.

Disclosure of Invention

The main purpose of the application is to provide a document data entry method, an electronic device, a storage medium, and a program product, aiming to solve the technical problem of how to improve the accuracy of document recognition and data entry.
In order to achieve the above object, the present application provides a document data entry method, the method comprising: acquiring a document to be recognized, wherein the document to be recognized comprises at least one page to be recognized; recognizing the document to be recognized through optical character recognition to obtain a first recognition result and a target page, wherein the target page is a page to be recognized that contains non-text elements; recognizing the target page through a target large language model to obtain a second recognition result; and entering the first recognition result and the second recognition result into a preset target form according to the similarity between the form fields of the target form and the first recognition result and the second recognition result, so as to obtain an entered target form.

In an embodiment, before the step of recognizing the target page through the target large language model to obtain the second recognition result, the method further includes: determining a keyword list of the target page according to the first recognition result corresponding to the target page; and determining the target large language model according to the keyword list and a preset mapping relation between keywords and large language models.
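The keyword-based model selection described in the embodiment above can be sketched as follows. This is a hedged illustration, not the applicant's implementation: the source does not say how the keyword list is derived or what the mapping contains, so the mapping `KEYWORD_TO_MODEL`, the model identifiers, the fallback `DEFAULT_MODEL`, and the majority-vote rule are all hypothetical.

```python
from collections import Counter

# Hypothetical preset mapping between keywords and large language models.
KEYWORD_TO_MODEL = {
    "chart": "vision-llm",
    "figure": "vision-llm",
    "table": "table-llm",
    "reference range": "table-llm",
}
# Hypothetical fallback used when no preset keyword is found on the page.
DEFAULT_MODEL = "general-llm"


def extract_keywords(ocr_text, vocabulary):
    """Build the page's keyword list: preset keywords that occur in the OCR text."""
    text = ocr_text.lower()
    return [kw for kw in vocabulary if kw in text]


def select_model(ocr_text):
    """Pick the target model that most of the page's keywords map to."""
    keywords = extract_keywords(ocr_text, KEYWORD_TO_MODEL)
    if not keywords:
        return DEFAULT_MODEL
    votes = Counter(KEYWORD_TO_MODEL[kw] for kw in keywords)
    return votes.most_common(1)[0][0]


# Illustrative OCR text for three target pages (not taken from the source).
table_page = select_model("Table 2: reference range for hemoglobin")
chart_page = select_model("Figure 1. Bar chart of weekly values")
plain_page = select_model("Plain narrative text")
```

Simple substring matching is used here only to keep the example short; a real system would likely need tokenization and a domain-specific keyword vocabulary.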
In one embodiment, the non-text elements include a histogram, the target large language model includes a visual large language model, and the step of recognizing the target page through the target large language model to obtain a second recognition result includes: determining a target histogram from the target page according to the form fields of the target form through the visual large language model; recognizing the columnar structures of the target histogram through the visual large language model, and extracting label information of each columnar structure; extracting numerical information of each columnar structure from a preset surrounding area of that columnar structure through the visual large language model; and associating the label information and the numerical information of each columnar structure to obtain the second recognition result.

In one embodiment, the non-text elements include a table, and the step of recognizing the target page through the target large language model to obtain a second recognition result includes: determining a target table from the target page according to the form fields of the target form through the target large language model, and recognizing the arrangement structure of the target table; concatenating, according to the arrangement structure, the data of each row/column in the first recognition result corresponding to the target table into a character string, to obtain table string data corresponding to each row/column; and parsing each item of table string data through the target large language model to determine the name and the result value of the data item in each row/column, to obtain the second recognition result.

In an embodim