CN-121281083-B - Document information input method, device and storage medium

Abstract

The application discloses a document information input method, device and storage medium, and relates to the technical field of electronic digital data processing. The method comprises: invoking a large language model to extract entities from the text; associating the entity relationship, business intention, context and context relation corresponding to each entity as the reference information of that entity; constructing an extraction question based on the fields in the business data structure file corresponding to the document type; and extracting a target field from the entities by means of the extraction question in combination with the reference information. The application achieves efficient and accurate conversion of fragmented text into structured data usable by the business system.

Inventors

ZHONG YU
WEN XINZHE
WANG LEI
CHEN XIUQIONG
XIONG LIYANG

Assignees

深圳市明源云科技有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-09

Claims (7)

1. A document information entry method, characterized by comprising the following steps: screening, through preset rules, target text from the text obtained by recognizing the document to be processed; determining a document type according to the meaning of the target text, wherein the document type is at least one of a contract, a bill, a report and a resume; invoking a large language model to extract entities from the text, and associating the entity relationship, business intention, context and context relation corresponding to each entity as the reference information of the entity; constructing an extraction question based on the fields in a business data structure file corresponding to the document type; and extracting a target field from the entities by means of the extraction question in combination with the reference information; wherein the step of constructing the extraction question based on the fields in the business data structure file corresponding to the document type comprises: constructing a knowledge graph matched with the document type; matching each field in the business data structure file with entity attributes in the knowledge graph through a field semantic matching algorithm to generate mapping information; supplementing the semantic descriptions and association rules of the fields according to the mapping information, thereby updating the business data structure file; and parsing the updated business data structure file, extracting the name, semantic description and knowledge graph mapping information of each field, and constructing the extraction question according to the rule of query word, field semantic description and semantic association constraint; and wherein the step of extracting the target field from the entities by means of the extraction question in combination with the reference information comprises: using the entities as a corpus, retrieving answer fields corresponding to the extraction question; screening the answer fields according to the reference information and the mapping information, and determining the target field whose similarity to the extraction question is greater than a first threshold; determining an API corresponding to the target field; constructing a request message according to the target field and the parameter requirements of the API; calling the API to send the request message, and parsing the returned result to obtain a supplementary field; and adding the supplementary field as a target field.
2. The document information entry method of claim 1, wherein the step of invoking a large language model to extract entities from the text and associating the entity relationship, business intention, context and context relation corresponding to each entity as the reference information of the entity comprises: extracting each entity in the text and the entity relationships among the entities; analyzing the sentence structure of the sentence in which the entity is located, the paragraph logic of the paragraph in which the entity is located, and the overall structure of the document to be processed, to construct the context of the entity; invoking historical text corresponding to the document type to identify the business intention corresponding to the entity; determining the context relation of the entity to the paragraph based on the context; and associating the entity relationship, business intention, context and context relation corresponding to the entity as the reference information of the entity.
3. The document information entry method of claim 1, wherein the step of extracting a target field from the entities by means of the extraction question in combination with the reference information comprises: using the entities as a corpus, retrieving answer fields corresponding to the extraction question; and screening the answer fields according to the reference information, and determining the target field whose similarity to the extraction question is greater than a second threshold.
4. The document information entry method according to claim 1, further comprising, after the step of extracting a target field from the entities by means of the extraction question in combination with the reference information: checking the target field according to the business data structure specification associated with the business data structure file; and if the target field fails the check, executing the step of extracting the target field by means of the extraction question in combination with the reference information on the remaining entities.
5. The document information entry method according to claim 1 or 4, further comprising, after the step of extracting a target field from the entities by means of the extraction question in combination with the reference information: assembling the target field into structured data according to the format specified by the business data structure file; displaying, on a display interface, the structured form generated from the structured data; in response to a click operation on a first field in the structured form, determining the second field in the document to be processed that corresponds to the first field; and highlighting the second field.
6. A document information entry device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the document information entry method of any one of claims 1 to 5.
7. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the document information entry method according to any one of claims 1 to 5.
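
By way of non-limiting illustration, the check-and-retry behaviour recited in claim 4 may be sketched as follows. The helper names (`extract_with_check`, `after_colon`, `is_amount`) and the amount format are assumptions made for this example only and do not appear in the application; the check stands in for whatever the business data structure specification requires.

```python
import re
from typing import Callable, Iterable, Optional


def extract_with_check(
    entities: Iterable[str],
    extract: Callable[[str], Optional[str]],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Try each entity in turn; return the first extracted value that passes the check."""
    for entity in entities:
        value = extract(entity)
        if value is not None and is_valid(value):
            return value
        # Check failed: fall through and repeat the extraction on the remaining entities.
    return None


def after_colon(entity: str) -> Optional[str]:
    """Hypothetical extractor: take the text following the first colon."""
    _, sep, value = entity.partition(":")
    return value.strip() if sep else None


def is_amount(value: str) -> bool:
    """Hypothetical specification: amounts look like 12,000.00."""
    return re.fullmatch(r"\d{1,3}(,\d{3})*\.\d{2}", value) is not None


entities = ["Total: twelve thousand", "Total (figures): 12,000.00", "Date: 2025-01-01"]
amount = extract_with_check(entities, after_colon, is_amount)
```

In this sketch the first entity yields a value that fails the format check, so extraction proceeds to the next entity, which passes.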

Description

Document information input method, device and storage medium

Technical Field

The present application relates to the technical field of electronic digital data processing, and in particular to a document information input method, device and storage medium.

Background

Business systems hold a large number of unstructured documents such as contracts, bills, reports and resumes. At present, text is first extracted from a document with an optical character recognition tool, and the data is then entered manually according to the extracted fields. The text extracted by the optical character recognition tool is fragmented and cannot be precisely matched to the specific fields of a business system.

Disclosure of Invention

The main purpose of the application is to provide a document information input method, device and storage medium, aiming to solve the technical problem that text extracted with an optical character recognition tool is fragmented and cannot be accurately matched to the specific fields of a business system.
In order to achieve the above object, the present application provides a document information input method, including: determining the document type according to the text obtained by recognizing the document to be processed; invoking a large language model to extract entities from the text, and associating the entity relationship, business intention, context and context relation corresponding to each entity as the reference information of the entity; constructing an extraction question based on the fields in a business data structure file corresponding to the document type; and extracting a target field from the entities by means of the extraction question in combination with the reference information.
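
By way of non-limiting illustration, the last two steps (constructing an extraction question from a field, then keeping candidates whose similarity to the question exceeds a threshold) may be sketched as follows. The `SchemaField` layout, the question template, and the Jaccard token overlap used as the similarity measure are all assumptions made for this example; the application does not specify a similarity measure, and a practical system would more likely use a semantic model.

```python
import re
from dataclasses import dataclass


@dataclass
class SchemaField:
    """One field of a business data structure file (hypothetical layout)."""
    name: str
    description: str  # semantic description of the field
    constraint: str   # semantic association constraint


def build_question(field: SchemaField, query_word: str = "What is") -> str:
    """Compose query word + field semantic description + association constraint."""
    return f"{query_word} the {field.description} ({field.constraint})?"


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard token overlap in [0, 1]; a stand-in for a semantic similarity model."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_target_fields(question: str, candidates: list[str], threshold: float) -> list[str]:
    """Keep the candidates whose similarity to the question exceeds the threshold."""
    return [c for c in candidates if similarity(question, c) > threshold]


field = SchemaField("total_amount", "total contract amount", "in the payment clause")
question = build_question(field)
candidates = [
    "total contract amount 120000 yuan in the payment clause",
    "signing date 2025-01-01",
]
targets = select_target_fields(question, candidates, threshold=0.5)
```

Here only the first candidate shares enough tokens with the question to pass the 0.5 threshold; the threshold value itself is illustrative.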
In one embodiment, the step of determining the document type according to the text obtained by recognizing the document to be processed includes: screening, through preset rules, target text from the text obtained by recognizing the document to be processed; and determining the document type according to the meaning of the target text, wherein the document type is at least one of a contract, a bill, a report and a resume.
In one embodiment, the step of invoking the large language model to extract entities from the text and associating the entity relationship, business intention, context and context relation corresponding to each entity as the reference information of the entity includes: extracting each entity in the text and the entity relationships among the entities; analyzing the sentence structure of the sentence in which the entity is located, the paragraph logic of the paragraph in which the entity is located, and the overall structure of the document to be processed, to construct the context of the entity; invoking historical text corresponding to the document type to identify the business intention corresponding to the entity; determining the context relation of the entity to the paragraph based on the context; and associating the entity relationship, business intention, context and context relation corresponding to the entity as the reference information of the entity.
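
By way of non-limiting illustration, the two embodiments above may be sketched as a keyword-based document type check and a record that holds an entity's reference information. The keyword table, the scoring rule (most keyword hits wins, returning a single type) and the `ReferenceInfo` field names are assumptions made for this example; the application only refers to "preset rules" and does not prescribe a data layout. The large language model call itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical preset rules: keywords that hint at each document type.
DOC_TYPE_KEYWORDS = {
    "contract": ("contract", "party a", "party b"),
    "bill": ("invoice", "amount due"),
    "report": ("report", "findings"),
    "resume": ("resume", "work experience"),
}


def determine_document_type(text: str) -> Optional[str]:
    """Return the type with the most keyword hits, or None if nothing matches."""
    lowered = text.lower()
    hits = {
        doc_type: sum(kw in lowered for kw in keywords)
        for doc_type, keywords in DOC_TYPE_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


@dataclass
class ReferenceInfo:
    """Reference information associated with one extracted entity."""
    entity: str
    relations: list = field(default_factory=list)  # (relation, other entity) pairs
    business_intention: str = ""
    context: str = ""           # built from sentence, paragraph and document structure
    context_relation: str = ""  # how the entity relates to its paragraph


ocr_text = "CONTRACT No. 7: Party A agrees to pay Party B the amount due."
doc_type = determine_document_type(ocr_text)
ref = ReferenceInfo(
    entity="Party A",
    relations=[("pays", "Party B")],
    business_intention="payment obligation",
    context="payment clause",
    context_relation="subject of the clause",
)
```

The sample text hits three contract keywords and one bill keyword, so it is classified as a contract under these illustrative rules.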
In one embodiment, the step of constructing the extraction question based on the fields in the business data structure file corresponding to the document type includes: mapping the business data structure file corresponding to the document type to a knowledge graph, and updating the business data structure file according to the mapping information; and constructing the extraction question based on the fields in the updated business data structure file. The step of extracting the target field from the entities by means of the extraction question in combination with the reference information includes: using the entities as a corpus, retrieving answer fields corresponding to the extraction question; and screening the answer fields according to the reference information and the mapping information, and determining the target field whose similarity to the extraction question is greater than a first threshold.
In one embodiment, the step of extracting the target field from the entities by means of the extraction question in combination with the reference information includes: using the entities as a corpus, retrieving answer fields corresponding to the extraction question; and screening the answer fields according to the reference information, and determining the target field whose similarity to the extraction question is greater than a second threshold.
In one embodiment, after the step of extracting the target field from the entities by means of the extraction question in combination with the reference information, the method includes: checking the target field according to the business data structure specification associated with the business data structure file; and if the target field fails the check, the step of combining the reference information and extracting the target field from the entities through the extracti