CN-121982739-A - Intelligent bill processing method and system based on LLM fusion


Abstract

The invention discloses an intelligent bill processing method and system based on LLM fusion, belonging to the technical field of data analysis. A received electronic document is split into page images, and each page image undergoes page analysis comprising visual layout analysis and text classification performed in parallel. The page images are then sliced according to the page-analysis results. This identifies the different content regions of a page more accurately and yields document fragments. A main engine and a standby engine are selected according to the type of each document fragment. Both engines are called to extract the content of the fragment, producing a first extraction result and a second extraction result. The final structured data of the electronic document is determined by comparing the two results. This optimizes the extraction result, improves the reliability of content extraction, and thus improves the reliability of intelligent bill processing. The invention thereby addresses the low processing reliability of intelligent bill processing caused by data-consistency risks in the prior art.
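
Claim 7 details the comparison step summarized in the abstract:

- Consistent fields are adopted directly.
- Conflicting fields are resolved by confidence.
- Each final field is then checked against a comprehensive confidence that also weighs the producing engine's historical success rate.

The Python sketch below is one possible reading, not the patented implementation. `difflib.SequenceMatcher.ratio()` stands in for the unspecified similarity measure, and the "weighted coupling" is assumed to be a linear blend. The names `FieldValue` and `fuse` are hypothetical, as are all threshold and weight values.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FieldValue:
    value: str
    confidence: float  # engine-reported confidence in [0, 1]
    engine: str        # name of the engine that produced the value


# Hypothetical values: claim 7 names these limits but does not quantify them.
SIMILARITY_THRESHOLD = 0.9
COMPOSITE_LIMIT = 0.7
CONF_WEIGHT = 0.6  # assumed weight of field confidence vs. engine success rate


def similarity(a: str, b: str) -> float:
    """Assumed measure: difflib ratio on whitespace-normalised strings."""
    return SequenceMatcher(None, " ".join(a.split()), " ".join(b.split())).ratio()


def fuse(first: dict[str, FieldValue], second: dict[str, FieldValue],
         success_rate: dict[str, float]) -> tuple[dict[str, str], list[str]]:
    """Return (final fields, names of abnormal fields to route to preset personnel)."""
    final: dict[str, str] = {}
    abnormal: list[str] = []
    for name in first.keys() | second.keys():
        a, b = first.get(name), second.get(name)
        if a is None or b is None:
            chosen = a or b  # only one engine returned this field
        elif similarity(a.value, b.value) > SIMILARITY_THRESHOLD:
            chosen = a  # consistent: adopt the main-engine value directly
        else:
            chosen = a if a.confidence >= b.confidence else b  # keep the higher confidence
        # Assumed "weighted coupling": linear blend of field confidence and engine history.
        composite = (CONF_WEIGHT * chosen.confidence
                     + (1 - CONF_WEIGHT) * success_rate.get(chosen.engine, 0.0))
        if composite < COMPOSITE_LIMIT:
            abnormal.append(name)
        else:
            final[name] = chosen.value
    return final, sorted(abnormal)
```

The composite check runs after conflict resolution. A field can therefore still go to manual review even when both engines answered, if the winning value's confidence and its engine's track record are jointly too low.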

Inventors

LIU HONGLIANG
HUANG TAO
LIANG JUNFENG
ZHENG BINGWEI
WU YI
SONG JIAMEI
WANG ZHIWEI
ZENG XUE
YAO RUIBIN
LIANG QIUYUN

Assignees

利华设计院（深圳）有限公司 (Lihua Design Institute (Shenzhen) Co., Ltd.)

Dates

Publication Date: 20260505
Application Date: 20260115

Claims (10)

1. An intelligent bill processing method based on LLM fusion, characterized by comprising the following steps: S1, after an electronic document is received, splitting the electronic document into page images page by page, and attaching page metadata to each page image; S2, performing page analysis, comprising visual layout analysis and text classification, on each page image in parallel, slicing the page images based on the results of the page analysis to obtain document fragments, and selecting a main engine and a standby engine according to the type of each document fragment; S3, calling the selected main engine to extract the content of the document fragment to obtain a first extraction result, and judging whether to call the standby engine to extract the content of the document fragment to obtain a second extraction result; if so, comparing the first extraction result with the second extraction result and then determining final structured data of the electronic document; otherwise, directly taking the first extraction result as the final structured data.
2. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that slicing the page image based on the result of the page analysis specifically comprises: if at least two title visual elements whose preliminary classification confidence is greater than an upper confidence limit are detected in the current page image, and the text contents of these title visual elements are assigned to at least two different document types, cutting the page image into regions and forming each region into an independent document fragment; if the preliminary classification confidence of the current page image is less than the average confidence and the current page and an adjacent page satisfy a preset text continuity condition, merging the current page and the adjacent page into one document fragment; if the current page image satisfies neither of the above conditions and its preliminary classification confidence is less than a lower confidence limit, marking the document type of the current page image as abnormal and prompting preset personnel; if the current page image satisfies neither of the above conditions and its preliminary classification confidence is not less than the lower confidence limit, marking the current page image as an independent document fragment; wherein the text continuity condition is that the text of the current page and the text of the adjacent page contain consecutive page-number information and/or the cosine similarity between the TF-IDF vectors of the two texts is greater than a similarity limit.
3. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that selecting the main engine and the standby engine according to the type of the document fragment specifically comprises: if the document fragment type is a supplier invoice to be processed, preferentially selecting a first-class large language model as the main engine and a second-class large language model as the standby engine; if the document fragment type is a prepayment request, preferentially selecting the second-class large language model as the main engine and the first-class large language model as the standby engine; wherein the first-class large language model is characterized by strong visual understanding, and the second-class large language model is characterized by fast response.
4. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that selecting the main engine and the standby engine according to the type of the document fragment specifically comprises: if the document fragment type is a supplier invoice to be processed, dynamically selecting the main engine and the standby engine according to the document format complexity; if the document fragment type is a prepayment request, preferentially selecting a second-class large language model as the main engine and a first-class large language model as the standby engine; wherein the document format complexity comprises image clarity and table complexity, and dynamically selecting the main engine and the standby engine according to the document format complexity specifically comprises: if the image clarity is less than a clarity limit, preferentially selecting the first-class large language model as the main engine and a third-class large language model, which is characterized by table recognition, as the standby engine; if the image clarity is not less than the clarity limit and the table complexity is greater than a table complexity limit, preferentially selecting the third-class large language model as the main engine and the first-class large language model as the standby engine; if the image clarity is not less than the clarity limit and the table complexity is not greater than the table complexity limit, preferentially selecting the first-class large language model as the main engine and the second-class large language model as the standby engine.
5. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that selecting the main engine and the standby engine according to the type of the document fragment specifically comprises: recording the historical processing success rate and response time of each processing engine under different document types and document format complexities; performing a weighted combination of the historical processing success rate and response time of each processing engine under the current document type and document format complexity to obtain an engine performance evaluation value; and ranking the processing engines by the engine performance evaluation value, selecting the highest-ranked engine as the main engine and the second-highest-ranked engine as the standby engine.
6. The intelligent bill processing method based on LLM fusion according to any one of claims 3 to 5, characterized by further comprising: detecting whether the document fragment contains handwriting or a seal; if it contains handwriting, additionally calling a handwriting recognition engine as an auxiliary processing path; if it contains a seal, additionally calling an optical seal recognition engine as an auxiliary processing path; if it contains both handwriting and a seal, calling the handwriting recognition engine and the optical seal recognition engine independently, each as an auxiliary processing path.
7. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that determining the final structured data of the electronic document after comparing the first extraction result with the second extraction result specifically comprises: if the extracted contents of the same field in the first and second extraction results are consistent, that is, their similarity is greater than an extracted-content similarity threshold, directly adopting that content as the final field; if the extracted contents of the same field are inconsistent, that is, their similarity is not greater than the extracted-content similarity threshold, selecting the result with the higher confidence as the final field; performing a weighted combination of the confidence of each final field and the historical processing success rate of the processing engine that produced it to obtain a comprehensive confidence; if the comprehensive confidence is less than a comprehensive confidence limit, judging the field to be abnormal and prompting preset personnel; otherwise, directly integrating the obtained final fields to obtain the final structured data.
8. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that judging whether to call the standby engine to extract the content of the document fragment to obtain the second extraction result specifically comprises: if the merged-cell ratio is greater than a merged-cell ratio limit, calling the main engine and the standby engine in parallel; and executing a passive trigger mechanism, namely automatically switching to the standby engine for processing if the call to the main engine fails or times out.
9. The intelligent bill processing method based on LLM fusion according to claim 1, characterized in that the visual layout analysis specifically comprises: detecting visual elements in the page image by using a pre-trained visual model, and recording the type, bounding box and preliminary classification confidence of each visual element; and the text classification of the page analysis comprises: extracting the text content of the page image by optical character recognition, classifying the text content with a text classification model, and predicting the document type and its confidence.
10. A system applying the intelligent bill processing method based on LLM fusion according to any one of claims 1 to 9, characterized by comprising a bill document splitting module, an engine selection module and a bill content extraction module; the bill document splitting module is configured to, after receiving an electronic document, split the electronic document into page images page by page and attach page metadata to each page image; the engine selection module is configured to perform page analysis, comprising visual layout analysis and text classification, on each page image in parallel, slice the page images based on the results of the page analysis to obtain document fragments, and select a main engine and a standby engine according to the type of each document fragment; and the bill content extraction module is configured to call the selected main engine to extract the content of the document fragment to obtain a first extraction result, judge whether to call the standby engine to extract the content of the document fragment to obtain a second extraction result, and if so, compare the first extraction result with the second extraction result and then determine final structured data of the electronic document, otherwise directly take the first extraction result as the final structured data.
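
Claim 2's slicing rules can be read as a per-page decision list. The Python sketch below is one interpretation under stated assumptions, not the claimed implementation:

- All limit values are hypothetical.
- The "average confidence" is taken as the mean preliminary confidence over the document's pages.
- The "adjacent page" is taken to be the previous page.
- Page numbers are assumed to appear as "Page N".
- TF-IDF is computed with the standard library using the smoothed idf ln((1 + n) / (1 + df)) + 1, which is the scikit-learn default convention.

The names `Page` and `slice_actions` are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical limit values; claim 2 names these limits but does not quantify them.
UPPER_CONF, LOWER_CONF, SIM_LIMIT = 0.9, 0.4, 0.5
PAGE_NO = re.compile(r"page\s+(\d+)", re.IGNORECASE)  # assumed page-number format


@dataclass
class Page:
    text: str          # OCR text of the page
    confidence: float  # preliminary classification confidence of the page
    # (document type assigned to the title text, detection confidence) per title element
    titles: list[tuple[str, float]] = field(default_factory=list)


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors with smoothed idf, treating the document's pages as the corpus."""
    docs = [Counter(tokens(t)) for t in texts]
    df = Counter(term for d in docs for term in d)  # each term counted once per page
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def consecutive_page_numbers(prev: str, cur: str) -> bool:
    a, b = PAGE_NO.search(prev), PAGE_NO.search(cur)
    return bool(a and b and int(b.group(1)) == int(a.group(1)) + 1)


def slice_actions(pages: list[Page]) -> list[str]:
    """One action per page: 'split', 'merge' (into the previous page), 'abnormal', 'independent'."""
    vecs = tfidf_vectors([p.text for p in pages])
    avg = sum(p.confidence for p in pages) / len(pages)  # assumed: document-level mean
    actions = []
    for i, p in enumerate(pages):
        strong_types = {t for t, c in p.titles if c > UPPER_CONF}
        continuous = i > 0 and (consecutive_page_numbers(pages[i - 1].text, p.text)
                                or cosine(vecs[i - 1], vecs[i]) > SIM_LIMIT)
        if len(strong_types) >= 2:  # >= 2 confident titles of different document types
            actions.append("split")
        elif p.confidence < avg and continuous:
            actions.append("merge")
        elif p.confidence < LOWER_CONF:
            actions.append("abnormal")  # prompt preset personnel
        else:
            actions.append("independent")
    return actions
```

Returning actions rather than fragments keeps the claim's rule order visible. A full system would then cut each "split" page along its detected title regions and append each "merge" page to its predecessor's fragment.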

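Claims 4 and 5 describe two alternative engine-selection strategies. Claim 4 is a rule-based decision tree keyed on document type, image clarity and table complexity; claim 5 ranks engines by historical performance. The Python sketch below shows both. The engine labels and limit values are hypothetical, as is the normalisation of clarity and table complexity to [0, 1]. The score in `select_by_history` blends success rate with latency relative to the slowest engine; this is only an assumed reading of the claimed "weighted coupling".

```python
from dataclasses import dataclass

# Hypothetical labels for the first-, second- and third-class large language models.
VISUAL, FAST, TABLE = "class1_visual", "class2_fast", "class3_table"
CLARITY_LIMIT, TABLE_COMPLEXITY_LIMIT = 0.6, 0.5  # hypothetical limits on [0, 1] scales


def select_by_rules(doc_type: str, clarity: float, table_complexity: float) -> tuple[str, str]:
    """Claim 4 as a decision tree; returns (main engine, standby engine)."""
    if doc_type == "prepayment_request":
        return FAST, VISUAL
    if doc_type == "supplier_invoice":
        if clarity < CLARITY_LIMIT:
            return VISUAL, TABLE   # blurry scan: visual model first, table model as backup
        if table_complexity > TABLE_COMPLEXITY_LIMIT:
            return TABLE, VISUAL   # clear but table-heavy: table model first
        return VISUAL, FAST
    raise ValueError(f"no rule for document type {doc_type!r}")


@dataclass
class EngineStats:
    success_rate: float  # historical success rate for the current (type, complexity) bucket
    response_s: float    # mean historical response time in seconds


def select_by_history(stats: dict[str, EngineStats], alpha: float = 0.7) -> tuple[str, str]:
    """Claim 5: rank engines by an assumed weighted score; needs at least two engines."""
    slowest = max(s.response_s for s in stats.values()) or 1.0  # avoid division by zero

    def score(s: EngineStats) -> float:
        # Reward success, penalise latency relative to the slowest candidate.
        return alpha * s.success_rate + (1 - alpha) * (1 - s.response_s / slowest)

    ranked = sorted(stats, key=lambda name: score(stats[name]), reverse=True)
    return ranked[0], ranked[1]  # highest = main engine, second = standby engine
```

The latency term lets a fast, slightly less accurate engine outrank a slow one. The hypothetical weight `alpha` controls that trade-off.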
Description

Intelligent bill processing method and system based on LLM fusion

Technical Field

The invention relates to the technical field of data analysis, in particular to an intelligent bill processing method and system based on LLM fusion.

Background

In the existing field of invoice and contract information processing, and particularly in the management of non-standardized overseas bills, processing relies mainly on traditional optical character recognition (OCR) technology and on data extraction methods based on fixed templates. OCR technology recognizes printed characters in a document through steps such as image preprocessing, character localization, feature extraction and pattern recognition. It outputs them as plain text or as structured text data with position information. An extraction template is then configured manually for each known document type with a standard format (e.g., a vendor-specific invoice template). Target fields such as invoice number, date, amount and vendor name are then captured from the OCR output text and populated according to the rules preset in the template.

To improve processing efficiency and reduce labor costs, the industry is inevitably moving toward using large language models (LLMs) to extract key information from these electronic documents automatically. Examples include OpenAI's GPT series models and Alibaba Cloud's Tongyi series models. These models have strong natural language understanding and generation capabilities. Guided by carefully designed prompts, they can parse document contents and output the extracted structured information in a preset format. The extracted structured information is then mapped and converted, via the adapter pattern, into a unified, standard data structure within the system.
Finally, the standardized structured data is output to a downstream business system, such as an enterprise resource planning (ERP) system, through an application programming interface (API) or a message queue. It then feeds subsequent automated business processes such as financial accounting and data analysis.

Chinese patent publication No. CN118552331B discloses a bill data analysis method and system based on a multimodal large model, comprising: communicating with relevant stakeholders to determine project requirements and goals; drawing up a project plan including a timetable, resource allocation and key milestones; collecting data by identifying data sources, including paper and electronic bills, establishing a data collection pipeline, acquiring bill data with automated tools, and ensuring the compliance and security of data collection; and preprocessing the data by cleaning the collected bill data, handling missing values and noisy data, and formatting the data to ensure a uniform input format.

Chinese patent application No. 120996957A discloses an artificial-intelligence-based office automation method, device, terminal equipment and storage medium, comprising: inputting an image of a bill to be recognized into a pre-built neural network model to extract bill information and obtain keyword information, the neural network model being trained on a CNN and a Transformer network using multiple different types of bill images and the keyword information corresponding to each bill image; performing similarity matching in a pre-built bill template database according to the keyword information, and calling the corresponding bill template according to the matching result; generating an initial bill based on the bill template; checking the initial bill against preset checks, and calling the corresponding bill output rule according to the check result; and generating the target bill for the bill to be recognized based on the bill output rule and the initial bill.

The above technologies have at least the following technical problems:

In the prior art, traditional OCR technology depends heavily on a fixed layout structure and on keyword positions. Overseas invoices, however, vary from country to country, region to region and enterprise to enterprise, lack a unified specification, and have complex and variable layouts. Existing systems struggle to adapt to such highly non-standardized bill formats, which significantly reduces information extraction accuracy.

Existing schemes typically rely on a single information extraction model or engine. When the model service fails, times out, or cannot parse a particular complex format, the entire processing flow is interrupted. These schemes cannot automatically switch to a standby scheme, and this lack of fault tolerance severely affects system availability and service continuity.
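
Claim 8's standby mechanism targets the single-engine weakness described above. Both engines run in parallel when the merged-cell ratio exceeds its limit; otherwise the standby engine is called only if the main call fails or times out. Below is a minimal Python sketch using `concurrent.futures`. It assumes engines are plain callables that take the fragment bytes and return a dict. The limit and timeout values are hypothetical, as are the names `extract` and `_result_or_none`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

Engine = Callable[[bytes], dict]  # hypothetical engine interface
MERGED_CELL_RATIO_LIMIT = 0.3     # hypothetical limit value
DEFAULT_TIMEOUT_S = 5.0           # hypothetical per-call timeout


def _result_or_none(future: Future, timeout: float) -> Optional[dict]:
    """Treat any engine error or timeout as a failed call."""
    try:
        return future.result(timeout=timeout)
    except Exception:  # includes concurrent.futures.TimeoutError
        return None


def extract(fragment: bytes, merged_cell_ratio: float, main: Engine, standby: Engine,
            pool: ThreadPoolExecutor, timeout: float = DEFAULT_TIMEOUT_S
            ) -> tuple[Optional[dict], Optional[dict]]:
    """Return (first result, second result); the second is None unless both engines ran."""
    main_future = pool.submit(main, fragment)
    # Active trigger: a high merged-cell ratio signals a complex table, so run both engines.
    standby_future = (pool.submit(standby, fragment)
                      if merged_cell_ratio > MERGED_CELL_RATIO_LIMIT else None)
    first = _result_or_none(main_future, timeout)
    if standby_future is not None:
        second = _result_or_none(standby_future, timeout)
        # If the main engine failed, the standby output is promoted to the first result.
        return (first, second) if first is not None else (second, None)
    if first is None:
        # Passive trigger: the main call failed or timed out, so switch to the standby engine.
        return _result_or_none(pool.submit(standby, fragment), timeout), None
    return first, None
```

One design caveat: `Future.result(timeout=...)` only stops waiting. The timed-out call keeps occupying its worker thread, so the pool needs spare workers for the standby call, and a production client would also need its own cancellation.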