CN-122024267-A - Automatic identification and filling method for driver bill based on multimode neural network
Abstract
The invention provides a method for automatically identifying and filling in driver bills based on a multimodal neural network, comprising the following steps: S1, extracting multimodal features; S2, training the bill-category model; S3, predicting the category of a newly uploaded bill; S4, extracting the target filling information from the original bill picture by matching a label text block with an adjacent text box. The method can adaptively identify various non-standardized bills and extract information from bills with disordered formats, greatly improving adaptability to the many irregular bill styles found in freight scenarios. By adopting a multimodal neural network, bills are classified from the dual modalities of visual features and textual semantic features of the picture, which solves the problem that bill categories are easily confused when only text recognition is used, and reduces category recognition errors. Meanwhile, label values are accurately extracted through target-label positioning, position matching and content-rule verification, avoiding mismatched information and confused filling, and achieving accurate data entry.
Inventors
- MENG JIAN
- LI QIANG
- ZHANG ZHONGLEI
- YANG NUO
- JIANG SHUAI
- LI YUE
- JIN ZHIYU
- WANG YAPING
Assignees
- 鱼快创领智能科技(南京)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260410
Claims (6)
- 1. A method for automatically identifying and filling in driver bills based on a multimodal neural network, characterized by comprising the following steps: S1, extracting multimodal features; S11, performing OCR text recognition on the original bill picture to form text information; S12, performing word segmentation on the text information to obtain a word vector matrix; S13, obtaining a text feature representation of each original bill picture; S14, inputting the original bill picture into a CNN to form a feature vector; S15, obtaining the multimodal features of the original bill picture; S2, training the bill-category model; S21, classifying and combining multimodal features; S22, calculating the correlation of the multimodal features; S23, training the RNN; S3, predicting the category of the newly uploaded bill; S4, extracting the target filling information from the original bill picture; S41, acquiring the correspondence between text blocks and text boxes; S42, searching for the label text block; S43, acquiring the coordinates of the label text box; S44, searching other nearby text boxes to obtain the target text box; and, based on the target text box, obtaining the corresponding text block and its content, and filling the content into the corresponding specific data item to be filled.
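The claimed pipeline S1–S4 can be sketched as a minimal skeleton. Every function below is a hypothetical placeholder standing in for a component the claim names (OCR, CNN, feature fusion); the patent defines no API, so nothing here is its actual implementation.

```python
# Minimal skeleton of the claimed steps S1-S4; every function below is a
# hypothetical placeholder standing in for a component the claim names,
# not an implementation defined by the patent.

def extract_features(image):            # S1: multimodal feature extraction
    return {"text": f"ocr({image})", "visual": f"cnn({image})"}

def train_category_model(samples):      # S2: bill-category model "training"
    return {label: extract_features(img) for img, label in samples}

def predict_category(model, image):     # S3: match against known categories
    feats = extract_features(image)
    return next((lbl for lbl, f in model.items() if f == feats), None)

def fill_bill(model, image):            # S4: extract target filling information
    return {"category": predict_category(model, image)}

model = train_category_model([("bill_a.jpg", "fuel"), ("bill_b.jpg", "toll")])
print(fill_bill(model, "bill_a.jpg"))   # → {'category': 'fuel'}
```

The skeleton only shows the control flow between the four claimed steps; the real feature extraction, training, and matching logic is detailed in claims 2 to 6.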
- 2. The method for automatically identifying and filling in driver bills based on a multimodal neural network as set forth in claim 1, wherein the specific content of step S1 is as follows: S11, performing OCR text recognition on the original bill picture to form text information: each acquired original bill picture is input into a general-purpose OCR tool, which scans the text areas of the picture and recognizes the corresponding text information; each original bill picture corresponds to unique text information, and the text information is in character-string form; S12, performing word segmentation on the text information to obtain a word vector matrix: the character-string text information corresponding to each original bill picture is split by a word segmentation tool into a number of independent, ordered words, forming the word sequence corresponding to the original bill picture; each word in the word sequence is converted into a word vector in numerical form, and all word vectors are combined in word-sequence order to obtain the corresponding word vector matrix; S13, obtaining the text feature representation of each original bill picture: the multimodal neural network comprises an RNN, a CNN and a feature fusion module; the internal state of the RNN in the multimodal neural network is initialized in advance with an initial internal state value h_0, and the total number of word vectors in each word vector matrix is set to n, n being a positive integer; for each original bill picture, the word vectors in the word vector matrix are input into the RNN sequentially, one by one, and the RNN outputs its internal state value at the current step through the following formula: h_t = tanh(W·[h_{t-1}, x_t] + b); wherein W is a preset weight matrix, h_{t-1} is the internal state value after the (t-1)-th word vector, b is a preset bias term, tanh is the hyperbolic tangent activation function, and x_t is the t-th word vector in the word vector matrix; all internal state values produced for a word vector matrix are concatenated in word-vector order into a one-dimensional vector H, and H is taken as the text feature representation of the original bill picture; S14, inputting the original bill picture into the CNN to form a feature vector: the original bill picture is preprocessed, the preprocessing comprising size unification and color normalization; the preprocessed original bill picture is input into the CNN, which performs 2-D convolutional feature extraction to obtain a feature vector representing the visual features of the original bill picture, denoted V; S15, obtaining the multimodal features of the original bill picture: the text feature representation H output in step S13 and the visual feature vector V obtained in step S14 are input to the feature fusion module, which performs a weighted fusion of H and V based on preset weight coefficients for the text feature representation and the visual feature vector, obtaining the multimodal feature T of the original bill picture.
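Read as a standard (vanilla) RNN, the recurrence in S13 and the weighted fusion in S15 can be sketched in NumPy. The concatenated-input form h_t = tanh(W·[h_{t-1}, x_t] + b), the fusion weights, and the zero-padding used to align vector lengths are all assumptions, since the patent's formula images are not reproduced in the text.

```python
import numpy as np

def text_feature(word_vectors, W, b):
    """S13: run the recurrence h_t = tanh(W @ [h_{t-1}, x_t] + b) and
    concatenate all internal states into the 1-D text feature H."""
    hidden = W.shape[0]
    h = np.zeros(hidden)                  # initial internal state h_0
    states = []
    for x_t in word_vectors:              # word vectors fed one by one, in order
        h = np.tanh(W @ np.concatenate([h, x_t]) + b)
        states.append(h)
    return np.concatenate(states)

def fuse(H, V, alpha=0.6, beta=0.4):
    """S15: weighted fusion with preset coefficients (alpha for text,
    beta for visual); vectors are zero-padded to a common length here,
    which is a toy choice, not the patent's fusion rule."""
    n = max(len(H), len(V))
    return alpha * np.pad(H, (0, n - len(H))) + beta * np.pad(V, (0, n - len(V)))

rng = np.random.default_rng(42)
d, hidden, n_words = 4, 3, 5
W = 0.1 * rng.standard_normal((hidden, hidden + d))
b = np.zeros(hidden)
words = rng.standard_normal((n_words, d))   # toy word-vector matrix (n x d)

H = text_feature(words, W, b)               # length n_words * hidden = 15
V = rng.standard_normal(8)                  # stand-in for the CNN visual feature
T = fuse(H, V)                              # multimodal feature of the bill
print(T.shape)
```

Because tanh is bounded, every internal state entry of H stays strictly inside (-1, 1), which keeps the concatenated text feature numerically well behaved regardless of sequence length.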
- 3. The method for automatically identifying and filling in driver bills based on a multimodal neural network as set forth in claim 1, wherein the specific content of step S2 is as follows: S21, classifying and combining multimodal features: various bills of known categories in the freight scenario are collected, their multimodal features are obtained by the method of step S1, and the multimodal features are classified and combined to obtain a multimodal feature set for each bill category, recorded as Q_i = {T_{i,1}, T_{i,2}, …, T_{i,M}}; wherein i is the bill category, M is the number of bills under that category, and T_{i,j} is the j-th multimodal feature of bill category i; S22, calculating the correlation of the multimodal features: an element-level comprehensive inner product operation is performed on all multimodal features in the multimodal feature set of each bill category, and the operation result P_i is substituted as an exponent into the formula POR_i = e^{P_i}, obtaining the feature correlation POR_i of that bill category; the feature correlations of all collected bill categories are added to obtain the comprehensive multimodal feature correlation of all bill categories: POR = Σ_{i=1}^{n} POR_i = Σ_{i=1}^{n} e^{P_i}; wherein P_i is the element-level comprehensive inner product result of bill category i, and n is the total number of bill categories; S23, training the RNN: all bill categories obtained in step S21 and their corresponding multimodal feature sets form the training set; taking POR as the training target, the RNN is trained iteratively on the training set, the weight matrix W and the bias term b being continuously adjusted during training; when POR converges to its maximum over the training rounds, the training of the RNN is complete, and the weight matrix W and the bias term b at that point are the optimal parameters of the RNN.
- 4. The automatic recognition and reporting method for driver notes based on multimode neural network as set forth in claim 1, wherein: the specific content of the step S3 is as follows: For the bill picture newly uploaded by the driver, extracting the multi-modal feature T of the bill picture through the step S1, respectively carrying out element-level comprehensive inner product operation on the multi-modal feature T and M multi-modal features of each type of bill in the training set, and bringing the operation result into And obtaining a predicted POR value of the bill picture corresponding to each category, and selecting the bill category with the largest predicted POR value as the bill category to which the bill picture belongs.
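One plausible reading of the correlation and prediction steps (S22 and S3) is sketched below: the "element-level comprehensive inner product" is taken as a mean dot product, and the score is used as an exponent, POR_i = exp(P_i). Both readings are assumptions, since the patent's formula images are not reproduced; the data is synthetic.

```python
import numpy as np

def inner_product_score(T, feats):
    """Element-level comprehensive inner product of feature T with a
    category's stored features (here: mean dot product, an assumed reading)."""
    return float(np.mean([T @ f for f in feats]))

def por(T, feats):
    """Predicted POR: the score used as an exponent (assumed POR_i = exp(P_i))."""
    return np.exp(inner_product_score(T, feats))

def predict_category(T, feature_sets):
    """S3: pick the bill category with the largest predicted POR value."""
    return max(feature_sets, key=lambda cat: por(T, feature_sets[cat]))

rng = np.random.default_rng(7)
feature_sets = {   # Q_i: M = 4 stored multimodal features per known category
    "fuel": [np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal(3)
             for _ in range(4)],
    "toll": [np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal(3)
             for _ in range(4)],
}
T_new = np.array([0.9, 0.1, 0.0])   # multimodal feature of a new bill picture
print(predict_category(T_new, feature_sets))  # → fuel
```

Because exp is monotonic, the argmax over POR values is the same as the argmax over the raw inner-product scores; the exponential only rescales the correlation.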
- 5. The method for automatically identifying and filling in driver bills based on a multimodal neural network as set forth in claim 1, wherein the specific content of step S4 is as follows: S41, acquiring the correspondence between text blocks and text boxes: text blocks and text boxes are extracted by an OCR tool from the bill picture whose category has been predicted, each text block being matched with a unique text box; each text block and its corresponding text box form a TEXT-BOX key-value pair, and all TEXT-BOX key-value pairs of the bill picture are assembled into the TEXT-BOX key-value pair list of that picture; S42, searching for the label text block: the specific data items to be extracted and filled into the system under each bill category are set in advance according to the business requirements of bill filling in the freight industry, and are recorded as target values to be extracted; according to the bill category of the picture, the label keyword set corresponding to a target value to be extracted is obtained from the system, the system containing a label keyword set for each bill category, each set comprising all possible expressions of the target label, with a matching priority configured for each expression; all text blocks in the TEXT-BOX key-value pair list obtained in step S41 are traversed, the symbols of each text block are filtered out, and each text block is matched against the possible expressions of the target label in the priority order of the label keyword set; when the content of a text block completely matches one expression in the set, traversal and matching stop, the text block is determined to be the label text block of the current target value to be extracted, and step S43 is entered; if no text block matches any expression after all text blocks have been traversed, it is judged that no valid target label exists in the bill picture, and the extraction flow for the current target value ends; S43, acquiring the coordinates of the label text box: based on the one-to-one correspondence between text blocks and text boxes in the TEXT-BOX key-value pair list, the text box corresponding to the label text block is retrieved to obtain its four vertex coordinates, and is recorded as the label text box S: S = {A(x_A, y_A), B(x_B, y_B), C(x_C, y_C), D(x_D, y_D)}; wherein A, B, C, D are, in order, the upper-left, upper-right, lower-right and lower-left vertices of the label text box S; S44, searching other nearby text boxes to obtain the target text box: other nearby text boxes are searched in order, first to the right and then below, and the text box corresponding to the feature value to be extracted is marked as the target text box, obtaining the target text box.
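The priority-ordered keyword matching of S42 can be sketched as follows. The variable names, the symbol-stripping regex, and the sample data are all illustrative; the patent does not specify them.

```python
import re

def find_label_block(text_box_pairs, label_keywords):
    """S42: scan text blocks for a complete match against the label keyword
    set.  `text_box_pairs` maps each text block to its text box (the TEXT-BOX
    key-value pair list); `label_keywords` lists the possible expressions of
    the target label in matching-priority order, so the outer loop realizes
    the priority ordering."""
    for keyword in label_keywords:                      # priority order
        for block, box in text_box_pairs.items():
            cleaned = re.sub(r"[^\w]", "", block)       # filter out symbols
            if cleaned == keyword:                      # complete match only
                return block, box                       # label block + its box
    return None                                         # no valid target label

pairs = {  # toy TEXT-BOX key-value pair list (box = 4 vertex coordinates)
    "Amount:": [(40, 10), (90, 10), (90, 30), (40, 30)],
    "300.00":  [(100, 10), (160, 10), (160, 30), (100, 30)],
}
print(find_label_block(pairs, ["TotalAmount", "Amount"]))
```

Stripping symbols before comparing lets "Amount:" match the keyword "Amount"; if no keyword matches any block, the function returns None, which corresponds to ending the extraction flow for the current target value.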
- 6. The method for automatically identifying and filling in driver bills based on a multimodal neural network as claimed in claim 5, wherein the specific content of step S44 is as follows: S441, searching other text boxes to the right of the label text box S and judging whether a target text box exists: another text box is denoted T: T = {E(x_E, y_E), F(x_F, y_F), G(x_G, y_G), H(x_H, y_H)}; wherein E, F, G, H are, in order, the upper-left, upper-right, lower-right and lower-left vertices of the other text box T; the label text box S is matched in turn against the other text boxes to its right, judging whether the following two conditions are satisfied simultaneously: (1) the intersection point M1 of the extension line of segment FE of T with the extension line of segment DB of S, and the intersection point M2 of the extension line of segment GH of T with the extension line of segment BD of S, both lie beyond the label text box S on its right side; (2) the midpoint M of segment BD lies between M1 and M2 in the vertical direction; when another text box satisfies both conditions, it is determined to be the target text box, and step S443 is entered; if the two conditions cannot be satisfied simultaneously, no target text box exists among the other text boxes to the right of the label text box S, and step S442 is entered; S442, searching other text boxes below the label text box S and judging whether a target text box exists: the label text box S is matched in turn against the other text boxes below it, judging whether the following two conditions are satisfied simultaneously: (1) the intersection point of the extension line of segment HE of T with the extension line of segment CD of S, and the intersection point of the extension line of segment GF of T with the extension line of segment DC of S, both lie beyond the label text box S on its lower side; (2) the midpoint of segment DC lies between the two intersection points in the horizontal direction; when another text box satisfies both conditions, it is determined to be the target text box, and step S443 is entered; if the two conditions cannot be satisfied simultaneously, no target text box exists among the other text boxes below the label text box S; S443, acquiring the text block corresponding to the target text box: based on the TEXT-BOX key-value pair list of the bill picture, the text block corresponding one-to-one to the determined target text box is obtained, yielding the text block content; S444, verifying the text block content: a content verification rule is prepared in advance for each target value to be extracted according to the business requirements of bill filling in the freight industry, the content verification rules covering the required use of digits, letters and symbols; the text block content extracted in step S443 is checked against the content verification rule of the corresponding target value to be extracted; if the text block content conforms to the rule, it is judged to be a valid target filling value and extraction is complete; if it does not conform, it is judged to be an invalid target filling value, and steps S441 to S444 are repeated until a valid target filling value is obtained and filled into the corresponding specific data item in the system, or, after all nearby text boxes have been traversed, a failure of extraction is prompted.
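The right-then-below search of S44 plus the content verification of S444 can be sketched with axis-aligned boxes. The exact geometric conditions of the claim are carried by formula images that are not reproduced, so the overlap tests below (candidate beyond the label box, straddling its midpoint) are an assumed simplification, and the regex rule is an illustrative stand-in for a content verification rule.

```python
import re

def is_right_of(label, other):
    """Other box starts right of the label box and vertically straddles the
    label box's midpoint (a simplified reading of the S441 conditions)."""
    lx1, ly1, lx2, ly2 = label
    ox1, oy1, ox2, oy2 = other
    mid_y = (ly1 + ly2) / 2
    return ox1 > lx2 and oy1 < mid_y < oy2

def is_below(label, other):
    """Other box starts below the label box and horizontally straddles the
    label box's midpoint (a simplified reading of the S442 conditions)."""
    lx1, ly1, lx2, ly2 = label
    ox1, oy1, ox2, oy2 = other
    mid_x = (lx1 + lx2) / 2
    return oy1 > ly2 and ox1 < mid_x < ox2

def find_target(label_box, candidates, rule=r"^\d+(\.\d+)?$"):
    """S44 + S444: search right first, then below; accept the first candidate
    whose text passes the content-verification rule (here a numeric regex)."""
    for test in (is_right_of, is_below):
        for text, box in candidates:
            if test(label_box, box) and re.fullmatch(rule, text):
                return text
    return None                                   # extraction failed

label = (40, 10, 90, 30)                          # x1, y1, x2, y2 of "Amount:"
candidates = [("No.123", (100, 60, 160, 80)),
              ("300.00", (100, 10, 160, 30))]
print(find_target(label, candidates))             # → 300.00
```

Running verification inside the search loop mirrors S444's retry behavior: a candidate box whose content fails the rule is skipped, and the search continues until a valid value is found or all nearby boxes are exhausted.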
Description
Automatic identification and filling method for driver bill based on multimode neural network

Technical Field
The invention relates to the technical field of computers, and in particular to a method for automatically identifying and filling in driver bills based on a multimodal neural network.

Background
In the freight logistics links of the transportation industry, truck drivers need to rely on logistics management system software to complete the filling-in of various expense and business data, and at the same time need to upload corresponding bill photos as vouchers. The types of bills a driver must fill in are complex, covering not only expense bills such as highway-toll bills, refueling bills, urea bills, repair bills, loading and unloading bills, telephone bills and parking bills, but also non-expense bills such as weighing bills, loading and unloading slips and shipping bills, and the dimensions of the information to be filled in vary widely across bill types.

To address the low efficiency and error-proneness of bill filling, the industry's main current solution is picture-based OCR information extraction: character information in bill photos is recognized through OCR technology, and the recognition results are automatically filled into the corresponding form of the logistics management system. This technical logic extracts text at designated positions of a bill picture and matches it to label values for filling, and therefore suits application scenarios with uniform target layouts and stable content structures, such as information extraction from standardized documents like bank cards, identity cards and nationally uniform shopping invoices. In the bill-processing scenario of the freight industry, however, the conventional picture OCR information extraction method has the following defects:

1. Poor suitability for non-standardized bill identification. Most bills in freight scenarios have no unified format standard: the formats of highway-toll and parking bills differ markedly between provinces; bills such as loading and unloading slips and repair bills are non-standard receipts issued by freight yards with no fixed format; and weighing bills, loading and unloading slips and transport bills have completely different styles from one freight yard to another. The arrangements of label names and label values are varied, including horizontal, vertical and even mixed arrangements, and if a bill photo is tilted, the information appears tilted as well; the traditional picture OCR information extraction method cannot effectively identify such bills with disordered formats.

2. The existing picture OCR information extraction method relies only on text information to process a bill and cannot accurately match the information in the bill to the system's filling items, so label values are easily extracted incorrectly.

3. Poor scalability. As the freight industry develops, bill types keep increasing and formats keep diversifying; the existing picture OCR information extraction method based on fixed-position extraction cannot adapt to dynamic changes in bill forms, the extraction rules must be continuously adjusted for new bill types, technical maintenance costs rise, the bill recognition error rate keeps growing as bill types increase, and the problem of automatic filling cannot be fundamentally solved.
Disclosure of Invention
The invention aims to remedy the defects in the prior art, and provides a method for automatically identifying and filling in a driver bill based on a multimodal neural network. To achieve the above purpose, the invention adopts the following technical scheme. The method for automatically identifying and filling in driver bills based on a multimodal neural network comprises the following steps: S1, extracting multimodal features, comprising the following substeps: S11, performing OCR text recognition on the original bill picture to form text information: the acquired original bill pictures are input into a general-purpose OCR tool, which scans the text areas in the original bill pictures and recognizes the corresponding text information; S12, performing word segmentation on the text information to obtain a word vector matrix: the character-string text information corresponding to each original bill picture is split, according to Chinese word structure, into a number of independent, ordered words by a word segmentation tool, forming the word sequence corresponding to the original bill picture; Converti