CN-122024258-A - Picture digital identification method based on residual convolution network
Abstract
The invention discloses a picture digital identification method based on a residual convolution network, and relates to the technical field of digital identification. The method comprises the steps of: obtaining a picture to be identified; preprocessing it to obtain a plurality of preprocessed segmented blocks and corresponding position codes; and inputting the preprocessed segmented blocks into a trained picture digital identification model to obtain the digital information corresponding to each segmented block. The picture digital identification model identifies the picture to be identified based on a residual convolution network and comprises an input layer module, a convolution calculation module, a residual calculation module, a flattening module, a full connection module and an output layer module. Based on the digits identified for the segmented blocks and their position codes, the digit string contained in the picture to be identified is obtained by combination, wherein the digits include printed numbers and handwritten numbers. The method improves the efficiency and accuracy of digit identification in pictures.
Inventors
- LI SHUAI
- ZHANG JINDONG
- WANG LIYANG
Assignees
- 北京华航无线电测量研究所 (Beijing Huahang Radio Measurement Research Institute)
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-11-25
- Priority Date: 2024-11-08
Claims (10)
- 1. A picture digital identification method based on a residual convolution network, characterized by comprising the following steps: preprocessing the obtained picture to be identified to obtain a plurality of preprocessed segmented blocks and corresponding position codes; inputting the preprocessed segmented blocks into a trained picture digital recognition model for recognition to obtain the digital information corresponding to the segmented blocks, wherein the picture digital recognition model recognizes the picture to be recognized based on a residual convolution network and comprises an input layer module, a convolution calculation module, a residual calculation module, a flattening module, a full connection module and an output layer module; and combining the digits corresponding to the identified segmented blocks with the position codes to obtain the digit string contained in the picture to be identified; wherein the digits include printed numbers and handwritten numbers.
- 2. The method of claim 1, wherein inputting the preprocessed segmented blocks into the trained picture digital recognition model comprises: inputting the preprocessed segmented blocks into the convolution calculation module through the input layer module, and sequentially extracting the convolution extraction features of the picture through a plurality of convolution layers, wherein a Swish activation function layer and a pooling layer are arranged behind each convolution layer; inputting the convolution extraction features into the residual calculation module, and calculating the residual extraction features of the picture through a plurality of residual blocks; inputting the residual extraction features into the flattening module, and flattening the multidimensional feature map of the residual extraction features into a one-dimensional feature vector; classifying the one-dimensional feature vector through the full connection module; and converting, by the output layer module, the classified one-dimensional feature vector into a probability distribution over the classes using a Softmax function.
- 3. The method of claim 1, wherein the convolution calculation module comprises a first, a second and a third convolution layer connected in series, each followed by a Swish activation function and a pooling layer; the first convolution layer receives the preprocessed segmented blocks and extracts primary features of the picture using 32 convolution kernels, and the first convolution features are obtained after Swish activation and pooling of the primary features; the number of convolution kernels of the second convolution layer is increased to 64 to further extract more complex local features of the first convolution features, and Swish activation and pooling are carried out on the local features to obtain the second convolution features; the number of convolution kernels of the third convolution layer is further increased to 128 to extract deep features of the second convolution features, Swish activation and pooling are carried out on the deep features to obtain the third convolution features, and the third convolution features serve as the convolution extraction features output by the convolution calculation module; wherein the Swish activation function dynamically adjusts the activation strength based on the input features, and the pooling layer reduces the spatial dimension of the input features.
- 4. The method of claim 1, wherein the residual calculation module comprises first, second and third residual blocks in parallel; each residual block sequentially comprises a first convolution layer, a ReLU activation layer, a second convolution layer, a BN batch normalization layer, a summation layer and a ReLU activation layer, and the input of each residual block is skip-connected to its summation layer; each convolution layer of the first residual block comprises 128 convolution kernels of size 3×3, used to extract preliminary complex features of the convolution extraction features as the first residual features; each convolution layer of the second residual block comprises 256 convolution kernels of size 5×5, used to extract deeper features of the first residual features as the second residual features; each convolution layer of the third residual block comprises 512 convolution kernels of size 7×7, used to extract high-level features of the second residual features as the third residual features; the first, second and third residual features are spliced to obtain spliced features; and the number of channels of the convolution extraction features is adjusted with a 1×1 convolution to match the dimension of the spliced features, followed by a skip connection with the spliced features to obtain the residual extraction features of the picture.
- 5. The method of claim 4, wherein, for an input x to the deep residual network, the first, second and third residual features output by the first, second and third residual blocks are spliced as follows: y1 = ReLU(F1(x) + x), y2 = ReLU(F2(x) + x), y3 = ReLU(F3(x) + x), y = [y1, y2, y3]; wherein y1, y2 and y3 are the first, second and third residual features respectively, y is the splicing result of the three, and F1(x), F2(x) and F3(x) are the processing results of the first, second and third residual blocks through the first convolution layer, the ReLU activation layer, the second convolution layer and the BN batch normalization layer respectively.
- 6. The method according to claim 2, wherein the picture digital recognition model is trained by: constructing a training data set for the picture digital recognition model, the training data set comprising sample pictures and corresponding sample labels; training the picture digital recognition model on the training data set; and performing gradient fusion of the cross entropy loss and the mean square error loss as the loss function, continuously adjusting the parameters through back propagation and a gradient descent optimization algorithm to minimize the loss function, and, after the loss function converges, finishing the training and storing the parameters of the picture digital recognition model.
- 7. The method of claim 6, wherein the cross entropy loss L_CE and the mean square error loss L_MSE are gradient fused as the loss function, wherein ∇L_CE and ∇L_MSE are the gradients of the cross entropy loss and the mean square error loss with respect to the picture digital recognition model parameters, and ||∇L_CE|| and ||∇L_MSE|| are the norms of those gradients; the cross entropy loss is: L_CE = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_ic log(p_ic); the mean square error loss is: L_MSE = (1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} (y_ic - p_ic)²; wherein N is the total number of samples, C is the total number of categories, y_i is the true value of the i-th sample, y_ic is an indicator variable (y_ic = 1 if sample i belongs to class c, otherwise y_ic = 0), and p_ic is the probability that the model predicts sample i belongs to class c.
- 8. The method according to claim 1, wherein preprocessing the picture to be identified comprises: converting the RGB color picture to be identified into a gray-scale picture; performing binarization on the gray-scale picture to obtain a binarized picture; carrying out connected-region analysis on the denoised binarized picture, and identifying and marking all connected white regions; screening out candidate regions conforming to digital characteristics by comparing the area and shape of the connected white regions with predefined digital characteristic thresholds; taking each screened candidate digital region as a segmented block to obtain a plurality of segmented blocks containing digits, with the center position coordinates of each segmented block serving as its position code; and unifying the size of the segmented blocks to a predetermined picture size.
- 9. The method of claim 2, wherein the Swish activation function is expressed as: f(x) = x · σ(βx); wherein x is the first, second or third convolution features, σ is an improved Sigmoid function, β is a learnable parameter, and α is a nonlinear scaling factor and a learnable parameter whose initial value lies in the range [0, 0.1] and is updated through the back propagation algorithm and gradient descent; the Swish activation function dynamically adjusts the activation strength based on the input features as follows: when σ(βx) is close to 1, the output is close to the input feature x; when σ(βx) is close to 0, the output is close to 0; and the value of σ(βx) is dynamically adjusted according to the input x.
- 10. The method of claim 8, wherein the predetermined picture size is 30×30 pixels.
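The dynamic activation behavior described in claims 3 and 9 can be sketched in Python. This is a minimal illustration assuming the standard Swish form f(x) = x · σ(βx); here β is treated as a fixed scalar rather than a parameter learned by back propagation, and the "improved Sigmoid" with scaling factor α is not modeled:

```python
import math

def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x).

    For large positive x, sigmoid(beta*x) approaches 1 and the output
    approaches x; for large negative x, sigmoid(beta*x) approaches 0 and
    the output approaches 0, so the activation strength adapts to the
    input, as claim 9 describes.
    """
    sigma = 1.0 / (1.0 + math.exp(-beta * x))
    return x * sigma

print(swish(10.0))   # close to 10, since sigmoid(10) is nearly 1
print(swish(-10.0))  # close to 0, since sigmoid(-10) is nearly 0
print(swish(0.0))    # exactly 0
```

Unlike ReLU, Swish is smooth and non-monotonic near zero, which is one reason the patent pairs it with the convolution layers instead of a hard threshold.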
Description
Picture digital identification method based on residual convolution network

Technical Field

The invention belongs to the technical field of digital identification, and particularly relates to a picture digital identification method based on a residual convolution network.

Background

With the development of science and technology, digital identification technology has become an indispensable part of modern society and is widely applied in many business fields, including but not limited to license plate identification, population information entry and the financial industry. Against the background of comprehensive global informatization and increasing automation, the demand for digital identification technology keeps growing; in particular, in business fields such as financial statement processing, automatic postal sorting, test paper score statistics, bank bill processing and financial digital statistics, the demand for handwritten picture digit identification technology is becoming urgent. However, existing digital recognition systems have significant shortcomings in recognition speed and accuracy, which limit their performance in practical applications. For example, when processing large amounts of data, the processing speed of conventional identification systems cannot meet real-time or near-real-time requirements, resulting in delays and inefficiency in business processes. In addition, for recognition of handwritten digits, the variety and complexity of their shapes make high-precision recognition difficult for conventional technology, which directly affects the reliability and practicality of the recognition results.
Disclosure of Invention

In view of the above analysis, embodiments of the invention aim to provide a picture digital identification method based on a residual convolution network, which is used to solve the technical problems of low identification efficiency and low accuracy for digits in pictures in the prior art. In order to solve these technical problems, the main technical scheme adopted by the invention is as follows.

A picture digital identification method based on a residual convolution network comprises the following steps: preprocessing the obtained picture to be identified to obtain a plurality of preprocessed segmented blocks and corresponding position codes; inputting the preprocessed segmented blocks into a trained picture digital recognition model for recognition to obtain the digital information corresponding to the segmented blocks, wherein the picture digital recognition model recognizes the picture to be recognized based on a residual convolution network and comprises an input layer module, a convolution calculation module, a residual calculation module, a flattening module, a full connection module and an output layer module; and combining the digits corresponding to the identified segmented blocks with the position codes to obtain the digit string contained in the picture to be identified; wherein the digits include printed numbers and handwritten numbers.
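The final combination step above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: it assumes the position code of each block is the (cx, cy) center coordinate from the preprocessing step, that digits are ordered top-to-bottom by row and left-to-right within a row, and uses a hypothetical `combine_digits` helper with a fixed row tolerance:

```python
def combine_digits(blocks):
    """Combine per-block recognized digits into a digit string.

    `blocks` is a list of (digit, (cx, cy)) pairs, where (cx, cy) is the
    center coordinate of the segmented block, i.e. its position code.
    Blocks are grouped into coarse rows by their y coordinate, then
    ordered left to right within each row by their x coordinate.
    """
    # Row tolerance of 15 px (half the 30x30 unified block size) is an
    # assumption; blocks whose y centers fall in the same 15-px band are
    # treated as one row. Boundary cases would need a smarter grouping.
    row_tol = 15
    ordered = sorted(blocks, key=lambda b: (b[1][1] // row_tol, b[1][0]))
    return "".join(str(d) for d, _ in ordered)

# Example: four recognized blocks on one line forming the string "2024"
blocks = [(2, (10, 10)), (0, (40, 13)), (2, (70, 12)), (4, (100, 11))]
print(combine_digits(blocks))  # -> "2024"
```

The key design point is that recognition is per-block, so the position codes carry all the layout information needed to reassemble the original digit string.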
Further, inputting the preprocessed segmented blocks into the trained picture digital recognition model includes: inputting the preprocessed segmented blocks into the convolution calculation module through the input layer module, and sequentially extracting the convolution extraction features of the picture through a plurality of convolution layers, wherein a Swish activation function layer and a pooling layer are arranged behind each convolution layer; inputting the convolution extraction features into the residual calculation module, and calculating the residual extraction features of the picture through a plurality of residual blocks; inputting the residual extraction features into the flattening module, and flattening the multidimensional feature map of the residual extraction features into a one-dimensional feature vector; classifying the one-dimensional feature vector through the full connection module; and converting, by the output layer module, the classified one-dimensional feature vector into a probability distribution over the classes using a Softmax function. Further, the convolution calculation module comprises a first convolution layer, a second convolution layer and a third convolution layer connected in series, with a Swish activation function and a pooling layer arranged behind each of them. The first convolution layer receives the preprocessed segmented blocks and extracts primary features of the picture using 32 convolution kernels, and the first convolution features are obtained after Swish activation and pooling of the primary features. The number of convolution kernels of the second convolution layer is increased to 64, more complex local features of the first convolution features are further extracted, and Swish activation