CN-115205865-B - Method and device for identifying image and electronic equipment
Abstract
The invention discloses a method and device for recognizing images, and electronic equipment. The method comprises: obtaining an image to be recognized, wherein a preset area of the image comprises a character string composed of digits; inputting the image into a pre-trained target image recognition model to obtain a multi-channel output image, each channel of which corresponds to a confidence prediction map of a preset digit; determining the position of each preset digit in the character string based on the confidence prediction maps; and combining the preset digits into a recognition result of the character string based on those positions. Digits in the image can be recognized without preprocessing such as table-line removal and character segmentation, so recognition errors caused by abnormal conditions during preprocessing are avoided, and recognition accuracy and robustness are improved.
Inventors
- ZHANG LIANG
- WANG YUFANG
- WANG ZHIMING
- LU XIAOFAN
- CHANG HONGYUAN
Assignees
- 华云(河北雄安)大数据科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20220725
- Priority Date
- 20220509
Claims (8)
- 1. A method for recognizing an image, the method comprising: acquiring an image to be recognized, wherein a predetermined area of the image to be recognized comprises a character string composed of digits; inputting the image to be recognized into a pre-trained target image recognition model to obtain a multi-channel output image, wherein each channel in the output image corresponds to a confidence prediction map of a preset digit, and 1 channel in the output image corresponds to a confidence prediction map of digits with correction marks; wherein the target image recognition model is trained through the steps of: acquiring a sample set, wherein the sample images in the sample set comprise at least one labeled virtual sample image and at least one labeled real sample image, a sample area in each sample image includes a sample character string, the virtual sample image is an image generated based on the real sample image, and the real sample image is an image obtained by photographing a sample text; wherein the virtual sample image is generated by: segmenting single-digit images from the real sample image and labeling the digit in each single-digit image to obtain a plurality of sample digit images; generating a sample correction digit image based on the sample digit images; generating an initial image and determining a sample area in the initial image, wherein the pixel value of each pixel in the initial image is 0; generating a single-row table in the sample area, wherein the table comprises a random number of cells; for each cell, randomly selecting a sample digit image and superimposing it in the area of the cell; randomly superimposing the sample correction digit image in the area of each cell to generate the sample character string; and labeling the sample character string based on the labels of the sample digit images superimposed in the areas of the cells, to obtain the virtual sample image; determining the position of the preset digit in the character string based on the confidence prediction map, comprising: extracting the confidence prediction map corresponding to a channel from the output image; performing smoothing and binarization on the confidence prediction map to obtain a processed prediction map; performing connected-region analysis on a preset region in the processed prediction map to determine a connected region in the preset region; and determining the position of the preset digit corresponding to the channel in the character string based on the position of the connected region; and combining the preset digits into a recognition result of the character string based on the positions of the preset digits in the character string.
- 2. The method of claim 1, wherein the output image comprises 11 channels, wherein 10 channels correspond to confidence prediction maps of digits 0 through 9, respectively.
- 3. The method of claim 1 or 2, wherein the image to be recognized is an image obtained by photographing a text to be recognized, and the character string represents an examination number in the text to be recognized.
- 4. A method according to claim 3, wherein the target image recognition model is further trained via the steps of: training a pre-built initial image recognition model based on the virtual sample image to obtain a pre-trained image recognition model; and training the pre-trained image recognition model again based on the real sample image to obtain the target image recognition model.
- 5. The method of claim 1, wherein the sample correction digit image is generated by: generating an initial sample image, wherein the pixel value of each pixel in the initial sample image is 0; randomly generating one or more straight lines in the initial sample image; and randomly selecting one sample digit image and superimposing it on the initial sample image to obtain the sample correction digit image.
- 6. The method of claim 5, wherein the table is generated by: randomly determining the height, length and number of cells of a table to be generated; determining the straight lines to be drawn based on the height, the length and the number of cells of the table to be generated; evenly dividing each straight line to be drawn into a preset number of line segments, and generating a random number for each line segment; and, if the random number corresponding to a line segment is larger than a preset threshold value, drawing the line segment to obtain the table, wherein the line width of the line segment is the sum of a preset stroke width and a random perturbation.
- 7. An apparatus for recognizing an image, the apparatus comprising: an image acquisition unit configured to acquire an image to be recognized, wherein a predetermined area of the image to be recognized includes a character string composed of digits; an image prediction unit configured to input the image to be recognized into a pre-trained target image recognition model to obtain a multi-channel output image, wherein each channel in the output image corresponds to a confidence prediction map of a preset digit, and 1 channel in the output image corresponds to a confidence prediction map of digits with correction marks; wherein the target image recognition model is trained by acquiring a sample set, wherein the sample images in the sample set comprise at least one labeled virtual sample image and at least one labeled real sample image, a sample area in each sample image includes a sample character string, the virtual sample image is an image generated based on the real sample image, and the real sample image is an image obtained by photographing a sample text; wherein the virtual sample image is generated by: segmenting single-digit images from the real sample image and labeling the digit in each single-digit image to obtain a plurality of sample digit images; generating a sample correction digit image based on the sample digit images; generating an initial image and determining a sample area in the initial image; generating a random sample character string in the sample area by superimposing randomly selected sample digit images and the sample correction digit image; and labeling the sample character string to obtain the virtual sample image; a position determining unit configured to determine the position of the preset digit in the character string based on the confidence prediction map, specifically by: extracting the confidence prediction map corresponding to a channel from the output image; performing smoothing and binarization on the confidence prediction map to obtain a processed prediction map; performing connected-region analysis on a preset region in the processed prediction map to determine a connected region in the preset region; and determining the position of the preset digit corresponding to the channel in the character string based on the position of the connected region; and a result determining unit configured to combine the preset digits into a recognition result of the character string based on the positions of the preset digits in the character string.
- 8. An electronic device, the electronic device comprising: one or more processors; and a storage device having one or more programs stored thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for recognizing an image of any of claims 1-6.
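The table-generation steps recited in claim 6 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the value ranges for height, cell width and cell count, the default threshold, and the choice of a single-row layout with two horizontal borders and vertical separators are all assumptions made for illustration. The image is a plain list of lists so that the example needs only the standard library.

```python
import random


def draw_segment(img, horizontal, fixed, start, end, width):
    """Paint one axis-aligned segment, `width` pixels thick, into img."""
    h, w = len(img), len(img[0])
    for t in range(start, end):
        for d in range(width):
            y, x = (fixed + d, t) if horizontal else (t, fixed + d)
            if 0 <= y < h and 0 <= x < w:
                img[y][x] = 255


def generate_table(rng, n_segments=8, threshold=0.1, stroke=2, jitter=1):
    """Return (image, n_cells) for a randomly sized single-row table.

    All numeric ranges below are hypothetical; the claim only says that
    height, length and cell count are chosen at random.
    """
    height = rng.randint(24, 40)
    n_cells = rng.randint(4, 10)
    cell_w = rng.randint(20, 32)
    length = n_cells * cell_w
    # Black canvas (pixel value 0) with room for the thickest border line.
    img = [[0] * (length + stroke + jitter)
           for _ in range(height + stroke + jitter)]
    # Lines to draw: top and bottom borders plus n_cells + 1 separators.
    lines = [(True, 0, length), (True, height, length)]
    lines += [(False, i * cell_w, height) for i in range(n_cells + 1)]
    for horizontal, fixed, extent in lines:
        # Split each line evenly into n_segments pieces.
        for s in range(n_segments):
            start = s * extent // n_segments
            end = (s + 1) * extent // n_segments
            # Draw a piece only if its random number exceeds the threshold,
            # which leaves random gaps like those in badly printed tables.
            if rng.random() > threshold:
                # Line width = preset stroke width + random perturbation.
                width = max(1, stroke + rng.randint(-jitter, jitter))
                draw_segment(img, horizontal, fixed, start, end, width)
    return img, n_cells
```

Calling `generate_table(random.Random(0))` yields one synthetic table; raising `threshold` makes the lines more broken, and lowering it makes them more complete.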
Description
Method and device for identifying image and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular to a method and an apparatus for recognizing an image, and an electronic device.
Background
In practice, a student's examination number is recorded on the answer sheet of the student's daily assignments, examination papers, or intelligent compositions, and an intelligent examination paper system automatically recognizes the examination number on the answer sheet. In the related art, examination-number recognition mainly comprises two steps. The first is preprocessing, in which the digits forming the examination number are segmented from the image of the assignment or examination paper: for example, straight-line detection methods such as Hough transformation, single necklace, and projection are used to remove the table lines around the digits, the image area containing the digits is then located by methods such as histogram projection or connected-region analysis, and each digit is segmented from the image. The second is recognition of each segmented digit. Because the rules involved in the preprocessing are complex, abnormal conditions easily occur during preprocessing and in turn cause recognition errors.
Disclosure of Invention
In view of the above, the present invention aims to provide a method, an apparatus and an electronic device capable of improving the accuracy and robustness of digit recognition in an image.
In order to achieve the above purpose, the invention adopts the following technical scheme. The invention provides a method for recognizing an image, comprising: obtaining an image to be recognized; inputting the image to be recognized into a pre-trained target image recognition model to obtain a multi-channel output image, wherein each channel corresponds to a confidence prediction map of a preset digit; determining the position of the preset digit in a character string based on the confidence prediction map; and combining the preset digits into a recognition result of the character string based on the positions of the preset digits in the character string.
In some embodiments, determining the position of the preset digit in the character string based on the confidence prediction map comprises: extracting the confidence prediction map corresponding to a channel from the output image; performing smoothing and binarization on the confidence prediction map to obtain a processed prediction map; performing connected-region analysis on a preset region in the processed prediction map to determine a connected region in the preset region; and determining the position of the preset digit corresponding to the channel in the character string based on the position of the connected region.
In some embodiments, the output image includes 11 channels, where 10 channels correspond to confidence prediction maps of digits 0 through 9, respectively, and 1 channel corresponds to a confidence prediction map of digits with correction marks.
In some embodiments, the image to be recognized is an image obtained by photographing the text to be recognized, and the character string represents the examination number in the text to be recognized.
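The position-determination embodiment above (smoothing, binarization, connected-region analysis, then assembling digits by position) can be illustrated with a minimal standard-library sketch. Several details are assumptions rather than statements from the text: the box filter used for smoothing, the 0.5 threshold, 4-connectivity, the `min_area` filter, and ordering digits by the horizontal centroid of each connected region. Only the ten digit channels are decoded here, because this passage does not say how the correction-mark channel is used. All function names are hypothetical.

```python
from collections import deque


def smooth(conf, k=1):
    """Mean filter over a (2k+1) x (2k+1) window, clipped at the borders."""
    h, w = len(conf), len(conf[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [conf[j][i]
                    for j in range(max(0, y - k), min(h, y + k + 1))
                    for i in range(max(0, x - k), min(w, x + k + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def connected_regions(mask):
    """Return 4-connected regions of a binary mask as lists of (y, x)."""
    h, w = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, region = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions


def decode_digits(channels, labels="0123456789", thr=0.5, min_area=3,
                  roi=None):
    """Combine per-digit confidence maps into a left-to-right string.

    channels: one 2-D confidence map (list of lists) per label.
    roi: optional (x0, y0, x1, y1) standing in for the preset region.
    """
    hits = []
    for label, conf in zip(labels, channels):
        # Smooth, then binarize the confidence map for this digit.
        mask = [[1 if v >= thr else 0 for v in row] for row in smooth(conf)]
        if roi is not None:
            x0, y0, x1, y1 = roi
            mask = [row[x0:x1] for row in mask[y0:y1]]
        for region in connected_regions(mask):
            if len(region) >= min_area:
                # Horizontal centroid gives the digit's position in the string.
                cx = sum(x for _, x in region) / len(region)
                hits.append((cx, label))
    hits.sort()
    return "".join(label for _, label in hits)
```

Because each channel is processed independently, a digit that occurs several times in the string simply produces several connected regions in its own channel, and sorting all regions by position interleaves them correctly with the other digits.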
In some embodiments, the target image recognition model is trained by: acquiring a sample set, wherein the sample images in the sample set comprise at least one labeled virtual sample image and at least one labeled real sample image, a sample area in each sample image includes a sample character string, the virtual sample image is an image generated based on the real sample image, and the real sample image is an image obtained by photographing a sample text; training a pre-built initial image recognition model based on the virtual sample image to obtain a pre-trained image recognition model; and training the pre-trained image recognition model again based on the real sample image to obtain the target image recognition model.
In some embodiments, the virtual sample image is generated by: segmenting single-digit images from a real sample image and labeling the digit in each single-digit image to obtain a plurality of sample digit images; generating a sample correction digit image based on the sample digit images; generating an initial image and determining a sample area in the initial image, wherein the pixel value of each pixel in the initial image is 0; generating a single-row table in the sample area, wherein the table comprises a random number of cells; for each cell, randomly selecting a sample digit image and superimposing it in the area of the cell; randomly superimposing the sample correction digit image in the area of each cell to generate a sample character string; and labeling the sample character string based on the labels of the sample digit images superimposed in the area of each cell, to obtain the virtual