CN-121982330-A - Image recognition strengthening method, device, equipment and storage medium thereof

Abstract

The application belongs to the technical field of image recognition and relates to an image recognition strengthening method, device, equipment and storage medium. The image to be recognized is recognized by an image recognition promotion model obtained through reinforcement learning, which improves the accuracy and precision of image recognition. Applied to a financial settlement business scenario, the method can more accurately identify the settlement basis from settlement image materials; applied to medical examination images, it can more accurately identify tissue structure information and assist medical staff in reading medical images.

Inventors

WANG JIANZONG
ZHANG XULONG
Bao Xikun

Assignees

Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-20

Claims (10)

1. An image recognition strengthening method, characterized by comprising the following steps: acquiring an image to be recognized and a target image set, wherein the target image set comprises an image set for performing reinforcement learning training on a preset target base model; inputting the target image set into the preset target base model to obtain a preliminary image recognition result; performing reinforcement learning on the target base model by using the preliminary image recognition result to obtain an image recognition promotion model; inputting the image to be recognized into the image recognition promotion model obtained by reinforcement learning; and recognizing the image to be recognized with the image recognition promotion model to obtain an image recognition target result.
2. The image recognition strengthening method according to claim 1, wherein the step of performing reinforcement learning on the target base model by using the preliminary image recognition result to obtain an image recognition promotion model specifically comprises: acquiring multi-round recognition answer data output by the target base model for the target image set; classifying the images in the target image set according to the multi-round recognition answer data to obtain a first target class image set and a second target class image set; inputting the first target class image set into the target base model, and performing perception reinforcement learning on the target base model based on a preset perception reinforcement strategy to obtain a preliminary promotion model; inputting the second target class image set into the preliminary promotion model, and performing reasoning reinforcement learning on the preliminary promotion model based on a preset reasoning reinforcement strategy to obtain a secondary promotion model; and setting the secondary promotion model as the image recognition promotion model after reinforcement learning.
3. The image recognition strengthening method according to claim 2, wherein the step of classifying the images in the target image set according to the multi-round recognition answer data to obtain a first target class image set and a second target class image set specifically comprises: screening, from the multi-round recognition answer data and according to preset recognition answers, the images that are recognized correctly in all rounds, and applying a first mark to them; adding all first-marked images to a first image set to generate the first target class image set; screening, from the multi-round recognition answer data and according to the preset recognition answers, the images that are recognized correctly in only some of the rounds, and applying a second mark to them; and adding all second-marked images to a second image set to generate the second target class image set.
4. The image recognition strengthening method according to claim 2, wherein the step of performing perception reinforcement learning on the target base model based on a preset perception reinforcement strategy to obtain a preliminary promotion model specifically comprises: acquiring the image description texts output by the target base model for each image in the first target class image set, as first image description texts; calculating, by a similarity calculation and according to a preset image-text alignment standard answer, the image-text alignment degree between the first image description texts and the images in the first target class image set; acquiring the image description texts output by a preset teacher model for each image in the first target class image set, as second image description texts; extracting image description keywords from the first image description texts and the second image description texts respectively; calculating an image description keyword matching degree from the image description keywords corresponding to the first image description texts and the second image description texts; identifying an output format adaptation degree of the first image description texts according to a preset image description text output format; calculating the output length of the first image description texts, and determining an output length compliance degree of the first image description texts according to the output length calculation result and a preset output length specification; iteratively optimizing the target base model by combining the image-text alignment degree, the image description keyword matching degree, the output format adaptation degree and the output length compliance degree; and, when the first image description texts output by the target base model meet a preset optimization requirement, setting the target base model as the preliminary promotion model after perception reinforcement learning.
5. The image recognition strengthening method according to claim 4, wherein the step of calculating, by a similarity calculation and according to a preset image-text alignment standard answer, the image-text alignment degree between the first image description texts and the images in the first target class image set specifically comprises: calculating the image-text alignment degree by an image-text alignment similarity calculation based on FGCLIP, wherein the FGCLIP-based image-text alignment similarity calculation comprises the following steps: extracting image-text semantic relations from all images in the first target class image set as image embedding features; extracting edge features, color histograms and SIFT features from all images in the first target class image set; fusing the extracted edge features, color histograms and SIFT features with the image embedding features to obtain a feature-fused image embedding representation; encoding the first image description text with the text encoder in FGCLIP to obtain a text embedding representation; and calculating the cosine similarity between the text embedding representation and the feature-fused image embedding representation, and setting this similarity as the image-text alignment degree.
6. The image recognition strengthening method according to claim 4, wherein the step of calculating the output length of the first image description texts and determining the output length compliance degree of the first image description texts according to the output length calculation result and a preset output length specification specifically comprises: calculating the output length of the first image description text corresponding to the current image; if the output length meets the preset output length specification, determining that the first image description text corresponding to the current image complies with the preset output length specification; if the output length does not meet the preset output length specification, determining that the first image description text corresponding to the current image does not comply with the preset output length specification; and counting the number of images whose first image description text complies with the preset output length specification, calculating the ratio of this number to the number of images in the first target class image set, and taking the ratio as the output length compliance degree of the first image description texts.
7. The image recognition strengthening method according to claim 2, wherein the step of performing reasoning reinforcement learning on the preliminary promotion model based on a preset reasoning reinforcement strategy to obtain a secondary promotion model specifically comprises: acquiring the recognition answers with chain reasoning output by the preliminary promotion model for each image in the second target class image set, as actual reasoning answers; acquiring the recognition answers with chain reasoning output by a preset teacher model for each image in the second target class image set, as reference reasoning answers; calculating the consistency between the actual reasoning answers and the reference reasoning answers by comparison; identifying the conformity of the actual reasoning answers to a preset chain reasoning answer format specification; iteratively updating and optimizing the preliminary promotion model by combining the consistency and the conformity; and, when the actual reasoning answers output by the preliminary promotion model meet preset consistency and conformity standards, setting the preliminary promotion model as the secondary promotion model after reasoning reinforcement learning.
8. An image recognition strengthening device, comprising: an image acquisition module, used for acquiring an image to be recognized and a target image set, wherein the target image set comprises an image set for performing reinforcement learning training on a preset target base model; a preliminary image recognition module, used for inputting the target image set into the preset target base model to obtain a preliminary image recognition result; a model reinforcement learning module, used for performing reinforcement learning on the target base model by using the preliminary image recognition result to obtain an image recognition promotion model; an input module for the image to be recognized, used for inputting the image to be recognized into the image recognition promotion model obtained by reinforcement learning; and an image promotion recognition module, used for recognizing the image to be recognized with the image recognition promotion model to obtain an image recognition target result.
9. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the image recognition strengthening method according to any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the image recognition strengthening method according to any one of claims 1 to 7.
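
Illustrative sketch (not part of the claims): the data split of claim 3 and the scoring steps of claims 4, 6 and 7 can be pictured with a few small Python functions. Everything named here is hypothetical and chosen only for illustration: the `Rollouts` container, the function names, measuring length in characters against a `[min_len, max_len]` range, the weighted sum and its weights, the `<think>…</think><answer>…</answer>` format, and reducing "consistency" to agreement of the final answers. The claims leave all of these open. The alignment, keyword and format scores are taken as given inputs.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollouts:
    """Hypothetical container: multi-round answers of the base model for one image."""
    image_id: str
    answers: List[str]   # one recognition answer per round
    reference: str       # preset recognition answer


def partition(rollouts: Sequence[Rollouts]) -> Tuple[List[str], List[str]]:
    """Claim 3: all rounds correct -> first set; only some rounds correct -> second set."""
    first, second = [], []
    for r in rollouts:
        n_correct = sum(a == r.reference for a in r.answers)
        if r.answers and n_correct == len(r.answers):
            first.append(r.image_id)
        elif n_correct > 0:
            second.append(r.image_id)
        # Images with no correct round are skipped; the claims do not assign them.
    return first, second


def length_compliance(texts: Sequence[str], min_len: int, max_len: int) -> float:
    """Claim 6: share of descriptions whose length meets the preset specification.
    Character length and a [min_len, max_len] range are assumptions."""
    if not texts:
        return 0.0
    return sum(min_len <= len(t) <= max_len for t in texts) / len(texts)


def perception_reward(alignment: float, keywords: float, fmt: float, length: float,
                      weights: Sequence[float] = (0.4, 0.3, 0.15, 0.15)) -> float:
    """Claim 4: combine the four scores. The weighted sum and weights are assumptions."""
    return sum(w * s for w, s in zip(weights, (alignment, keywords, fmt, length)))


# Assumed chain-reasoning format: <think>...</think><answer>...</answer>
_COT = re.compile(r"\s*<think>.+?</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL)


def reasoning_reward(actual: str, reference: str, alpha: float = 0.7) -> float:
    """Claim 7: combine consistency with the teacher answer and format conformity."""
    m_actual, m_ref = _COT.fullmatch(actual), _COT.fullmatch(reference)
    conformity = 1.0 if m_actual else 0.0
    # Consistency is reduced here to agreement of the final answers (assumption).
    consistent = bool(m_actual and m_ref
                      and m_actual.group(1).strip() == m_ref.group(1).strip())
    return alpha * float(consistent) + (1 - alpha) * conformity
```

In this sketch an image answered correctly in every round goes to the perception stage and one answered correctly only in some rounds goes to the reasoning stage. The two reward functions return scalars that an outer reinforcement learning loop, not shown here, could use as its training signal.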

Description

Image recognition strengthening method, device, equipment and storage medium thereof

Technical Field

The application relates to the technical field of image recognition, and in particular to an image recognition strengthening method, device, equipment and storage medium, which are applied to scenarios that require the content of images to be recognized more accurately.

Background

With the rapid development of image recognition technology in the field of artificial intelligence, image recognition is increasingly used in the examination of insurance claim materials and in medical examinations. Conventional image recognition relies mainly on deep learning and convolutional neural networks (CNNs): a model trained on a large amount of annotated data can perform tasks such as object recognition, face recognition and image segmentation. Data augmentation, transfer learning and multi-scale analysis improve the adaptability and accuracy of such models in different environments, and mainstream applications include autonomous driving, security monitoring and medical image analysis. In recent years, vision-language models have shown stronger capabilities than conventional image recognition techniques in multimodal tasks such as visual question answering, image-text reasoning and image-text recognition. However, existing methods that use a vision-language model for image recognition generally suffer from insufficient model perception and from interference caused by the coupling of reasoning and perception, which makes the recognition inaccurate.

Disclosure of Invention

The embodiments of the application aim to provide an image recognition strengthening method, device, equipment and storage medium, so as to solve the prior-art problems of insufficient perception in image recognition models and of interference caused by the coupling of reasoning and perception, thereby improving image recognition accuracy.

In a first aspect, an embodiment of the application provides an image recognition strengthening method, which adopts the following technical solution:

An image recognition strengthening method, comprising the following steps: acquiring an image to be recognized and a target image set, wherein the target image set comprises an image set for performing reinforcement learning training on a preset target base model; inputting the target image set into the preset target base model to obtain a preliminary image recognition result; performing reinforcement learning on the target base model by using the preliminary image recognition result to obtain an image recognition promotion model; inputting the image to be recognized into the image recognition promotion model obtained by reinforcement learning; and recognizing the image to be recognized with the image recognition promotion model to obtain an image recognition target result.

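
As a rough illustration of the order of the five steps in the first aspect, the following Python skeleton treats a model as a plain callable and leaves the reinforcement learning itself to a caller-supplied function. The names `Model`, `strengthen_and_recognize` and `reinforce` are hypothetical, as is the use of strings to stand in for images. The sketch shows only the data flow and should not be read as the implementation described in the application.

```python
from typing import Callable, Iterable, List

# Assumption: a model maps an image (here just an identifier) to a recognition result.
Model = Callable[[str], str]


def strengthen_and_recognize(
    image: str,
    target_set: Iterable[str],
    base_model: Model,
    reinforce: Callable[[Model, List[str], List[str]], Model],
) -> str:
    # Step 1: the image to be recognized and the target image set are the inputs.
    images = list(target_set)
    # Step 2: preliminary recognition results of the base model on the target set.
    preliminary = [base_model(x) for x in images]
    # Step 3: reinforcement learning driven by those results (supplied by the caller).
    promoted = reinforce(base_model, images, preliminary)
    # Steps 4 and 5: feed the image to the promoted model and return its result.
    return promoted(image)
```

A toy `reinforce` that simply wraps the base model is enough to exercise the flow; in a real system this argument would hide the whole training procedure.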
In a second aspect, an embodiment of the application further provides an image recognition strengthening device, which adopts the following technical solution:

An image recognition strengthening device, comprising: an image acquisition module, used for acquiring an image to be recognized and a target image set, wherein the target image set comprises an image set for performing reinforcement learning training on a preset target base model; a preliminary image recognition module, used for inputting the target image set into the preset target base model to obtain a preliminary image recognition result; a model reinforcement learning module, used for performing reinforcement learning on the target base model by using the preliminary image recognition result to obtain an image recognition promotion model; an input module for the image to be recognized, used for inputting the image to be recognized into the image recognition promotion model obtained by reinforcement learning; and an image promotion recognition module, used for recognizing the image to be recognized with the image recognition promotion model to obtain an image recognition target result.

In a third aspect, an embodiment of the application further provides a computer device, which adopts the following technical solution:

A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the image recognition strengthening method described above.

In a fourth aspect, an embodiment of the application further provides a computer readable storage medium, which adopts the following technical solution:

A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the image recognition strengthening method described above.

Compared with the prior art, the embodiment of the application has the following