CN-115937537-B - Intelligent identification method, device and equipment for target image and storage medium

Abstract

The invention discloses an intelligent identification method, device, equipment, and storage medium for a target image. The method comprises: inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales; inputting the first feature maps into a second sub-model to obtain at least three second feature maps of different scales; inputting the second feature maps into a third sub-model to obtain target information of a target to be identified; and determining a target image of the target to be identified from the initial image according to the target information, wherein the target information comprises position information, category confidence, direction confidence, and deflection angle confidence. The technical scheme of the embodiments of the invention balances the speed and precision of image recognition, can rapidly and accurately determine the image of the target to be identified in a variety of simple or complex scenes, and addresses the problem of poor image recognition in complex scenes.
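The four components of the target information can be pictured with a small decoding sketch. It is only an illustration under stated assumptions: the flat vector layout, the argmax selection, and the names `TargetInfo` and `decode_target_info` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class TargetInfo:
    """Hypothetical container for one detection frame's target information."""
    position: Tuple[float, float, float, float]  # assumed (cx, cy, w, h)
    category: int
    category_conf: float
    direction: int
    direction_conf: float
    angle_bin: int
    angle_conf: float


def _argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def decode_target_info(raw: Sequence[float], num_categories: int,
                       num_directions: int, num_angle_bins: int) -> TargetInfo:
    """Split an assumed flat prediction vector into its four parts.

    Assumed layout (not specified by the source):
    [cx, cy, w, h, category confidences..., direction confidences...,
     deflection-angle confidences...]
    """
    expected = 4 + num_categories + num_directions + num_angle_bins
    if len(raw) != expected:
        raise ValueError(f"expected {expected} values, got {len(raw)}")
    position = (raw[0], raw[1], raw[2], raw[3])
    cat = list(raw[4:4 + num_categories])
    start = 4 + num_categories
    direction = list(raw[start:start + num_directions])
    angle = list(raw[start + num_directions:])
    c, d, a = _argmax(cat), _argmax(direction), _argmax(angle)
    return TargetInfo(position, c, cat[c], d, direction[d], a, angle[a])


example = decode_target_info(
    [50, 40, 20, 10, 0.1, 0.9, 0.2, 0.7, 0.05, 0.05, 0.3, 0.6, 0.1],
    num_categories=2, num_directions=4, num_angle_bins=3,
)
```

In this sketch each confidence group is treated as a score per discrete class, direction, or angle bin, and the highest-scoring entry is selected; the actual encoding used by the third sub-model may differ.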

Inventors

YANG FEI
XIONG JIALE

Assignees

京北方信息技术股份有限公司

Dates

Publication Date: 20260512
Application Date: 20221208

Claims (9)

1. An intelligent identification method for a target image, characterized by comprising the following steps: inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales, wherein the preset feature extraction model comprises at least the first sub-model, a second sub-model, and a third sub-model; inputting the first feature maps into the second sub-model to obtain at least three second feature maps of different scales, wherein the second sub-model is constructed based on a feature pyramid network; and inputting the second feature maps into the third sub-model to obtain target information of a target to be identified, and determining a target image of the target to be identified from the initial image according to the target information, wherein the target information comprises position information, category confidence, direction confidence, and deflection angle confidence; wherein inputting the second feature maps into the third sub-model to obtain the target information of the target to be identified, and determining the target image of the target to be identified from the initial image according to the target information, comprises: inputting the second feature maps into the third sub-model and, for each second feature map, performing ninth convolution processing on the current second feature map by using the third sub-model to obtain target information of detection frames of the target to be identified; screening the target information of the detection frames by using a non-maximum suppression algorithm to determine the target position, the target direction, and the target deflection angle of a target detection frame; and cutting out the image corresponding to the target detection frame from the initial image according to the target position, the target direction, and the target deflection angle, so as to obtain the target image of the target to be identified.
2. The method of claim 1, wherein the first sub-model comprises at least a convolution layer, a batch normalization layer, an activation function layer, and a pooling layer, and wherein inputting the initial image into the first sub-model of the preset feature extraction model to obtain at least three first feature maps of different scales comprises: inputting the initial image into the convolution layer, the batch normalization layer, and the activation function layer to obtain a shallow feature map, a middle-layer feature map, and an initial deep feature map of different scales; after performing first convolution processing on the initial deep feature map, performing pooling processing on the first convolution processing result to obtain pooling processing results of a plurality of different scales, wherein the scale of the initial deep feature map is smaller than those of the shallow feature map and the middle-layer feature map; and splicing the pooling processing results of different scales to obtain a deep feature map, wherein the first feature maps comprise the deep feature map, the shallow feature map, and the middle-layer feature map.
3. The method of claim 2, wherein inputting the first feature maps into the second sub-model to obtain at least three second feature maps of different scales comprises: inputting the first feature maps into the second sub-model, performing second convolution processing on the deep feature map by using the second sub-model, performing first up-sampling processing on the second convolution processing result, and splicing the first up-sampling result with the middle-layer feature map to obtain a first spliced feature; after performing third convolution processing on the first spliced feature, performing second up-sampling processing on the third convolution processing result, and splicing the second up-sampling result with the shallow feature map to obtain a second spliced feature; after performing fourth convolution processing on the second spliced feature, performing fifth convolution processing on the fourth convolution processing result, and splicing the fifth convolution processing result with the third convolution processing result to obtain a third spliced feature; and after performing sixth convolution processing on the third spliced feature, performing seventh convolution processing on the sixth convolution processing result, splicing the seventh convolution processing result with the second convolution processing result to obtain a fourth spliced feature, and performing eighth convolution processing on the fourth spliced feature to obtain an eighth convolution processing result, wherein the second feature maps comprise the fourth convolution processing result, the sixth convolution processing result, and the eighth convolution processing result.
4. The method according to claim 1, wherein the preset feature extraction model is determined by: acquiring a sample image containing a preset real frame, wherein the sample image contains at least one sample target image, the preset real frame is used for framing the sample target image, and the preset real frame is configured with a sample label; inputting the sample image into a first initial sub-model of a preset initial model to obtain at least three first sample feature maps of different scales, wherein the preset initial model comprises at least the first initial sub-model, a second initial sub-model, and a third initial sub-model; inputting the first sample feature maps into the second initial sub-model to obtain at least three second sample feature maps of different scales, wherein the second initial sub-model is constructed based on a feature pyramid network; inputting the second sample feature maps into the third initial sub-model to obtain sample target information of a sample detection frame of a sample target to be identified, wherein the sample target information comprises sample position information, sample category confidence, sample direction confidence, and sample deflection angle confidence; and determining a loss function according to the sample target information and the sample label, and training the preset initial model by using the loss function to obtain the preset feature extraction model.
5. The method of claim 4, further comprising, prior to acquiring the sample image containing the preset real frame: acquiring an initial sample image containing a preset initial real frame, wherein the initial sample image contains at least one initial sample target image, the preset initial real frame is used for framing the initial sample target image, the preset initial real frame is configured with an initial sample label, and the initial sample label comprises an initial sample direction label; and performing preset image processing on the initial sample image to obtain the sample image, and updating the initial sample label according to the preset image processing to determine the sample label; wherein performing the preset image processing on the initial sample image to obtain the sample image, and updating the initial sample label according to the preset image processing to determine the sample label, comprises: if it is determined that image rotation processing is to be performed on the initial sample image, determining a preset angle before performing the image rotation processing, wherein the preset angle is the angle in a preset angle set that is closest to the rotation angle corresponding to the image rotation processing; rotating the initial sample image to a first position corresponding to the preset angle, and determining a first sequence of first corner coordinates according to a preset ordering mode, wherein the first corner coordinates are the coordinates of the corner points of the preset initial real frame after rotation by the preset angle, and the preset ordering mode comprises ordering by the absolute values of the coordinates on preset coordinate axes; and after restoring the initial sample image to its initial position before rotation, rotating the initial sample image to a second position corresponding to the rotation angle, determining the initial sample image at the second position as the sample image, determining a second sequence of second corner coordinates according to the preset ordering mode, and, if the second sequence is the same as the first sequence, determining the sample direction corresponding to the preset angle as the sample direction label in the sample label of the sample image.
6. The method of claim 4, wherein the loss function comprises a regression loss function, a confidence loss function, a category loss function, an angle loss function, and a direction loss function, the regression loss function being determined based on the intersection over union (IoU) of the sample detection frame and the preset real frame, and on the area of the minimum convex enclosing box of the sample detection frame and the preset real frame.
7. An intelligent identification device for a target image, comprising: a first feature map determining module, used for inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales, wherein the preset feature extraction model comprises at least the first sub-model, a second sub-model, and a third sub-model; a second feature map determining module, used for inputting the first feature maps into the second sub-model to obtain at least three second feature maps of different scales, wherein the second sub-model is constructed based on a feature pyramid network; and a target image determining module, used for inputting the second feature maps into the third sub-model to obtain target information of a target to be identified, and determining a target image of the target to be identified from the initial image according to the target information, wherein the target information comprises position information, category confidence, direction confidence, and deflection angle confidence; wherein the target image determining module includes: a target information determining unit, used for inputting the second feature maps into the third sub-model and, for each second feature map, performing ninth convolution processing on the current second feature map by using the third sub-model to obtain target information of detection frames of the target to be identified; a screening unit, used for screening the target information of the detection frames by using a non-maximum suppression algorithm to determine the target position, the target direction, and the target deflection angle of a target detection frame; and a target image determining unit, used for cutting out the image corresponding to the target detection frame from the initial image according to the target position, the target direction, and the target deflection angle, so as to obtain the target image of the target to be identified.
8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the intelligent identification method for a target image according to any one of claims 1-6.
9. A computer-readable storage medium storing computer instructions for causing a processor to perform the intelligent identification method for a target image according to any one of claims 1-6.
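The overlap measures behind the non-maximum suppression screening of claim 1 and the regression loss of claim 6 can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2), whereas the claimed detection frames also carry a direction and a deflection angle. The loss shown follows the widely used generalized IoU form, which is consistent with the claim wording but is not confirmed by the source. The function names and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed axis-aligned (x1, y1, x2, y2)


def area(b: Box) -> float:
    """Box area; an inverted or empty box counts as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def giou_loss(pred: Box, truth: Box) -> float:
    """1 - GIoU, using the smallest axis-aligned box enclosing both inputs."""
    inter = intersection(pred, truth)
    union = area(pred) + area(truth) - inter
    hull = area((min(pred[0], truth[0]), min(pred[1], truth[1]),
                 max(pred[2], truth[2]), max(pred[3], truth[3])))
    if union <= 0 or hull <= 0:
        return 1.0  # degenerate boxes: treat as no overlap
    return 1.0 - (inter / union - (hull - union) / hull)


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
           [0.9, 0.8, 0.7])
```

Here the second box overlaps the first with an IoU of about 0.68 and is suppressed, so `kept` holds the indices of the first and third boxes. Handling rotated frames would require a polygon-based overlap computation in place of `intersection`.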

Description

Intelligent identification method, device and equipment for target image and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular to an intelligent identification method, apparatus, device, and storage medium for a target image.

Background

With the rapid development of new-generation information technologies represented by artificial intelligence, big data, and cloud computing, image recognition has become one of the most basic links in the digital transformation of enterprises. At present, deep-learning-based detection algorithms are generally used to extract target features from target images, so that distinguishing features among different target images can be obtained with good resistance to interference. The mainstream detection algorithms currently include the YOLO (You Only Look Once) series and the RCNN (Region-based Convolutional Neural Network) detection algorithm, and the two can be applied to different detection scenarios.

However, conventional target image detection methods have certain limitations and do not perform well in complex scenes such as wrinkled, dark, or deformed images.

Disclosure of Invention

The invention provides an intelligent identification method, device, equipment, and storage medium for a target image, which are used for solving the problem of poor target image identification in complex scenes.
In a first aspect, the present invention provides an intelligent identification method for a target image, including: inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales, wherein the preset feature extraction model comprises at least the first sub-model, a second sub-model, and a third sub-model; inputting the first feature maps into the second sub-model to obtain at least three second feature maps of different scales, wherein the second sub-model is constructed based on a feature pyramid network; and inputting the second feature maps into the third sub-model to obtain target information of a target to be identified, and determining a target image of the target to be identified from the initial image according to the target information, wherein the target information comprises position information, category confidence, direction confidence, and deflection angle confidence.

In a second aspect, the present invention provides an intelligent identification apparatus for a target image, including: a first feature map determining module, used for inputting an initial image into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales, wherein the preset feature extraction model comprises at least the first sub-model, a second sub-model, and a third sub-model; a second feature map determining module, used for inputting the first feature maps into the second sub-model to obtain at least three second feature maps of different scales, wherein the second sub-model is constructed based on a feature pyramid network; and a target image determining module, used for inputting the second feature maps into the third sub-model to obtain target information of a target to be identified, and determining a target image of the target to be identified from the initial image according to the target information, wherein the target information comprises position information, category confidence, direction confidence, and deflection angle confidence.

In a third aspect, the present invention provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the intelligent identification method for a target image of the first aspect described above.

In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions for causing a processor to execute the intelligent identification method for a target image according to the first aspect.

According to the intelligent identification scheme for a target image, an initial image is input into a first sub-model of a preset feature extraction model to obtain at least three first feature maps of different scales, wherein the preset feature extraction model comprises at least the first sub-model, a second sub-model, and a third sub-model; the first feature maps are input into the second sub-model to obtain at least three second feature maps of different scales, wherein the second sub-model is constructed based on a feature pyramid network; the second feature maps are input into the third sub-model to obt