CN-116704238-B - Image detection method and device, electronic equipment and storage medium

Abstract

The application provides an image detection method, an image detection device, electronic equipment and a storage medium. The method comprises: collecting an image to be processed; extracting features from the image to be processed with a pre-trained image encoder to obtain target image features; inputting training set text features and the target image features into a pre-trained contrastive network for feature matching calculation to obtain a target image query result value; performing similarity calculation on the target image features and training set image features to generate a target similarity matrix; performing nonlinear mapping calculation on the labels and the target similarity matrix based on a first target hyperparameter to generate a target similarity query result value; and performing weighted calculation on the target image query result value and the target similarity query result value based on a second target hyperparameter to obtain a target classification result value, which indicates the detection result of the image to be processed. The method can thereby be applied more conveniently and quickly in a variety of application environments.

Inventors

SHEN YUSHI
Qiao Xuehong
LI SHIJIE

Assignees

飞诺门阵(北京)科技有限公司

Dates

Publication Date: 20260508
Application Date: 20230516

Claims (9)

1. An image detection method, comprising:
collecting an image to be processed;
extracting features from the image to be processed by using a pre-trained image encoder to obtain target image features;
inputting training set text features and the target image features into a pre-trained contrastive network, and performing feature matching calculation to obtain a target image query result value;
performing similarity calculation on the target image features and training set image features to generate a target similarity matrix, wherein the training set image features are obtained by extracting features from training images with the pre-trained image encoder;
performing, based on a first target hyperparameter, nonlinear mapping calculation on the labels and the target similarity matrix to generate a target similarity query result value; and
performing, based on a second target hyperparameter, weighted calculation on the target image query result value and the target similarity query result value to obtain a target classification result value, wherein the target classification result value indicates the detection result of the image to be processed;
wherein the first target hyperparameter and the second target hyperparameter are obtained by:
acquiring initial training set image features, verification set image features and training set text features;
inputting the training set text features and the verification set image features into the pre-trained contrastive network, and performing feature matching calculation to obtain a verification image query result value;
performing similarity calculation on the initial training set image features and the verification set image features to generate a verification similarity matrix;
performing, based on a first initial hyperparameter, nonlinear mapping calculation on the labels of the training images and the verification similarity matrix to generate a verification similarity query result value;
performing, based on a second initial hyperparameter, weighted calculation on the verification image query result value and the verification similarity query result value to obtain a verification classification result value, wherein the verification classification result value indicates the detection result of the verification image;
determining the accuracy of the first initial hyperparameter and the second initial hyperparameter according to the verification classification result value and the corresponding labels of the verification images;
adjusting the values of the first initial hyperparameter and the second initial hyperparameter according to a preset step length, and returning to the step of acquiring the initial training set image features, the verification set image features and the training set text features, until the values of the first initial hyperparameter and the second initial hyperparameter exceed a preset hyperparameter search range; and
taking the first initial hyperparameter and the second initial hyperparameter with the highest accuracy as the first target hyperparameter and the second target hyperparameter, respectively.
2. The method according to claim 1, wherein the extracting features from the image to be processed by using the pre-trained image encoder to obtain the target image features comprises: extracting features from the image to be processed by using the pre-trained image encoder to obtain a target image feature sequence; and regularizing the target image feature sequence to form the target image features.
3. The method of claim 1, wherein the performing, based on the first target hyperparameter, nonlinear mapping calculation on the labels and the target similarity matrix to generate the target similarity query result value comprises: performing one-hot encoding on the labels to convert them into one-hot data; and performing nonlinear mapping calculation on the one-hot data and the target similarity matrix to generate the target similarity query result value.
4. The method of claim 1, wherein the acquiring initial training set image features, verification set image features and training set text features comprises: acquiring training images and verification images, wherein the training images and the verification images have corresponding labels, and each label corresponds to a different training text; extracting features from the training images and the verification images, respectively, by using the pre-trained image encoder to obtain the initial training set image features and the verification set image features; and extracting features from the training texts by using a pre-trained text encoder to obtain the training set text features.
5. The method of claim 4, wherein the acquiring training images and verification images comprises: acquiring a preset image data set, and randomly dividing the preset image data set into the training images and the verification images.
6. An image detection apparatus, comprising:
an acquisition module, configured to collect an image to be processed;
an image feature extraction module, configured to extract features from the image to be processed by using a pre-trained image encoder to obtain target image features;
a comparison module, configured to input training set text features and the target image features into a pre-trained contrastive network, and perform feature matching calculation to obtain a target image query result value;
a similarity calculation module, configured to perform similarity calculation on the target image features and training set image features to generate a target similarity matrix, wherein the training set image features are obtained by extracting features from training images with the pre-trained image encoder;
a mapping module, configured to perform, based on a first target hyperparameter, nonlinear mapping calculation on the labels and the target similarity matrix to generate a target similarity query result value; and
a weighting module, configured to perform, based on a second target hyperparameter, weighted calculation on the target image query result value and the target similarity query result value to obtain a target classification result value, wherein the target classification result value indicates the detection result of the image to be processed;
wherein the first target hyperparameter and the second target hyperparameter are obtained by:
acquiring initial training set image features, verification set image features and training set text features;
inputting the training set text features and the verification set image features into the pre-trained contrastive network, and performing feature matching calculation to obtain a verification image query result value;
performing similarity calculation on the initial training set image features and the verification set image features to generate a verification similarity matrix;
performing, based on a first initial hyperparameter, nonlinear mapping calculation on the labels of the training images and the verification similarity matrix to generate a verification similarity query result value;
performing, based on a second initial hyperparameter, weighted calculation on the verification image query result value and the verification similarity query result value to obtain a verification classification result value, wherein the verification classification result value indicates the detection result of the verification image;
determining the accuracy of the first initial hyperparameter and the second initial hyperparameter according to the verification classification result value and the corresponding labels of the verification images;
adjusting the values of the first initial hyperparameter and the second initial hyperparameter according to a preset step length, and returning to the step of acquiring the initial training set image features, the verification set image features and the training set text features, until the values of the first initial hyperparameter and the second initial hyperparameter exceed a preset hyperparameter search range; and
taking the first initial hyperparameter and the second initial hyperparameter with the highest accuracy as the first target hyperparameter and the second target hyperparameter, respectively.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the image detection method according to any one of claims 1 to 5 when executing the program.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the image detection method according to any one of claims 1 to 5.
9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the image detection method of any one of claims 1 to 5.
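
The hyperparameter search recited in claim 1 can be illustrated with a short Python sketch. It is not the applicant's implementation: the function names (`frange`, `search_hyperparameters`), the use of one shared step length with separate ranges for the two hyperparameters, and the toy `toy_predict` classifier are all assumptions made for illustration. The sketch evaluates every candidate pair on the verification set and keeps the pair with the highest accuracy.

```python
def frange(start, stop, step):
    """Yield start, start + step, ... for as long as the value is within stop."""
    n = 0
    while start + n * step <= stop + 1e-12:
        yield start + n * step
        n += 1


def search_hyperparameters(predict, val_samples, val_labels,
                           first_range, second_range, step):
    """Grid search over the two hyperparameters (hypothetical helper).

    predict(sample, first, second) must return a predicted label.
    Returns (best_first, best_second, best_accuracy).
    """
    best = (None, None, -1.0)
    for first in frange(first_range[0], first_range[1], step):
        for second in frange(second_range[0], second_range[1], step):
            # Accuracy of this candidate pair on the verification set.
            correct = sum(
                1 for x, y in zip(val_samples, val_labels)
                if predict(x, first, second) == y
            )
            accuracy = correct / len(val_labels)
            # Keep the first pair that reaches the highest accuracy.
            if accuracy > best[2]:
                best = (first, second, accuracy)
    return best


# Toy stand-in for the real classifier (an assumption, not from the patent):
# a sample is labelled 1 when second * x exceeds first.
def toy_predict(x, first, second):
    return int(second * x > first)


val_samples = [0.2, 0.4, 0.6, 0.8]
val_labels = [0, 0, 1, 1]
best_first, best_second, best_acc = search_hyperparameters(
    toy_predict, val_samples, val_labels,
    first_range=(0.0, 1.0), second_range=(1.0, 2.0), step=0.5,
)
print(best_first, best_second, best_acc)
```

In a real system `predict` would wrap the full classification pipeline, and the features could be computed once and reused across candidate pairs, since they do not depend on the hyperparameters.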

Description

Image detection method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular to an image detection method, an image detection device, an electronic device, and a storage medium.

Background

Systems such as monitoring and access control often need to perform image detection on massive numbers of acquired images, screen out abnormal images, and execute corresponding processing operations. In the prior art, abnormal images can be detected based on a language-image contrastive network. The language-image contrastive network is a multi-modal model based on contrastive learning. Its training data consists of text-image pairs, that is, an image and the text description corresponding to that image, and the network learns the matching relation of the text-image pairs through contrastive learning. However, training the language-image contrastive network requires a large amount of training data, whereas in actual application scenarios only a small amount of data can be acquired for occasional abnormal situations. As a result, current image detection methods based on the language-image contrastive network adapt poorly to different abnormal situations.

Disclosure of Invention

In order to solve the above technical problems, the application discloses an image detection method, an image detection device, electronic equipment and a storage medium.
In a first aspect, the present application provides an image detection method, the method comprising: collecting an image to be processed; extracting features from the image to be processed by using a pre-trained image encoder to obtain target image features; inputting training set text features and the target image features into a pre-trained contrastive network, and performing feature matching calculation to obtain a target image query result value; performing similarity calculation on the target image features and training set image features to generate a target similarity matrix, wherein the training set image features are obtained by extracting features from training images with the pre-trained image encoder; performing, based on a first target hyperparameter, nonlinear mapping calculation on the labels and the target similarity matrix to generate a target similarity query result value; and performing, based on a second target hyperparameter, weighted calculation on the target image query result value and the target similarity query result value to obtain a target classification result value, wherein the target classification result value indicates the detection result of the image to be processed.
Optionally, the extracting features from the image to be processed by using the pre-trained image encoder to obtain the target image features includes: extracting features from the image to be processed by using the pre-trained image encoder to obtain a target image feature sequence; and regularizing the target image feature sequence to form the target image features.
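
The steps of the first aspect can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the method as implemented by the applicant: the text does not specify the form of the "regularizing", the "nonlinear mapping" or the "weighted calculation", so the sketch assumes L2 normalization, an exponential mapping exp(-beta * (1 - similarity)), and a sum of the image query result plus alpha times the similarity query result. All function and variable names (`classify`, `beta`, `alpha`, and so on) are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (assumed meaning of 'regularizing')."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_hot(label, num_classes):
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def classify(image_feat, text_feats, train_feats, train_labels, beta, alpha):
    """Return one score per class for a single image to be processed.

    beta  -- stands in for the first target hyperparameter (nonlinear mapping)
    alpha -- stands in for the second target hyperparameter (weighting)
    """
    num_classes = len(text_feats)
    q = l2_normalize(image_feat)
    # Image query result: match the image feature against each class text feature.
    image_query = [dot(q, l2_normalize(t)) for t in text_feats]
    # One row of the similarity matrix: similarity to every training image feature.
    sims = [dot(q, l2_normalize(f)) for f in train_feats]
    # Assumed nonlinear mapping, aggregated per class through one-hot labels.
    sim_query = [0.0] * num_classes
    for s, label in zip(sims, train_labels):
        weight = math.exp(-beta * (1.0 - s))
        for c, h in enumerate(one_hot(label, num_classes)):
            sim_query[c] += weight * h
    # Assumed weighted calculation combining the two query results.
    return [iq + alpha * sq for iq, sq in zip(image_query, sim_query)]


# Toy 2-D features; class 0 and class 1 are placeholders for real labels.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
train_feats = [[0.9, 0.1], [0.2, 0.8]]
train_labels = [0, 1]
scores = classify([0.1, 0.9], text_feats, train_feats, train_labels,
                  beta=5.0, alpha=1.0)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(predicted)  # the query lies close to the class-1 features, so this prints 1
```

In practice the features would come from the pre-trained image and text encoders rather than hand-written lists, and the two hyperparameters would be the values selected on the verification set.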
Optionally, the performing, based on the first target hyperparameter, nonlinear mapping calculation on the labels and the target similarity matrix to generate the target similarity query result value includes: performing one-hot encoding on the labels to convert them into one-hot data; and performing nonlinear mapping calculation on the one-hot data and the target similarity matrix to generate the target similarity query result value.
Optionally, before the performing, based on the first target hyperparameter, nonlinear mapping calculation on the labels and the target similarity matrix to generate the target similarity query result value, the method includes: acquiring initial training set image features, verification set image features and training set text features; inputting the training set text features and the verification set image features into the pre-trained contrastive network, and performing feature matching calculation to obtain a verification image query result value; performing similarity calculation on the training set image features and the verification set image features to generate a verification similarity matrix; performing, based on a first initial hyperparameter, nonlinear mapping calculation on the labels and the verification similarity matrix to generate a verification similarity query result value; performing, based on a second initial hyperparameter, weighted calculation on the verification image query result value and the verification similarity query result value to obtain a verification classification result value, wherein the verification classification result value indicates the detection result of the verification image; determining the accuracy of the first initial hyperparameter and the second initial hyperparameter according to the verificati