CN-115937918-B - Image classification method, device and equipment

CN 115937918 B

Abstract

The application provides an image classification method, device and equipment. The method comprises: identifying whether an image to be processed is a beauty image through a face detection model and a beauty recognition model that are trained in advance; classifying the image to be processed through a pre-trained multi-category classification model to obtain the probability that the image to be processed belongs to each image category; and identifying whether the image to be processed includes an image of a target person through a pre-trained person classification model. The application integrates a beauty recognition model for recognizing beauty images, a multi-category classification model for classifying multiple image categories, a person classification model for recognizing whether a target person appears in an image, and the like. Because the images to be processed are classified automatically by these models, real-time performance is high, large numbers of images can be classified automatically, and the efficiency of structuring image data is improved.
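As a rough illustration only, the three-model pipeline described in the abstract could be wired together as follows; all class names, method signatures, and threshold values here are hypothetical, since the patent discloses no source code:

```python
# Hypothetical sketch of the integrated three-model pipeline from the
# abstract. Model interfaces (detect/predict) and the default 0.5 threshold
# are illustrative assumptions, not part of the patent text.

def classify_image(image, face_detector, beauty_model, multi_class_model,
                   person_model, thresholds):
    """Run the integrated pipeline on one image and return structured labels."""
    result = {}

    # Step 1: beauty-image recognition (the face detector gates the beauty model).
    faces = face_detector.detect(image)
    result["is_beauty"] = any(beauty_model.predict(face) for face in faces)

    # Step 2: multi-category classification -> per-category probabilities.
    probs = multi_class_model.predict(image)  # e.g. {"stitched": 0.91, ...}
    result["categories"] = [
        cat for cat, p in probs.items() if p > thresholds.get(cat, 0.5)
    ]

    # Step 3: target-person recognition.
    result["has_target_person"] = person_model.predict(image)
    return result
```

Gating the beauty model behind the face detector mirrors the claimed flow, in which beauty recognition runs only on detected face regions.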

Inventors

  • JIANG XIAOLONG
  • YANG HAOJIE

Assignees

  • 北京新氧科技有限公司 (Beijing SoYoung Technology Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2021-09-30

Claims (19)

  1. An image classification method, comprising: identifying whether an image to be processed is a beauty image through a face detection model and a beauty recognition model that are trained in advance; classifying the image to be processed through a pre-trained multi-category classification model to obtain the probability that the image to be processed belongs to each image category; and identifying whether the image to be processed includes an image of a target person through a pre-trained person classification model; wherein, after the probabilities that the image to be processed belongs to the respective image categories are obtained, the method further comprises: determining that the probability that the image to be processed belongs to a third image category is greater than a preset probability threshold corresponding to the third image category, wherein the third image category comprises stitched images; inputting the image to be processed into a trained recognition model, and extracting features of the image to be processed through a feature extraction network in the recognition model to obtain a feature map; predicting, based on the feature map, the position of a stitching line in the image to be processed through a stitching line prediction network in the recognition model; and determining, based on the feature map and the stitching line position, whether the image to be processed is a stitched image containing target content through a classification network in the recognition model.
  2. The method according to claim 1, wherein identifying whether the image to be processed is a beauty image through the pre-trained face detection model and beauty recognition model comprises: determining, through the pre-trained face detection model, whether a face region satisfying a preset face classification condition exists in the image to be processed; if so, cropping the face region satisfying the preset face classification condition from the image to be processed to obtain a corresponding face image; generating a face alignment image corresponding to the face image; and recognizing, through the pre-trained beauty recognition model, whether the face alignment image is a beauty image.
  3. The method according to claim 1 or 2, further comprising: sequentially connecting a preset number of convolution modules; connecting the last of the sequentially connected convolution modules to a preset efficient network to obtain the structure of a beauty recognition model for recognizing beauty images; acquiring a first training set; and training the beauty recognition model on the first training set.
  4. The method of claim 3, wherein acquiring the first training set comprises: acquiring face images of a plurality of beautified faces and face images of a plurality of plain faces; generating a face alignment image corresponding to each acquired face image; and labeling each beautified-face alignment image with a beauty label and each plain-face alignment image with a plain-face label to obtain the first training set, and/or labeling the beautified face parts in the face alignment images with beauty labels to obtain the first training set.
  5. The method according to claim 3, wherein training the beauty recognition model on the first training set comprises: acquiring a face alignment image from the first training set; inputting the acquired face alignment image into the beauty recognition model to obtain a classification result of the face image corresponding to the face alignment image; and calculating the loss value of the current training epoch through a preset classification-center loss function according to the classification result of the face image.
  6. The method of claim 5, wherein the classification result includes probabilities that the face image belongs to respective image categories, the image categories including at least the beauty-image category; and wherein calculating the loss value of the current training epoch through the preset classification-center loss function according to the classification result of the face image comprises: calculating a first classification loss value corresponding to the face image through a first classification loss function included in the preset classification-center loss function, according to the number of image categories and the probability that the face image belongs to each image category; calculating a center loss value corresponding to the face image through a center loss function included in the classification-center loss function, according to the feature vector of the face image and the feature center point of each image category; and calculating the loss value of the current training epoch from the first classification loss value and the center loss value.
  7. The method of claim 6, wherein the classification result further comprises probabilities that the image of at least one face part in the face image belongs to each image category; and wherein calculating the loss value of the current training epoch through the preset classification-center loss function according to the classification result of the face image comprises: calculating a second classification loss value corresponding to the face image through a second classification loss function included in the classification-center loss function, according to the number of the at least one face part, the number of image categories, and the probability that the image of each face part belongs to each image category; and calculating the loss value of the current training epoch from the first classification loss value, the second classification loss value, and the center loss value.
  8. The method of claim 5, wherein the classification result includes probabilities that the image of at least one face part in the face image belongs to each image category; and wherein calculating the loss value of the current training epoch through the preset classification-center loss function according to the classification result of the face image comprises: calculating a second classification loss value corresponding to the face image through a second classification loss function included in the classification-center loss function, according to the number of the at least one face part, the number of image categories, and the probability that the image of each face part belongs to each image category; calculating a center loss value corresponding to the face image through a center loss function included in the classification-center loss function, according to the feature vector of the face image and the feature center point of each image category; and calculating the loss value of the current training epoch from the second classification loss value and the center loss value.
  9. The method of claim 1, wherein, before classifying the image to be processed through the pre-trained multi-category classification model, the method further comprises: acquiring a second training set, wherein the second training set comprises a plurality of images and each image is labeled with the category label of each image category it belongs to; connecting the output of a preset efficient network to the input of a bidirectional long short-term memory network to obtain a first neural network model for multi-label classification; and training the constructed first neural network model on the second training set to obtain the multi-category classification model.
  10. The method according to claim 9, wherein, after training the constructed first neural network model on the second training set to obtain the multi-category classification model, the method further comprises: determining the classification accuracy of the multi-category classification model on each image category; if there is a first image category whose accuracy is lower than a preset threshold, training a branch model corresponding to the first image category; and correcting the multi-category classification model with the branch model to obtain a corrected multi-category classification model.
  11. The method according to claim 10, wherein classifying the image to be processed through the pre-trained multi-category classification model to obtain the probability that the image to be processed belongs to each image category comprises: extracting a feature vector of the image to be processed through the preset efficient network in the corrected multi-category classification model; classifying the feature vector through the branch model in the corrected multi-category classification model to obtain a first probability that the image to be processed belongs to the first image category; classifying the feature vector through the bidirectional long short-term memory network in the corrected multi-category classification model to obtain the probability that the image to be processed belongs to each image category, including a second probability that the image to be processed belongs to the first image category; and fusing the first probability and the second probability to obtain the final probability that the image to be processed belongs to the first image category.
  12. The method according to any one of claims 1 and 9-11, wherein, after the probabilities that the image to be processed belongs to the respective image categories are obtained, the method further comprises: determining that the probability that the image to be processed belongs to a second image category is greater than a preset probability threshold corresponding to the second image category, wherein the second image category is an image category comprising body parts; detecting, through the face detection model, whether the image to be processed contains a face region satisfying the preset face classification condition; if so, recognizing, through the beauty recognition model, whether the image of the face region in the image to be processed is a beauty image; and if the image of the face region in the image to be processed is determined to be a beauty image, determining that the image to be processed does not belong to the second image category.
  13. The method according to claim 1, wherein predicting, based on the feature map, the position of a stitching line in the image to be processed through the stitching line prediction network in the recognition model comprises: acquiring, through the stitching line prediction network, a first intermediate feature, a second intermediate feature, and a third intermediate feature generated during feature extraction by the feature extraction network; fusing the feature map with the first, second, and third intermediate features through a first feature fusion module in the stitching line prediction network and outputting the fused features; predicting the position of a transverse stitching line in the image to be processed through a transverse stitching line prediction module using the fused features output by the first feature fusion module; and predicting the position of a longitudinal stitching line in the image to be processed through a longitudinal stitching line prediction module using the fused features output by the first feature fusion module.
  14. The method of claim 13, wherein predicting the position of the transverse stitching line in the image to be processed through the transverse stitching line prediction module using the fused features output by the first feature fusion module comprises: separating the fused features by column through a first separation module in the transverse stitching line prediction module, fusing the separated column features along the channel dimension to obtain a single column feature, and inputting the column feature into a first processing module in the transverse stitching line prediction module; performing multiple channel dimension-reduction operations on the column feature through the first processing module to obtain a transverse stitching line feature, and inputting the transverse stitching line feature into a first output layer in the transverse stitching line prediction module; and determining the position of the transverse stitching line in the image based on the transverse stitching line feature through the first output layer.
  15. The method of claim 13, wherein predicting the position of the longitudinal stitching line in the image through the longitudinal stitching line prediction module using the fused features output by the first feature fusion module comprises: separating the fused features by row through a second separation module in the longitudinal stitching line prediction module, fusing the separated row features along the channel dimension to obtain a single row feature, and inputting the row feature into a second processing module in the longitudinal stitching line prediction module; performing multiple channel dimension-reduction operations on the row feature through the second processing module to obtain a longitudinal stitching line feature, and inputting the longitudinal stitching line feature into a second output layer in the longitudinal stitching line prediction module; and determining the position of the longitudinal stitching line in the image based on the longitudinal stitching line feature through the second output layer.
  16. The method according to claim 1, wherein determining, based on the feature map and the stitching line position, whether the image to be processed is a stitched image containing target content through the classification network in the recognition model comprises: fusing the feature map and the stitching line feature through a second feature fusion module in the classification network, and inputting the fused feature into a third processing module in the classification network; performing multiple channel dimension-reduction operations on the fused feature through the third processing module to obtain a feature of a preset channel dimension, and inputting the feature into a third output layer in the classification network; and determining, based on the feature of the preset channel dimension, whether the image is a stitched image containing target content through the third output layer.
  17. The method according to any one of claims 1 and 9-11, wherein, after the probabilities that the image to be processed belongs to the respective image categories are obtained, the method further comprises: determining that the probability that the image to be processed belongs to a fourth image category is greater than a preset probability threshold corresponding to the fourth image category, wherein the fourth image category is an image category comprising text; recognizing text information in the image to be processed through an optical character recognition model; segmenting the text information into one or more keywords; determining whether a preset dictionary contains at least one of the keywords; and if so, determining that the image to be processed belongs to a fourth image category containing target content.
  18. An image classification apparatus, comprising: a beauty recognition module configured to identify whether an image to be processed is a beauty image through a face detection model and a beauty recognition model that are trained in advance; a multi-category classification module configured to classify the image to be processed through a pre-trained multi-category classification model to obtain the probability that the image to be processed belongs to each image category; and a target person identification module configured to identify whether the image to be processed includes an image of a target person through a pre-trained person classification model; wherein, after the probabilities that the image to be processed belongs to the respective image categories are obtained, the apparatus is further configured to: determine that the probability that the image to be processed belongs to a third image category is greater than a preset probability threshold corresponding to the third image category, wherein the third image category comprises stitched images; input the image to be processed into a trained recognition model, and extract features of the image to be processed through a feature extraction network in the recognition model to obtain a feature map; predict, based on the feature map, the position of a stitching line in the image to be processed through a stitching line prediction network in the recognition model; and determine, based on the feature map and the stitching line position, whether the image to be processed is a stitched image containing target content through a classification network in the recognition model.
  19. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method of any one of claims 1-17.
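Claims 5-8 name two loss terms, a classification loss and a center loss, combined into a single training loss, but give no formulas. A minimal illustrative sketch in plain Python, following the common softmax cross-entropy plus center-loss formulation (the `lam` weight and exact forms are assumptions):

```python
import math

def classification_center_loss(logits, features, labels, centers, lam=0.5):
    """Combined loss: mean softmax cross-entropy over the class logits, plus
    a center loss pulling each sample's feature vector toward the feature
    center point of its labeled class. Illustrative only; the patent claims
    name the terms but do not specify their formulas."""
    n = len(labels)

    # Classification loss: mean softmax cross-entropy over the batch.
    ce = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # shift logits for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        ce += log_z - row[label]
    ce /= n

    # Center loss: mean half squared distance to the class feature center.
    center = 0.0
    for feat, label in zip(features, labels):
        center += 0.5 * sum((f - c) ** 2 for f, c in zip(feat, centers[label]))
    center /= n

    return ce + lam * center
```

When a sample's feature vector coincides with its class center, only the cross-entropy term remains; moving the features away from the centers adds the weighted center penalty.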

Description

Image Classification Method, Device and Equipment

Technical Field

The application belongs to the technical field of image processing, and in particular relates to an image classification method, device and equipment.

Background

At present, a large number of images are generated on the network every day, and services such as searching images by text and searching images by image can be provided to users on the basis of these images. Before such services can be provided, the image data must be structured by first classifying the images. In the related art, images uploaded by users are classified manually, which is time-consuming and labor-intensive. Alternatively, the related art prescribes a one-to-one correspondence between uploaded images and classification labels in order to guide users to upload images under the correct label. However, many users upload images without following the prescribed labels, resulting in poor accuracy of image structuring.

Disclosure of Invention

The application provides an image classification method, device and equipment, which integrate a beauty recognition model for recognizing beauty images, a multi-category classification model for classifying multiple image categories, and a person classification model for recognizing whether a target person is contained in an image. Images to be processed are classified automatically through these models, so real-time performance is high, automatic classification of large numbers of images can be achieved, and the efficiency of structuring image data is improved.
An embodiment of the first aspect of the present application provides an image classification method, including: identifying whether an image to be processed is a beauty image through a face detection model and a beauty recognition model that are trained in advance; classifying the image to be processed through a pre-trained multi-category classification model to obtain the probability that the image to be processed belongs to each image category; and identifying whether the image to be processed includes an image of a target person through a pre-trained person classification model. In some embodiments of the present application, identifying whether the image to be processed is a beauty image through the pre-trained face detection model and beauty recognition model includes: determining, through the pre-trained face detection model, whether a face region satisfying a preset face classification condition exists in the image to be processed; if so, recognizing, through the pre-trained beauty recognition model, whether the image of the face region satisfying the preset face classification condition is a beauty image. In some embodiments of the present application, determining, through the pre-trained face detection model, whether a face region satisfying the preset face classification condition exists in the image to be processed includes: detecting, through the pre-trained face detection model, whether the image to be processed contains at least one face region; if so, determining whether the at least one face region contains a target face region whose face area ratio is greater than a preset threshold; and if such a target face region exists, determining that a face region satisfying the preset face classification condition exists in the image to be processed.
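The face-region filtering step above (keep only faces whose share of the image area exceeds a threshold) can be sketched as follows; the keypoint-to-area computation (a bounding box over the keypoints) and the threshold value are illustrative assumptions, as the description does not fix them at this point:

```python
def has_target_face(face_keypoints_list, image_width, image_height,
                    ratio_threshold=0.05):
    """Return True if any detected face region occupies a fraction of the
    image area greater than the threshold. Each face is a list of (x, y)
    keypoints; the face area is approximated by the keypoints' bounding
    box, an illustrative choice (the patent does not fix the computation)."""
    image_area = image_width * image_height
    for keypoints in face_keypoints_list:
        xs = [x for x, _ in keypoints]
        ys = [y for _, y in keypoints]
        face_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if face_area / image_area > ratio_threshold:
            return True
    return False
```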
In some embodiments of the present application, determining whether the at least one face region contains a target face region whose face area ratio is greater than the preset threshold includes: detecting the face key points corresponding to each face region; determining the area of each face region from its corresponding face key points; calculating the face area ratio of each face region from the area of that face region and the area of the image to be processed; and determining, from the face area ratio of each face region, whether the at least one face region contains a target face region whose face area ratio is greater than the preset threshold. In some embodiments of the present application, recognizing, through the pre-trained beauty recognition model, whether the image of the face region satisfying the preset face classification condition is a beauty image includes: cropping the face region satisfying the preset face classification condition from the image to be processed to obtain a corresponding face image; generating a face alignment image corresponding to the face image; and recognizing, through the pre-trained beauty recognition model, whether the face alignment image is a beauty image. In some embodiments of the present application, generating the face alignment image corresponding to the face image includes: scaling the face image to a first image of a preset size; acquiring a plural