CN-115761581-B - Image and video quality recognition method, model training method, device and equipment

Abstract

The disclosure provides an image quality recognition method, a video quality recognition method, model training methods, a device, equipment, a storage medium and a computer program product, and relates to the technical field of data processing, in particular to video quality recognition and image quality recognition. In a specific implementation, the video quality recognition method comprises: obtaining a video to be recognized and extracting at least two frames of images to be recognized from it; scaling each frame in equal proportion to obtain the size-converted image corresponding to that frame; extracting image features from each image to be recognized and from its corresponding size-converted image; fusing the extracted image features to obtain video features of the video to be recognized; and recognizing the video quality of the video to be recognized according to the video features.

Inventors

CUI DONGLIN

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd. (百度在线网络技术（北京）有限公司)

Dates

Publication Date: 20260508
Application Date: 20221114

Claims (20)

1. A method of identifying video quality, comprising: acquiring a video to be identified, and extracting at least two frames of images to be identified from the video to be identified; performing equal-proportion scaling on each frame of the images to be identified to obtain a size-converted image corresponding to each frame; extracting image features from the images to be identified and the corresponding size-converted images respectively, and fusing the image features extracted from all the images to be identified and the corresponding size-converted images to obtain video features of the video to be identified; and identifying the video quality of the video to be identified according to the video features.
2. The method for identifying video quality according to claim 1, wherein extracting image features from the image to be identified comprises: performing image blocking on the image to be identified to obtain at least two first image blocks; mapping each first image block into a first image vector, and mapping the position information of each first image block into a first position vector; splicing the first image vector and the first position vector of each first image block to obtain an image block vector of that first image block; and determining the image features of the image to be identified according to the image block vectors of the first image blocks; and/or extracting image features from the size-converted image comprises: performing image blocking on the size-converted image to obtain at least two second image blocks; mapping each second image block into a second image vector, and mapping the position information of each second image block into a second position vector; splicing the second image vector and the second position vector of each second image block to obtain an image block vector of that second image block; and determining the image features of the size-converted image according to the image block vectors of the second image blocks.
3. The method for identifying video quality according to claim 1, wherein said fusing the extracted image features to obtain video features of the video to be identified comprises: Inputting all image features extracted from the image to be identified and the corresponding size conversion image into a first encoder to obtain image fusion features of the image to be identified; and inputting the image fusion characteristics of all the images to be identified into a second encoder to obtain the video characteristics of the video to be identified.
4. A method of identifying video quality according to any of claims 1-3, wherein the identifying video quality of the video to be identified according to the video characteristics comprises: identifying the definition of the video to be identified according to the video characteristics; and/or identifying the aesthetic degree of the video to be identified according to the video characteristics.
5. The method for identifying video quality according to claim 4, wherein performing equal-proportion scaling on each frame of the images to be identified comprises: performing equal-proportion enlargement on the image to be identified according to a first scaling ratio to obtain an enlarged size-converted image; and/or performing equal-proportion reduction on the image to be identified according to a second scaling ratio to obtain a reduced size-converted image.
6. A training method of a video quality recognition model, comprising: obtaining a video sample annotated with video quality annotation information, and extracting at least two frames of image samples from the video sample; performing equal-proportion scaling on each frame of image sample to obtain a size-converted image sample corresponding to each frame of image sample; inputting the image samples and the corresponding size-converted image samples into a video quality recognition model, so that the video quality recognition model outputs video quality prediction information of the video sample according to video features, wherein the video features are obtained by fusing the image features extracted from all the image samples of the video sample and the corresponding size-converted image samples; and calculating a loss error according to the video quality prediction information and the video quality annotation information, and adjusting model parameters of the video quality recognition model according to the loss error.
7. The method of training a video quality recognition model of claim 6, wherein said inputting the image samples and corresponding size-transformed image samples into a video quality recognition model comprises: Performing image blocking processing on the image sample to obtain at least two first image blocks, and performing image blocking processing on the size-converted image sample to obtain at least two second image blocks; the at least two first image blocks and the at least two second image blocks are both input into the video quality recognition model.
8. The method of training a video quality recognition model of claim 7, wherein the video quality recognition model comprises a fully connected layer, a first encoder, a second encoder, and an output layer; the full connection layer is used for mapping each first image block and each second image block into image block vectors respectively; the first encoder is used for extracting image features from the image samples and the size-converted image, and fusing the extracted image features to obtain image fusion features of the image samples; the second encoder is used for fusing the image fusion characteristics of all the image samples extracted from the video samples to obtain the video characteristics of the video samples; the output layer is used for outputting the video quality prediction information according to the video characteristics.
9. The method for training a video quality recognition model according to any one of claims 6 to 8, wherein performing equal-proportion scaling on each frame of image sample comprises: performing equal-proportion enlargement on the image sample according to a first scaling ratio to obtain an enlarged size-converted image; and/or performing equal-proportion reduction on the image sample according to a second scaling ratio to obtain a reduced size-converted image.
10. A method of identifying image quality, comprising: acquiring an image to be identified; Performing equal-proportion scaling treatment on an image to be identified to obtain a size conversion image of the image to be identified; extracting image features from the image to be identified and the corresponding size conversion image respectively, and fusing the extracted image features to obtain image fusion features of the image to be identified; and identifying the image quality of the image to be identified according to the image fusion characteristics.
11. The method for recognizing image quality according to claim 10, wherein extracting image features from the image to be identified comprises: performing image blocking on the image to be identified to obtain a plurality of first image blocks; mapping each first image block into a first image vector, and mapping the position information of each first image block into a first position vector; splicing the first image vector and the first position vector of each first image block to obtain an image block vector of that first image block; and determining the image features of the image to be identified according to the image block vectors of the first image blocks; and/or extracting image features from the size-converted image comprises: performing image blocking on the size-converted image to obtain a plurality of second image blocks; mapping each second image block into a second image vector, and mapping the position information of each second image block into a second position vector; splicing the second image vector and the second position vector of each second image block to obtain an image block vector of that second image block; and determining the image features of the size-converted image according to the image block vectors of the second image blocks.
12. The method for identifying image quality according to claim 10, wherein said fusing the extracted image features to obtain image fusion features of the image to be identified comprises: and inputting the extracted image features into a first encoder to obtain the image fusion features of the image to be identified.
13. The method for identifying image quality according to any one of claims 10 to 12, wherein the identifying the image quality of the image to be identified according to the image feature includes: Identifying the definition of the image to be identified according to the image characteristics; and/or identifying the aesthetic degree of the image to be identified according to the image characteristics.
14. The method for recognizing image quality according to claim 13, wherein performing equal-proportion scaling on the image to be identified comprises: performing equal-proportion enlargement on the image to be identified according to a first scaling ratio to obtain an enlarged size-converted image; and/or performing equal-proportion reduction on the image to be identified according to a second scaling ratio to obtain a reduced size-converted image.
15. A training method of an image quality recognition model, comprising: acquiring an image sample marked with image quality marking information; performing equal-proportion scaling on each image sample to obtain a size-converted image sample corresponding to each image sample; Inputting the image sample and the corresponding size-converted image sample into an image quality recognition model to output image quality prediction information by the image quality recognition model according to image fusion characteristics of the image sample, wherein the image fusion characteristics are obtained by fusion processing of image characteristics extracted from the image sample and the corresponding size-converted image; and calculating a loss error according to the image quality prediction information and the image quality annotation information, and adjusting model parameters of the image quality identification model according to the loss error.
16. The method of training an image quality recognition model of claim 15, wherein the inputting the image samples and corresponding size-transformed image samples into an image quality recognition model comprises: Performing image blocking processing on the image sample to obtain a plurality of first image blocks, and performing image blocking processing on the size-converted image sample to obtain a plurality of second image blocks; The plurality of first image blocks and the plurality of second image blocks are each input into the image quality recognition model.
17. The method of training an image quality recognition model of claim 16, wherein the image quality recognition model comprises a full connection layer, an encoder, and an output layer; The full connection layer is used for mapping the first image block and the second image block into image block vectors respectively; The encoder is used for extracting image block features from the image block vectors and fusing the image block features to obtain the image fusion features; The output layer is used for outputting the image quality prediction information of the image sample according to the image fusion characteristics.
18. The method for training an image quality recognition model according to any one of claims 15 to 17, wherein performing equal-proportion scaling on each image sample comprises: performing equal-proportion enlargement on the image sample according to a first scaling ratio to obtain an enlarged size-converted image; and/or performing equal-proportion reduction on the image sample according to a second scaling ratio to obtain a reduced size-converted image.
19. An apparatus for identifying video quality, comprising: The video acquisition module is used for acquiring a video to be identified and extracting at least two frames of images to be identified from the video to be identified; The scaling module is used for carrying out equal-proportion scaling treatment on the images to be identified of each frame to obtain a size conversion image corresponding to the images to be identified of each frame; The feature extraction module is used for extracting image features from the images to be identified and the corresponding size conversion images respectively, and fusing the extracted image features from all the images to be identified and the corresponding size conversion images to obtain video features of the video to be identified; and the quality identification module is used for identifying the video quality of the video to be identified according to the video characteristics.
20. The apparatus for identifying video quality according to claim 19, wherein, when extracting image features from the image to be identified, the feature extraction module is specifically configured for: performing image blocking on the image to be identified to obtain at least two first image blocks; mapping each first image block into a first image vector, and mapping the position information of each first image block into a first position vector; splicing the first image vector and the first position vector of each first image block to obtain an image block vector of that first image block; and determining the image features of the image to be identified according to the image block vectors of the first image blocks; and/or, when extracting image features from the size-converted image, the feature extraction module is specifically configured for: performing image blocking on the size-converted image to obtain at least two second image blocks; mapping each second image block into a second image vector, and mapping the position information of each second image block into a second position vector; splicing the second image vector and the second position vector of each second image block to obtain an image block vector of that second image block; and determining the image features of the size-converted image according to the image block vectors of the second image blocks.
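
The blocking and splicing steps recited in claims 2, 11 and 20 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a grayscale image stored as a list of rows, non-overlapping square blocks, and bias-free linear mappings with random weights standing in for the fully connected layer; it also reads "splicing" as vector concatenation. The names `split_into_blocks`, `linear_map` and `image_block_vectors`, and all sizes, are hypothetical.

```python
import random

def split_into_blocks(image, block):
    """Cut an H x W grayscale image (list of rows) into non-overlapping
    block x block tiles; return (flattened_pixels, (row, col)) pairs."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [image[top + dy][left + dx]
                      for dy in range(block) for dx in range(block)]
            blocks.append((pixels, (top // block, left // block)))
    return blocks

def linear_map(vector, weights):
    """Bias-free fully connected mapping; `weights` holds one row per output."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def image_block_vectors(image, block, image_weights, position_weights):
    """Map each block and its grid position to vectors, then splice them."""
    vectors = []
    for pixels, (row, col) in split_into_blocks(image, block):
        image_vector = linear_map(pixels, image_weights)
        position_vector = linear_map([float(row), float(col)], position_weights)
        vectors.append(image_vector + position_vector)  # list concatenation
    return vectors

rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[rng.random() for _ in range(8)] for _ in range(8)]
# Assumed sizes: 16 pixels -> 6-dim image vector, (row, col) -> 2-dim position vector.
image_weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(6)]
position_weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
vectors = image_block_vectors(image, 4, image_weights, position_weights)
print(len(vectors), len(vectors[0]))  # 4 blocks, each a 6 + 2 = 8-dim vector
```

The same function would be applied unchanged to the size-converted image to obtain the second image block vectors; only the number of blocks differs because the image size differs.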

Description

Image and video quality recognition method, model training method, device and equipment

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular to video quality recognition and image quality recognition.

Background

In the prior art, video quality is generally identified by extracting images from a video, resizing the images to a fixed size, and inputting them into a convolutional neural network, which identifies an image quality score for each image; the average of the image quality scores of all the images is then used as the quality score of the video. The accuracy of the identification result obtained in this way is limited.

Disclosure of Invention

The present disclosure provides image and video quality recognition methods, model training methods, an apparatus, a device, a storage medium, and a computer program product.

According to a first aspect of the present disclosure, there is provided a method for identifying video quality, including: acquiring a video to be identified, and extracting at least two frames of images to be identified from the video to be identified; performing equal-proportion scaling on each frame of the images to be identified to obtain a size-converted image corresponding to each frame; extracting image features from the images to be identified and the corresponding size-converted images respectively, and fusing the extracted image features to obtain video features of the video to be identified; and identifying the video quality of the video to be identified according to the video features.
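
As a rough illustration of the first aspect, the following Python sketch scales each frame in equal proportion, extracts features from the frame and its scaled copies, and fuses them in two stages (per frame, then across frames). Everything concrete here is an assumption made for the example: nearest-neighbour resampling, the scaling ratios 2.0 and 0.5, the placeholder feature extractor `toy_features`, and plain averaging as a stand-in for the first and second encoders mentioned in claim 3. The disclosure does not state that the encoders are averages.

```python
def scaled_size(width, height, ratio):
    """Equal-proportion scaling: one ratio for both sides keeps the aspect ratio."""
    return max(1, round(width * ratio)), max(1, round(height * ratio))

def resize_nearest(image, ratio):
    """Nearest-neighbour resize of a grayscale image given as a list of rows."""
    height, width = len(image), len(image[0])
    new_width, new_height = scaled_size(width, height, ratio)
    return [[image[min(height - 1, int(y / ratio))][min(width - 1, int(x / ratio))]
             for x in range(new_width)]
            for y in range(new_height)]

def toy_features(image):
    """Placeholder feature extractor (assumption): mean and max pixel value."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), float(max(pixels))]

def mean_pool(vectors):
    """Average equally sized vectors element-wise; stand-in for an encoder."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def video_feature(frames, ratios=(2.0, 0.5)):
    frame_features = []
    for frame in frames:
        # The frame itself plus one enlarged and one reduced copy.
        views = [frame] + [resize_nearest(frame, r) for r in ratios]
        # Stage 1: fuse the features of one frame and its scaled copies.
        frame_features.append(mean_pool([toy_features(v) for v in views]))
    # Stage 2: fuse the per-frame features into a single video feature.
    return mean_pool(frame_features)

frames = [
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    [[8, 9, 10, 11], [12, 13, 14, 15]],
]
feature = video_feature(frames)
print(len(feature))  # one fused vector with two components
```

Because the enlarged and reduced copies share the aspect ratio of the original frame, no cropping or distortion is introduced; the copies differ only in resolution, which is what lets the fused feature reflect quality at several scales.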
According to a second aspect of the present disclosure, there is provided a training method of a video quality recognition model, including: obtaining a video sample annotated with video quality annotation information, and extracting at least two frames of image samples from the video sample; performing equal-proportion scaling on each frame of image sample to obtain a size-converted image sample corresponding to each frame of image sample; inputting the image samples and the corresponding size-converted image samples into a video quality recognition model, so that the video quality recognition model outputs video quality prediction information of the video sample according to video features, wherein the video features are obtained by fusing the image features extracted from the image samples and the corresponding size-converted image samples; and calculating a loss error according to the video quality prediction information and the video quality annotation information, and adjusting model parameters of the video quality recognition model according to the loss error.

According to a third aspect of the present disclosure, there is provided an image quality recognition method, including: acquiring an image to be identified; performing equal-proportion scaling on the image to be identified to obtain a size-converted image of the image to be identified; extracting image features from the image to be identified and the corresponding size-converted image respectively, and fusing the extracted image features to obtain image fusion features of the image to be identified; and identifying the image quality of the image to be identified according to the image fusion features.
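
The final training step of the second aspect (compute a loss error from the prediction and the annotation, then adjust the model parameters) can be sketched as below. This is only an illustration under stated assumptions: the disclosure does not specify the loss function or the optimizer, so a squared error and plain gradient descent are assumed, and a single linear output layer on an already fused feature stands in for the whole recognition model. The names `predict` and `train_step` and all numbers are made up for the example.

```python
def predict(weights, bias, feature):
    """Toy output layer: a linear quality score computed from a fused feature."""
    return sum(w * x for w, x in zip(weights, feature)) + bias

def train_step(weights, bias, feature, label, learning_rate=0.1):
    """One update: compute the loss error, then adjust the parameters."""
    prediction = predict(weights, bias, feature)
    error = prediction - label
    loss = error ** 2  # squared error (assumed loss)
    # Gradient of the squared error with respect to each parameter.
    new_weights = [w - learning_rate * 2 * error * x
                   for w, x in zip(weights, feature)]
    new_bias = bias - learning_rate * 2 * error
    return new_weights, new_bias, loss

weights, bias = [0.0, 0.0], 0.0
feature, label = [0.5, 1.0], 0.8  # made-up fused feature and annotated score
losses = []
for _ in range(20):
    weights, bias, loss = train_step(weights, bias, feature, label)
    losses.append(loss)
print(losses[0] > losses[-1])  # the loss error shrinks as parameters are adjusted
```

In a full model the same loop would propagate the loss error back through the output layer, both encoders and the fully connected layer, so that all of them are adjusted together.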
According to a fourth aspect of the present disclosure, there is provided a training method of an image quality recognition model, including: acquiring an image sample annotated with image quality annotation information; performing equal-proportion scaling on each image sample to obtain a size-converted image sample corresponding to each image sample; inputting the image sample and the corresponding size-converted image sample into an image quality recognition model, so that the image quality recognition model outputs image quality prediction information according to image fusion features of the image sample, wherein the image fusion features are obtained by fusing the image features extracted from the image sample and the corresponding size-converted image; and calculating a loss error according to the image quality prediction information and the image quality annotation information, and adjusting model parameters of the image quality recognition model according to the loss error.

According to a fifth aspect of the present disclosure, there is provided a video quality recognition apparatus, comprising: a video acquisition module for acquiring a video to be identified and extracting at least two frames of images to be identified from the video to be identified; a scaling module for performing equal-proportion scaling on each frame of the images to be identified to obtain a size-converted image corresponding to each frame; the feature extraction module is used for extracting im