CN-122023330-A - Reference-free image quality evaluation method based on mask reconstruction
Abstract
The invention discloses a reference-free image quality evaluation method based on mask reconstruction, belonging to the technical field of image processing and computer vision. The method constructs a mixed synthetic image containing both a distorted region and an undistorted region, introducing locally controllable distortion while keeping the global semantics consistent. On this basis, a semantic perception encoder trained by mask reconstruction extracts global semantic features that are insensitive to distortion; a distortion perception encoder, constrained by these semantic features, extracts distortion features that reflect the degree of image degradation; and a local degradation perception encoder captures fine-grained local degradation features. Finally, an adaptive information fusion module dynamically fuses the multiple feature types and outputs the image quality evaluation result. The invention evaluates image quality without reference images, effectively improves the consistency between prediction results and human subjective perception, and has good robustness and generalization capability.
Inventors
- SHEN LILI
- SUN XUJIE
- CAO TIANYU
- WANG CHAOXIA
Assignees
- TIANJIN UNIVERSITY (天津大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-29
Claims (7)
- 1. A reference-free image quality evaluation method based on mask reconstruction, characterized by comprising the following steps: S1, constructing mixed synthetic data: obtaining a reference image and a distorted image corresponding to it, dividing the image into a plurality of image blocks, determining the source of each image block according to a spatial mask so that some blocks come from the distorted image and the rest come from the reference image, and aligning and splicing the blocks from the two sources by spatial position to construct a mixed synthetic image containing both a distorted region and an undistorted region; S2, extracting semantic perception features: masking the input image, feeding the masked image into a semantic perception encoder, training it through a masked-image reconstruction task, and learning global semantic features insensitive to image distortion; S3, extracting distortion perception features: with the parameters of the semantic perception encoder kept fixed, feeding the image blocks corresponding to the distorted region of the mixed synthetic image into a distortion perception encoder, and outputting, under the constraint of the semantic features, distortion features that reflect the degradation degree and degradation type of the image; S4, extracting local degradation features: feeding the image to be evaluated into a local degradation perception encoder and extracting local features that reflect local texture changes, edge blurring, and fine-grained degradation information; S5, multi-feature adaptive fusion and quality prediction: adaptively fusing the semantic, distortion, and local features, dynamically balancing the contribution of each feature type, and outputting the image quality evaluation result.
- 2. The mask reconstruction-based no-reference image quality evaluation method according to claim 1, wherein: the spatial mask in S1 is a binary mask indicating whether each image block comes from the reference image or from the distorted image, thereby introducing locally controllable distortion while keeping the global semantics consistent.
- 3. The mask reconstruction-based no-reference image quality evaluation method according to claim 1, wherein: the semantic perception encoder in S2 adopts a Transformer-based encoding structure and is trained through a masked-image reconstruction task that reconstructs the corresponding reference image, so as to enhance its modeling of image structure and semantic information; the training target of the semantic perception encoder can be expressed as L = ||IRD(SAE(M(I_d))) - I_r||^2, where I_d denotes the distorted image, I_r denotes the corresponding reference image, M(·) denotes the masking operation, SAE denotes the semantic encoder, and IRD denotes the reconstruction decoder.
- 4. The mask reconstruction-based no-reference image quality evaluation method according to claim 1, wherein: in S3, the semantic features extracted by the frozen semantic perception encoder are introduced as guiding information during training of the distortion perception encoder, so as to suppress interference of semantic information with distortion feature learning.
- 5. The mask reconstruction-based no-reference image quality evaluation method according to claim 1, wherein: in S4, the local degradation perception encoder adopts a convolutional neural network structure and models local regions of the image through local receptive fields.
- 6. The mask reconstruction-based no-reference image quality evaluation method according to claim 1, wherein: in S5, the semantic features, distortion features, and local features are dynamically weighted and fused using an attention mechanism, a gating mechanism, or a combination of the two.
- 7. The mask reconstruction-based no-reference image quality evaluation method according to claim 1, wherein: the image quality evaluation result is output through a regression model and represents the subjective visual quality level of the image; the prediction formula of the quality evaluation result is expressed as Q = QR(g ⊙ f_l), where f_l represents the local degradation features, g is a learnable gating parameter (⊙ denoting element-wise weighting), and QR represents the quality regression module.
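As a concrete illustration of the patch-mixing construction in step S1 and claim 2, the following is a minimal NumPy sketch; the 32x32 image size, 8-pixel patch size, and constant-valued stand-in images are illustrative assumptions, not parameters specified by the patent.

```python
import numpy as np

def build_mixed_image(ref, dist, mask, patch=8):
    """Compose a mixed synthetic image: patches where mask == 1 come from
    the distorted image, patches where mask == 0 from the reference."""
    out = ref.copy()
    n_rows, n_cols = mask.shape
    for i in range(n_rows):
        for j in range(n_cols):
            if mask[i, j]:
                out[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = \
                    dist[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
    return out

rng = np.random.default_rng(0)
ref = np.zeros((32, 32), dtype=np.float32)   # stand-in reference image
dist = np.ones((32, 32), dtype=np.float32)   # stand-in distorted image
mask = rng.integers(0, 2, size=(4, 4))       # binary spatial mask, 4x4 grid
mixed = build_mixed_image(ref, dist, mask)
```

Because the two sources are spatially aligned before splicing, the mixed image keeps the global layout of the reference while carrying distortion only inside the masked patches.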
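The reconstruction objective of claim 3, L = ||IRD(SAE(M(I_d))) - I_r||^2, can be sketched with toy stand-ins as follows; the linear "encoder" and "decoder", the patch-zeroing masking scheme, and all dimensions are illustrative assumptions (the real SAE and IRD would be deep networks trained on this loss).

```python
import numpy as np

rng = np.random.default_rng(1)

def mask_patches(img, mask_ratio=0.5, patch=4):
    """M(.): zero out a random subset of patches (a simple masking stand-in)."""
    out = img.copy()
    h, w = img.shape
    for i in range(h // patch):
        for j in range(w // patch):
            if rng.random() < mask_ratio:
                out[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = 0.0
    return out

# Toy stand-ins for the semantic-aware encoder (SAE) and the image
# reconstruction decoder (IRD): fixed random linear maps.
W_enc = rng.normal(size=(64, 16)) * 0.1
W_dec = rng.normal(size=(16, 64)) * 0.1
sae = lambda x: x.reshape(-1) @ W_enc
ird = lambda z: (z @ W_dec).reshape(8, 8)

i_dist = rng.normal(size=(8, 8))                   # distorted image I_d (toy)
i_ref = i_dist + rng.normal(size=(8, 8)) * 0.01    # reference image I_r (toy)

recon = ird(sae(mask_patches(i_dist)))
loss = float(np.mean((recon - i_ref) ** 2))  # L = ||IRD(SAE(M(I_d))) - I_r||^2
```

Note that the target of the reconstruction is the reference image, not the masked distorted input, which is what pushes the encoder toward distortion-insensitive semantic features.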
Description
Reference-free image quality evaluation method based on mask reconstruction

Technical Field

The invention belongs to the technical field of image processing and computer vision, and in particular relates to an image quality evaluation method based on semantic-distortion collaborative learning with mask image reconstruction and multi-feature adaptive fusion.

Background

With the wide application of digital images in the internet, multimedia communication, intelligent terminals, and visual perception systems, image quality directly affects information transfer efficiency and user experience. Images are susceptible to various distortions such as noise, blurring, and compression artifacts during acquisition, compression, storage, and transmission, so objective evaluation of image quality is required. Existing image quality evaluation methods fall into three categories: full-reference, reduced-reference, and no-reference. No-reference methods require no reference image and therefore better match practical application requirements. However, existing no-reference methods depend on statistical features or on visual features extracted by deep networks, and in complex scenes semantic content and distortion features are often mutually coupled, making it difficult for the model to distinguish image content from quality degradation factors, which in turn limits evaluation accuracy and generalization capability.
In recent years, some methods have introduced pre-trained models or reconstruction tasks to enhance semantic understanding or distortion perception. However, most of them do not explicitly constrain the decoupling and cooperation between semantic features and distortion features during training, and their modeling of local fine-grained degradation features is insufficient, so they cannot comprehensively reflect how the human visual system perceives image quality. A method is therefore needed that models semantic and distortion information simultaneously without a reference image and that improves the accuracy and robustness of image quality evaluation through collaborative learning and adaptive fusion.

Disclosure of Invention

1. Technical problems the invention aims to solve: The invention aims to provide a reference-free image quality evaluation method based on semantic-distortion collaborative learning with mask image reconstruction, so as to solve the following problems in the prior art. (1) In existing no-reference image quality evaluation methods, semantic features and distortion features are mutually coupled, making it difficult to distinguish image content from quality degradation factors and thereby affecting evaluation accuracy. (2) Existing methods model local fine-grained degradation features insufficiently: prior techniques focus on global feature extraction and can hardly reflect fine-grained degradation such as local texture changes and edge blurring, so a local degradation perception encoder is introduced to strengthen feature capture in this dimension.

2. Technical scheme: In order to achieve the above purpose, the present invention adopts the following technical scheme. A reference-free image quality evaluation method based on mask reconstruction comprises the following steps: S1, constructing mixed synthetic data: acquiring a reference image and a distorted image corresponding to it, dividing the image into a plurality of image blocks, randomly selecting some blocks from the distorted image and taking the others from the reference image through a spatial mask, and constructing a mixed synthetic image data set containing both distorted and undistorted regions; S2, extracting semantic perception features: masking the input image, feeding the masked image into a semantic perception encoder, and learning distortion-insensitive global semantic features through a masked-image reconstruction task; S3, extracting distortion perception features: with the parameters of the semantic perception encoder kept fixed, feeding the distorted region of the mixed synthetic image into a distortion perception encoder and extracting distortion-related features, under the guidance of the semantic features, by reconstructing the distorted image; S4, extracting local degradation features: feeding the original image into a local degradation perception encoder and extracting local features that reflect local texture, edges, and fine-grained degradation information; S5, multi-feature adaptive fusion and quality prediction: adaptively fusing the semantic, distortion, and local features, dynamically balancing the contribution of each feature type, and outputting the image quality evaluation result.
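The inference-time flow of steps S2-S5 can be sketched as follows; the fixed random linear "encoders", the softmax gating over three branches, the linear regression head, and all dimensions are illustrative assumptions standing in for the trained semantic, distortion, and local degradation encoders and the quality regression module.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(dim_in, dim_out, seed):
    """Toy stand-in for a trained feature encoder: a fixed random linear map."""
    w = np.random.default_rng(seed).normal(size=(dim_in, dim_out)) * 0.1
    return lambda img: img.reshape(-1) @ w

sae = make_encoder(64, 16, 1)   # semantic perception encoder (frozen)
dpe = make_encoder(64, 16, 2)   # distortion perception encoder
lde = make_encoder(64, 16, 3)   # local degradation perception encoder

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_quality(img, gate_logits, w_qr):
    """S5: gate the three feature branches, fuse, then regress to a score."""
    w = softmax(gate_logits)                       # gate -> normalized weights
    fused = w[0]*sae(img) + w[1]*dpe(img) + w[2]*lde(img)
    return float(fused @ w_qr)                     # QR(.): linear regression head

gate_logits = np.array([0.2, 0.5, 0.3])  # would be learned during training
w_qr = rng.normal(size=16) * 0.1
score = predict_quality(rng.normal(size=(8, 8)), gate_logits, w_qr)
```

The softmax gate keeps the three branch weights positive and summing to one, which is one simple way to realize the "dynamic balancing of feature contributions" described in S5; an attention-based fusion would replace the scalar gates with content-dependent weights.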