
CN-121305450-B - Video dynamic quality evaluation and analysis method based on perception and memory

CN121305450B

Abstract

The invention discloses a video dynamic quality evaluation and analysis method based on perception and memory, in the technical field of video quality analysis. The method decomposes the video to be evaluated into a continuous frame sequence, removes abnormal frames, divides dynamic and static areas, partitions each frame into high-, medium- and low-sensitivity regions based on sensitivity, and records region features. It then calculates the fluency score, definition score and color consistency score of each sensitivity region and weights them into a per-frame real-time perceived quality total score. Initial quality anchor points are established and stored in a long-term memory pool, quality data are dynamically updated through an instant memory pool and a short-term memory pool, and a weighted average score and the cumulative number of abnormal events are calculated. Weights are then dynamically adjusted according to the memory-pool data, the current frame score is corrected, and a three-level, nine-grade evaluation result is output and compared with reference scores to optimize key parameters, thereby significantly improving the accuracy and robustness of video quality evaluation.

Inventors

  • Jin Bingjian
  • Wang Qing
  • Liang Yaoming
  • Liu Shuren

Assignees

  • 陕西凌丰泰电子科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-03

Claims (7)

  1. A video dynamic quality evaluation and analysis method based on perception and memory, characterized by comprising the following steps: decomposing the video to be evaluated into a continuous frame sequence, removing abnormal frames, dividing dynamic and static areas, partitioning each frame into high-, medium- and low-sensitivity regions based on sensitivity, and recording region features; calculating the fluency score, definition score and color consistency score of each sensitivity region, and weighting them to obtain a per-frame real-time perceived quality total score; establishing initial quality anchor points and storing them in a long-term memory pool, dynamically updating quality data through an instant memory pool and a short-term memory pool, and calculating a weighted average score and the cumulative number of abnormal events, which comprises: selecting the real-time perceived quality scores of the first N frames of the video, sorting them in descending order, and taking the top 3 to 5 scores as initial quality anchor points stored in the long-term memory pool; setting an upper capacity limit for the long-term memory pool and, when that limit is exceeded, removing the anchor data with the lowest score; recording the frame number, total score and dimension scores of each anchor; creating an instant memory pool and a short-term memory pool, each with its own capacity and update period; storing the real-time perceived quality total score, dimension scores, abnormal event type and occurrence region of each frame in the instant memory pool, and periodically transferring the instant memory pool data to the short-term memory pool; calculating the weighted average score and the cumulative number of abnormal events in the short-term memory pool to form a short-term quality feature vector; setting a trigger condition of either three accumulated updates of the short-term memory pool or detection of a serious quality abnormality, namely a single-frame total score below 3; when the trigger condition is met, extracting the core features of the short-term quality feature vector, including the lowest score, the peak number of abnormal events and the quality trend slope, calculating the deviation rate between the core features and the initial anchor features of the long-term memory pool, and storing the core features in the long-term memory pool if the deviation rate exceeds a deviation threshold; dynamically adjusting the weights according to the long-term memory pool data and correcting the current frame score; setting a three-level, nine-grade evaluation system in which scores are divided into excellent, good and qualified levels, each level being subdivided into upper, middle and lower grades; calculating the weighted average of the corrected scores of all video frames and, combined with the abnormal marks of the long-term memory pool, outputting a comprehensive grade and quality description; and comparing the comprehensive grade with reference scores, calculating the mean absolute error, and, if the error exceeds an error threshold, adjusting key parameters to optimize the evaluation result, wherein the key parameters comprise the high-sensitivity region fluency judgment threshold T2 and the color difference threshold T5.
  2. The video dynamic quality evaluation and analysis method based on perception and memory according to claim 1, characterized by: calculating the average value of the standard time intervals between adjacent frames, setting an abnormal threshold of 1.5 times that average, traversing the time-interval data of all frames, and marking frames that exceed the abnormal threshold as abnormal frames and eliminating them; calculating the pixel gray-level variation ΔG of adjacent frames by an inter-frame difference method, setting a variation threshold T1, and marking the set of pixels in a single frame whose ΔG exceeds T1 as the dynamic area and the rest as the static area; and tracking the motion trajectories of feature points in the dynamic area by an optical flow method, recording the start coordinate, end coordinate and motion direction of each feature point to form a dynamic-area motion feature table.
  3. The video dynamic quality evaluation and analysis method based on perception and memory according to claim 2, characterized by: taking the center of a single frame as the origin and dividing the picture into a central area and an edge area, wherein the central area is a circular region whose radius equals 1/3 of the diagonal length of the picture and the remainder is the edge area; extracting foreground elements in the static area with a semantic segmentation model; combining the dynamic/static area marking results to assign the union of the dynamic area and the central static area to the high-sensitivity region, the foreground elements in the static area to the medium-sensitivity region, and the edge static area and solid-color background areas to the low-sensitivity region; and assigning a unique identifier to each sensitivity region while recording its coordinates and area ratio.
  4. The video dynamic quality evaluation and analysis method based on perception and memory according to claim 1, characterized in that fluency judgment thresholds T2, T3 and T4 are set for the high-, medium- and low-sensitivity regions respectively; the number of abnormal occurrences in which the pixel variation amplitude of the same region in adjacent frames exceeds the corresponding fluency threshold is counted; the fluency score is calculated from the abnormal count; and, with each second as a counting period, the abnormal counts are accumulated and the fluency score is output.
  5. The video dynamic quality evaluation and analysis method based on perception and memory according to claim 4, characterized in that an edge detection algorithm is used to extract the edge contour of each sensitivity region, with different detection emphases set according to the sensitivity level; the ratio of the actually detected edge length to the ideal edge length is calculated as the contour integrity; and the definition score is calculated from the contour integrity and output.
  6. The video dynamic quality evaluation and analysis method based on perception and memory according to claim 5, characterized by: traversing all pixels in a region and calculating the R, G and B mean values; calculating the color difference ΔE of the same region in adjacent frames; setting differentiated color difference thresholds T5, T6 and T7; recording one color mutation whenever a region's ΔE exceeds its color difference threshold, tracking the number of frames the mutation lasts and calculating its duration; and calculating and outputting the color consistency score from the number of mutations and their duration.
  7. The video dynamic quality evaluation and analysis method based on perception and memory according to claim 1, characterized in that the weight of each dimension is adjusted according to the average score of the initial anchor points and the cumulative numbers of abnormal events in the long-term memory pool: the fluency weight is increased by 0.1 if the cumulative number of stutter events reaches the count threshold, the color consistency weight is increased by 0.1 if the cumulative number of color mutation events reaches the count threshold, and the definition weight is increased by 0.1 if the cumulative number of blur events reaches the count threshold; the corrected score of the current frame is then recalculated with the adjusted weights.
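The memory-pool mechanics of claim 1 can be sketched in Python as below. The pool capacities, the eviction of the lowest-scoring anchor, and the top-3-to-5 anchor selection follow the claim; the concrete capacity numbers and the recency weighting inside the weighted average are illustrative assumptions, since the patent leaves them open.

```python
from collections import deque


class MemoryPools:
    """Sketch of the three-tier memory structure of claim 1.
    Capacities and the recency weighting are illustrative assumptions."""

    def __init__(self, long_cap=10, instant_cap=30):
        self.long_term = []                    # anchor records (dicts)
        self.instant = deque(maxlen=instant_cap)
        self.short_term = []                   # short-term feature vectors
        self.long_cap = long_cap

    def init_anchors(self, first_n_records, k=3):
        # Sort the first N per-frame records by total score, descending,
        # and keep the top k (3..5 in the claim) as initial quality anchors.
        ranked = sorted(first_n_records, key=lambda r: r["total"], reverse=True)
        for rec in ranked[:k]:
            self.add_anchor(rec)

    def add_anchor(self, rec):
        self.long_term.append(rec)
        if len(self.long_term) > self.long_cap:
            # Evict the lowest-scoring anchor when capacity is exceeded.
            self.long_term.remove(min(self.long_term, key=lambda r: r["total"]))

    def push_frame(self, rec):
        # Per-frame record: total score, dimension scores, abnormal events.
        self.instant.append(rec)

    def flush_to_short_term(self):
        # Periodic transfer: summarise the instant pool into a short-term
        # quality feature vector (weighted average + abnormal-event count).
        if not self.instant:
            return None
        totals = [r["total"] for r in self.instant]
        weights = range(1, len(totals) + 1)    # recent frames weigh more (assumption)
        wavg = sum(t * w for t, w in zip(totals, weights)) / sum(weights)
        feat = {
            "weighted_avg": wavg,
            "abnormal_count": sum(len(r.get("events", [])) for r in self.instant),
            "min_score": min(totals),
        }
        self.short_term.append(feat)
        self.instant.clear()
        return feat
```

A trigger check (three short-term updates, or any frame total below 3) would then compare the feature vector against the long-term anchors, as claim 1 describes.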
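The preprocessing of claim 2 can be sketched as two small functions: abnormal-frame removal using the 1.5x mean-interval threshold, and the inter-frame difference split into dynamic and static areas. The frames are plain 2-D lists here, and T1=15 is an illustrative value, as the patent does not fix T1; optical-flow tracking of the dynamic area is omitted.

```python
def remove_abnormal_frames(timestamps):
    """Claim 2 sketch: the abnormal threshold is 1.5x the mean
    inter-frame interval; frames arriving after a longer gap are
    marked abnormal and dropped. Returns indices of kept frames."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    threshold = 1.5 * sum(intervals) / len(intervals)
    kept = [0]  # the first frame has no preceding interval
    for i, dt in enumerate(intervals, start=1):
        if dt <= threshold:
            kept.append(i)
    return kept


def split_dynamic_static(prev_gray, curr_gray, t1=15):
    """Inter-frame difference: pixels whose grey-level change exceeds
    threshold T1 form the dynamic area; the remainder is static."""
    dynamic = set()
    for y, (row_p, row_c) in enumerate(zip(prev_gray, curr_gray)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > t1:
                dynamic.add((x, y))
    return dynamic
```

For example, with timestamps `[0, 33, 66, 200, 233]` (milliseconds), the 134 ms gap exceeds 1.5x the mean interval, so the fourth frame is dropped.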
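The sensitivity partition of claim 3 can be sketched as follows. The central circle (radius = 1/3 of the frame diagonal) and the high/medium/low assignment follow the claim; in practice a semantic segmentation model would supply `foreground_pixels`, which is passed in directly here for illustration.

```python
import math


def classify_sensitivity(width, height, dynamic_pixels, foreground_pixels):
    """Claim 3 sketch: high sensitivity = dynamic area plus the central
    area (a circle of radius 1/3 of the frame diagonal around the frame
    centre); medium = static foreground; low = everything else."""
    cx, cy = width / 2, height / 2
    radius = math.hypot(width, height) / 3
    high, medium, low = set(), set(), set()
    for y in range(height):
        for x in range(width):
            if (x, y) in dynamic_pixels or math.hypot(x - cx, y - cy) <= radius:
                high.add((x, y))
            elif (x, y) in foreground_pixels:
                medium.add((x, y))
            else:
                low.add((x, y))
    return high, medium, low
```

Each region's area ratio, recorded per the claim, is then just `len(region) / (width * height)`.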
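The per-dimension scores of claims 4 to 6 and their weighting into the per-frame total of claim 1 can be sketched as below. The claims specify only which quantities each score is computed from; the 10-point scale, the linear one-point-per-event penalties, the Euclidean stand-in for ΔE, and the default weights are all assumptions.

```python
def fluency_score(region_deltas, threshold, max_score=10):
    """Claim 4: count pixel-change amplitudes exceeding the region's
    fluency threshold (T2/T3/T4) within a one-second window; a linear
    penalty per abnormal event is an illustrative assumption."""
    abnormal = sum(1 for d in region_deltas if d > threshold)
    return max(max_score - abnormal, 0)


def definition_score(detected_edge_len, ideal_edge_len, max_score=10):
    """Claim 5: contour integrity = detected / ideal edge length;
    a linear score mapping is assumed."""
    if ideal_edge_len <= 0:
        return 0.0
    return max_score * min(detected_edge_len / ideal_edge_len, 1.0)


def color_consistency_score(frame_means, threshold, max_score=10):
    """Claim 6: dE between mean-RGB vectors of the same region in
    adjacent frames (Euclidean distance as a simple stand-in for a
    perceptual dE); each excess over the threshold (T5/T6/T7) counts
    as one colour mutation."""
    mutations = sum(
        1 for prev, curr in zip(frame_means, frame_means[1:])
        if sum((a - b) ** 2 for a, b in zip(prev, curr)) ** 0.5 > threshold
    )
    return max(max_score - mutations, 0)


def frame_total(scores, weights=(0.4, 0.3, 0.3)):
    """Weighted per-frame total of claim 1; the default weights are
    assumptions, later adjusted per claim 7."""
    return sum(s * w for s, w in zip(scores, weights))
```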
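The weight adjustment of claim 7 can be sketched as follows. The claim fixes only the +0.1 bump per dimension; the count threshold of 3, the event-to-dimension mapping names, and the final renormalisation so that the weights sum to 1 are assumptions.

```python
def adjust_weights(base, event_counts, count_threshold=3, bump=0.1):
    """Claim 7 sketch: bump a dimension's weight by 0.1 when the
    cumulative count of its associated abnormal event in the long-term
    memory pool reaches the threshold (stutter -> fluency,
    blur -> definition, colour mutation -> colour consistency),
    then renormalise (renormalisation is an assumption)."""
    w = dict(base)
    mapping = {"stutter": "fluency",
               "blur": "definition",
               "color_mutation": "color"}
    for event, dim in mapping.items():
        if event_counts.get(event, 0) >= count_threshold:
            w[dim] += bump
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}
```

The corrected score of the current frame is then recomputed with the returned weights, as the claim states.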

Description

Video dynamic quality evaluation and analysis method based on perception and memory

Technical Field

The invention relates to the technical field of video quality analysis, and in particular to a video dynamic quality evaluation and analysis method based on perception and memory.

Background

Current video quality evaluation technology falls into two broad types: objective evaluation and subjective evaluation. Objective methods directly analyze pixel-level characteristics of a video, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and Video Multi-method Assessment Fusion (VMAF), and quantify video quality along dimensions such as definition, fluency and color fidelity through mathematical models and algorithms. Subjective evaluation relies on manual grading: observers rate video quality according to viewing experience, for example with the 5-point scale of the ITU-R BT.500 standard. Some research attempts to incorporate visual characteristics of the human eye, optimizing the accuracy of evaluation results by simulating the differing sensitivity of human vision to dynamic and static areas.
However, existing objective evaluation methods have clear limitations. On the one hand, traditional indices such as PSNR are computed only from pixel differences and cannot effectively reflect human perception of dynamic content, so they are prone to misjudgment in complex motion scenes. On the other hand, existing methods adopt fixed weights or static models and adapt poorly to the dynamic variation of different video contents, so the deviation between evaluation results and actual viewing impression is large. In addition, most of these technologies lack the ability to continuously track abnormal events and cannot dynamically adjust the evaluation strategy from historical data, which limits model robustness. Subjective evaluation can reflect the real viewing impression, but it is costly and inefficient, and its results are easily affected by individual differences and environmental factors. Existing methods that incorporate perceptual characteristics attempt to simulate human-eye sensitivity, but they do not adequately consider the influence of memory effects on quality evaluation, such as the cumulative perception of continuous abnormal events or the anchoring effect of initially high-quality frames.

Disclosure of the Invention

(I) Technical problems solved

In view of the deficiencies of the prior art, the invention provides a video dynamic quality evaluation and analysis method based on perception and memory. By simulating the differing sensitivity of human vision to dynamic and static areas and dynamically adjusting the evaluation weights, it solves the problems that traditional objective evaluation methods cannot accurately reflect human perception, lack dynamic adaptability, and ignore the influence of historical quality data, thereby significantly improving the accuracy and robustness of video quality evaluation.
(II) Technical solution

To achieve the above purpose, the invention is realized by the following technical scheme. The video dynamic quality evaluation and analysis method based on perception and memory comprises the following steps: decomposing the video to be evaluated into a continuous frame sequence, removing abnormal frames, dividing dynamic and static areas, partitioning each frame into high-, medium- and low-sensitivity regions based on sensitivity, and recording region features; calculating the fluency score, definition score and color consistency score of each sensitivity region, and weighting them to obtain a per-frame real-time perceived quality total score; establishing initial quality anchor points and storing them in a long-term memory pool, dynamically updating quality data through an instant memory pool and a short-term memory pool, and calculating a weighted average score and the cumulative number of abnormal events; and dynamically adjusting the weights according to the memory-pool data, correcting the current frame score, outputting a three-level, nine-grade evaluation result, and optimizing key parameters after comparing it with reference scores. Further: calculating the average value of the standard time intervals between adjacent frames, setting an abnormal threshold of 1.5 times that average, traversing the time-interval data of all frames, marking frames that exceed the abnormal threshold as abnormal frames and eliminating them, reordering the remaining frames by timestamp, and recording the positions of missing frames if the frame sequence numbers are discontinuous; calculating the pixel gray-level variation ΔG of adjacent frames by adopting an inter-frame difference method
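The three-level, nine-grade output described above can be sketched as a simple band mapping. The three levels (qualified / good / excellent) and the lower/middle/upper sub-grades come from the text; the numeric band edges, here spanning scores 4 to 10 on a 10-point scale, are illustrative assumptions, since the patent does not publish its exact cut-offs.

```python
def nine_grade(score, top=10.0, bottom=4.0):
    """Map a corrected weighted-average score onto the three-level,
    nine-grade system: three levels, each split into lower, middle
    and upper grades (band edges are assumed, not from the patent)."""
    levels = ["qualified", "good", "excellent"]
    subs = ["lower", "middle", "upper"]
    # Nine equal bands across [bottom, top], clamped at the extremes.
    idx = max(0, min(int((score - bottom) * 9 / (top - bottom)), 8))
    return levels[idx // 3], subs[idx % 3]
```

A usage example: a video averaging 7.0 would land in the middle grade of the "good" level under these assumed bands.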