CN-121985152-A - Video quality determining method and server
Abstract
The application relates to a video quality determining method and a server, in the technical field of image processing. The method comprises: obtaining multi-source data of a video to be processed, wherein the multi-source data comprises video description data, user behavior data and comprehensive evaluation data, and the comprehensive evaluation data is determined according to evaluation data of the video to be processed on different platforms; encoding the multi-source data to obtain video features corresponding to the video description data, behavior features corresponding to the user behavior data and evaluation features corresponding to the comprehensive evaluation data; performing association analysis on the video features, the behavior features and the evaluation features to obtain association degree information among the different features; fusing the video features, the behavior features and the evaluation features according to the association degree information to obtain a fused feature; and analyzing the fused feature to obtain a quality score of the video to be processed. With this technical scheme, the quality score can truly reflect the actual quality level of the video, realizing an accurate assessment of video quality.
Inventors
- SHI XIAOLONG
- RAO GANG
- HUANG SHANSHAN
- LIU XIN
Assignees
- Qingdao Jukanyun Technology Co., Ltd. (青岛聚看云科技有限公司)
- Qingdao Human Resources Development Research and Promotion Center (青岛市人力资源发展研究与促进中心)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-02-06
Claims (10)
- 1. A video quality determining method, comprising: obtaining multi-source data of a video to be processed, wherein the multi-source data comprises video description data, user behavior data and comprehensive evaluation data, and the comprehensive evaluation data is determined according to evaluation data of the video to be processed on different platforms; encoding the multi-source data to obtain video features corresponding to the video description data, behavior features corresponding to the user behavior data and evaluation features corresponding to the comprehensive evaluation data; performing association analysis on the video features, the behavior features and the evaluation features to obtain association degree information among the different features; fusing the video features, the behavior features and the evaluation features according to the association degree information to obtain a fused feature; and analyzing the fused feature to obtain a quality score of the video to be processed.
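The pipeline of claim 1 (encode, association analysis, association-weighted fusion, scoring) can be sketched end to end. This is a minimal NumPy illustration, not the patented implementation: cosine similarity stands in for the association analysis, and a squashed mean stands in for the final scoring model; all names and dimensions are illustrative.

```python
import numpy as np

def assess_quality(video_feat, behavior_feat, eval_feat):
    """Toy version of claim 1: association analysis via cosine similarity,
    association-weighted fusion, and a final quality score in (0, 100)."""
    feats = np.stack([video_feat, behavior_feat, eval_feat])
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    assoc = norm @ norm.T                     # pairwise association degrees
    weights = assoc.sum(axis=1)               # each feature's total association
    weights /= weights.sum()                  # normalise into fusion weights
    fused = (weights[:, None] * feats).sum(axis=0)
    return 100.0 / (1.0 + np.exp(-fused.mean()))  # squash to a 0-100 score
```

With identical inputs the three fusion weights are equal and the score reduces to a squashed mean of one feature vector.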
- 2. The method according to claim 1, wherein performing association analysis on the video features, the behavior features and the evaluation features to obtain association degree information among the different features comprises: splicing the video features, the behavior features and the evaluation features to obtain a feature sequence; applying three independent linear projection layers to the feature sequence to obtain the query vectors, key vectors and value vectors of a multi-head attention module; and capturing association degree information according to the query vectors, key vectors and value vectors to obtain the association degree information among the different features, wherein the multi-head attention module comprises at least three attention heads, and each attention head captures the association degree information between two different features.
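The attention mechanism of claim 2 — three independent linear projections producing queries, keys and values for a multi-head attention module over the spliced feature sequence — can be sketched as follows. This is an illustrative NumPy implementation with random, untrained projection weights, not the patent's trained model; the feature dimension and head count are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(seq, n_heads=3, seed=0):
    """seq: (3, d) spliced sequence of [video, behavior, evaluation] features.
    Returns per-head attention maps (the 'association degree' between feature
    pairs) and the attended output sequence."""
    n, d = seq.shape
    assert d % n_heads == 0
    dh = d // n_heads
    rng = np.random.default_rng(seed)
    # three independent linear projection layers -> query, key, value
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = seq @ Wq, seq @ Wk, seq @ Wv
    # split into heads: (n_heads, n, dh)
    split = lambda M: M.reshape(n, n_heads, dh).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    attn = softmax(Qh @ Kh.transpose(0, 2, 1) / np.sqrt(dh))  # (n_heads, 3, 3)
    out = (attn @ Vh).transpose(1, 0, 2).reshape(n, d)
    return attn, out
```

Each of the three heads yields a 3x3 attention map, so every head can capture the association degree between a different pair of features.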
- 3. The method according to claim 1, wherein fusing the video features, the behavior features and the evaluation features according to the association degree information among the different features to obtain a fused feature comprises: updating the video features, the behavior features and the evaluation features according to the association degree information among the different features to obtain new video features, new behavior features and new evaluation features that incorporate the corresponding association degree information; and performing weighted fusion on the new video features, the new behavior features and the new evaluation features according to their respective fusion weights to obtain the fused feature, wherein each fusion weight represents the contribution of the corresponding feature to the fused feature.
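The weighted-fusion step of claim 3 can be illustrated with softmax-normalised contribution scores, so the fusion weights are positive and sum to one. How the contribution scores are actually produced is not fixed by the claim; the softmax normalisation here is an assumption for the sketch.

```python
import numpy as np

def weighted_fuse(new_feats, scores):
    """new_feats: (3, d) updated video/behavior/evaluation features.
    scores: raw contribution scores, softmax-normalised into fusion weights.
    Returns the weights and the fused feature vector."""
    w = np.exp(scores - scores.max())
    w /= w.sum()                                   # weights sum to 1
    return w, (w[:, None] * new_feats).sum(axis=0)
```

With equal contribution scores the fusion degenerates to a plain mean of the three features.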
- 4. The method according to any one of claims 1-3, wherein obtaining the multi-source data of the video to be processed comprises: acquiring the video to be processed, user log data for the video to be processed, and evaluation data of the video to be processed on different platforms; performing semantic analysis on the video to be processed to obtain the video description data of the video to be processed; performing feature extraction on the user log data to obtain the user behavior data of the video to be processed; and fusing the evaluation data of the video to be processed on the different platforms to obtain the comprehensive evaluation data.
- 5. The method according to claim 4, wherein the video description data comprises a video type, emotion data and narrative rhythm data, and performing semantic analysis on the video to be processed to obtain the video description data comprises: segmenting the video to be processed into at least two video segments; performing feature extraction on each video segment to obtain multi-modal features of the video segment, wherein the multi-modal features comprise shot motion features, voice content features and semantic features; determining the video type and the emotion data of the video to be processed according to the multi-modal features of the different video segments; and extracting a narrative structure of the video to be processed, and determining the narrative rhythm data of the video to be processed according to the narrative structure.
- 6. The method according to claim 5, wherein performing feature extraction on a video segment to obtain the multi-modal features of the video segment comprises: acquiring image data, audio data and text data of the video segment; performing feature extraction on the image data to obtain the shot motion features of the video segment; converting the audio data into a Mel spectrogram, and performing feature extraction on the Mel spectrogram to obtain the voice content features of the video segment; and performing word segmentation on the text data, and performing feature extraction on the segmented text data to obtain the semantic features of the video segment.
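The Mel-spectrogram conversion named in claim 6 can be sketched in plain NumPy (a production system would more likely use a library such as librosa). The frame size, hop length and filter count below are illustrative defaults, not values from the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the Mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mel_spectrogram(audio, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Windowed power spectrogram projected onto a Mel filterbank."""
    frames = [audio[s:s + n_fft] * np.hanning(n_fft)
              for s in range(0, len(audio) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (T, n_fft//2 + 1)
    return power @ mel_filterbank(n_mels, n_fft, sr).T  # (T, n_mels)
```

The resulting (frames x Mel bands) matrix is what a downstream voice-content feature extractor would consume.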
- 7. The method according to claim 4, wherein fusing the evaluation data of the video to be processed on the different platforms to obtain the comprehensive evaluation data comprises: determining a deviation factor and a platform weight for each platform according to platform attribute information of the different platforms; for any platform, correcting the evaluation data of the video to be processed on that platform according to the platform's deviation factor to obtain standardized evaluation data for the platform; and performing weighted fusion on the standardized evaluation data of the different platforms according to the platform weights to obtain the comprehensive evaluation data.
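The bias correction and platform-weighted fusion of claim 7 admit a simple numeric sketch. Treating the deviation factor as an additive bias subtracted from the raw score is an assumption; the claim does not fix the correction formula.

```python
import numpy as np

def comprehensive_score(platform_scores, bias, weights):
    """platform_scores, bias, weights: dicts keyed by platform name.
    Each raw score is corrected by its platform's deviation factor, then the
    standardized scores are combined using normalised platform weights."""
    names = list(platform_scores)
    std = {p: platform_scores[p] - bias[p] for p in names}  # standardized data
    w = np.array([weights[p] for p in names], dtype=float)
    w /= w.sum()                                            # weights sum to 1
    return float(sum(wi * std[p] for wi, p in zip(w, names)))
```

For example, two platforms scoring 8 and 6 with biases +0.5 and -0.5 and weights 2:1 yield (7.5 * 2/3) + (6.5 * 1/3).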
- 8. The method according to claim 4, wherein the user behavior data comprises behavior distribution features, play rhythm features and interaction depth features, and performing feature extraction on the user log data to obtain the user behavior data of the video to be processed comprises: for each item of first data in the user log data, determining the behavior frequency corresponding to that first data, wherein the first data comprises at least two of click behavior data, interaction behavior data and transaction behavior data; determining an information entropy according to the behavior frequencies corresponding to the different first data, and determining the behavior distribution features according to the information entropy; performing feature extraction on play behavior data in the user log data to obtain the play rhythm features; and performing feature extraction on at least one of the interaction behavior data, the transaction behavior data and feedback data in the user log data to obtain the interaction depth features.
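The information entropy in claim 8, computed over the frequencies of the different behavior types, is the standard Shannon entropy: a uniform spread of behaviors maximises it, while a single dominant behavior drives it to zero.

```python
import math

def behavior_entropy(counts):
    """Shannon entropy (bits) of the behavior-frequency distribution.
    counts: dict mapping behavior type (click, interaction, transaction, ...)
    to its observed frequency in the user log data."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Two equally frequent behavior types give exactly 1 bit of entropy; a single behavior type gives 0.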
- 9. The method according to any one of claims 1-3, wherein analyzing the fused feature to obtain the quality score of the video to be processed comprises: analyzing the fused feature to obtain dimension scores of the video to be processed under different evaluation dimensions; determining dynamic weights for the different evaluation dimensions according to the video type of the video to be processed; and performing weighted fusion on the dimension scores according to the dynamic weights of the different evaluation dimensions to obtain the quality score of the video to be processed.
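The dynamic weighting of claim 9 can be sketched as a per-video-type lookup of dimension weights followed by a normalised weighted sum. The dimension names and weight tables below are hypothetical, used only to make the arithmetic concrete.

```python
def quality_score(dim_scores, type_weights, video_type):
    """dim_scores: {evaluation dimension: score}.
    type_weights: {video type: {dimension: weight}} lookup table, so the
    weights adapt dynamically to the video type. Weights are normalised
    before the weighted fusion."""
    w = type_weights[video_type]
    total = sum(w[d] for d in dim_scores)
    return sum(dim_scores[d] * w[d] / total for d in dim_scores)
```

For a hypothetical "drama" type weighting content 3:1 over production, scores of 8 and 6 fuse to (8 * 3 + 6 * 1) / 4 = 7.5.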
- 10. A server, characterized by comprising at least one processor configured to: obtain multi-source data of a video to be processed, wherein the multi-source data comprises video description data, user behavior data and comprehensive evaluation data, and the comprehensive evaluation data is determined according to evaluation data of the video to be processed on different platforms; encode the multi-source data to obtain video features corresponding to the video description data, behavior features corresponding to the user behavior data and evaluation features corresponding to the comprehensive evaluation data; perform association analysis on the video features, the behavior features and the evaluation features to obtain association degree information among the different features; fuse the video features, the behavior features and the evaluation features according to the association degree information to obtain a fused feature; and analyze the fused feature to obtain a quality score of the video to be processed.
Description
Video quality determining method and server

Technical Field

The present application relates to the field of image processing technologies, and in particular to a video quality determining method and a server.

Background

With the rapid development of internet technology, multimedia coding technology and mobile terminal devices, video has become one of the main carriers through which people acquire information, entertain themselves, socialize and spread knowledge, and it is widely applied in various scenarios. At the same time, accurately evaluating the quality of video content has become an important problem in content distribution, recommendation, low-quality content control and similar links. Various video quality evaluation schemes have been proposed in the prior art. One approach evaluates objective characteristics of a video: intrinsic parameters such as resolution, frame rate, bit rate, signal-to-noise ratio and color saturation are extracted, and a preset scoring model performs quantitative analysis on these parameters to obtain a video quality score. Another approach relies on subjective feedback from users: feedback on a single dimension of the video, such as likes, comments or ratings, is collected and used directly as the basis for evaluating video quality. However, these prior-art techniques have difficulty achieving an accurate and comprehensive assessment of video quality.

Disclosure of Invention

The application provides a video quality determining method and a server, which make the final quality score more accurate and closer to reality, so that it can truly reflect the actual quality level of the video and realize an accurate and comprehensive assessment of video quality.
In a first aspect, some embodiments provide a video quality determining method, including: obtaining multi-source data of a video to be processed, wherein the multi-source data comprises video description data, user behavior data and comprehensive evaluation data, and the comprehensive evaluation data is determined according to evaluation data of the video to be processed on different platforms; encoding the multi-source data to obtain video features corresponding to the video description data, behavior features corresponding to the user behavior data and evaluation features corresponding to the comprehensive evaluation data; performing association analysis on the video features, the behavior features and the evaluation features to obtain association degree information among the different features; fusing the video features, the behavior features and the evaluation features according to the association degree information to obtain a fused feature; and analyzing the fused feature to obtain a quality score of the video to be processed. In this embodiment, multi-source data such as the video description data, the user behavior data and the comprehensive evaluation data of the video to be processed are integrated, which breaks through the evaluation limitation of a single data dimension, realizes a comprehensive consideration of video quality, and avoids the evaluation bias caused by one-sided data. Further, multi-dimensional features are extracted through encoding and association degree analysis is performed, so that the internal associations among the different features are fully mined, the feature fusion is more targeted, and the data utilization efficiency is improved.
In addition, the quality score is obtained by fusing the multi-dimensional features based on the association degree information among the different features; both the objective attributes of the video and the subjective perception of users are considered, and the evaluation results of multiple platforms are integrated, so that the final quality score is more accurate and realistic, can truly reflect the actual quality level of the video, and realizes an accurate and comprehensive evaluation of video quality. In a second aspect, some embodiments further provide a video quality determining apparatus, including: a data acquisition module, configured to acquire multi-source data of a video to be processed, wherein the multi-source data comprises video description data, user behavior data and comprehensive evaluation data, and the comprehensive evaluation data is determined according to evaluation data of the video to be processed on different platforms; and a data encoding module, configured to encode the multi-source data to obtain video features corresponding to the video description data, behavior features corresponding to the user behavior data and evaluation features corresponding to the comprehensive