CN-116561619-B - Audio and video auditing method, device, equipment and readable storage medium


Abstract

The application discloses an audio and video auditing method, device, equipment and readable storage medium. The method comprises acquiring audio and video data to be audited and determining an audio and video auditing model that comprises a primary analysis network, a secondary analysis network, a tertiary analysis network and a result processing network. The primary analysis network performs large-class analysis and screening according to the large-class violation labels to obtain a primary analysis result. The secondary analysis network performs fine-class analysis and screening on each piece of violating audio and video data in each violating audio and video data set to obtain a secondary analysis result. The tertiary analysis network extracts implicit features of each piece of violating audio and video data and performs similarity calculation to obtain a similarity comparison result. The result processing network determines the auditing result of the audio and video data to be audited based on the primary analysis result, the secondary analysis result and the similarity comparison result.

Inventors

LI ZIJUN
ZHANG ZHENGTONG
HUANG XIANGKANG
LAN XIANG
Liao Yanbing
XU ZHIJIAN
XIE RUI
CHEN GUANGYAO
MA JINLONG
XIONG JIA
WU WENLIANG
ZENG RUIHONG
WANG WEIZHE
PAN ZISHENG
Jiao Nankai
DENG QICHUN

Assignees

广州趣研网络科技有限公司

Dates

Publication Date: 20260505
Application Date: 20230327

Claims (10)

1. An audio/video auditing method, characterized by comprising the following steps: acquiring audio and video data to be audited; determining an audio and video auditing model, wherein the audio and video auditing model comprises a primary analysis network, a secondary analysis network, a tertiary analysis network and a result processing network; the primary analysis network is used for performing large-class analysis and screening on the input audio and video data to be audited according to the large-class violation labels, to obtain a primary analysis result and to generate the violating audio and video data sets corresponding to the respective large-class violation labels; the secondary analysis network is used for performing fine-class analysis and screening on each piece of violating audio and video data in each violating audio and video data set, to obtain a secondary analysis result and to determine the subclass violation label corresponding to each piece of violating audio and video data; the tertiary analysis network is used for extracting implicit features of each piece of violating audio and video data and performing similarity calculation against the features in the blacklist feature library and the whitelist feature library corresponding to the subclass violation label, to obtain a similarity comparison result; the result processing network is used for determining the auditing result of the audio and video data to be audited based on the primary analysis result, the secondary analysis result and the similarity comparison result; the primary analysis network consists of a depthwise convolution network and a pointwise convolution layer that adopt an inverted residual structure; the secondary analysis network consists of a residual convolution network, a pooling processing layer and a result output layer; and the tertiary analysis network consists of an image feature extraction layer, a voice feature extraction layer and a similarity comparison layer; and inputting the audio and video data to be audited into the audio and video auditing model to obtain the auditing result of the audio and video data to be audited that is output by the audio and video auditing model.
2. The method of claim 1, wherein determining the auditing result of the audio and video data to be audited based on the primary analysis result, the secondary analysis result and the similarity comparison result comprises: determining that each piece of the audio and video data to be audited whose primary analysis result and/or secondary analysis result is non-violating passes the audit; determining that each piece of audio and video data whose primary analysis result and secondary analysis result are both violating, and whose similarity comparison result shows a feature similarity exceeding a preset threshold in the blacklist feature library, fails the audit; and determining that each piece of audio and video data whose primary analysis result and secondary analysis result are both violating, and whose similarity comparison result shows a feature similarity exceeding a preset threshold in the whitelist feature library, passes the audit.
3. The method as recited in claim 2, further comprising: sending each piece of audio and video data whose primary analysis result and secondary analysis result are both violating, and whose similarity comparison result shows a feature similarity not exceeding the preset threshold in either the blacklist feature library or the whitelist feature library, to a manual auditing module, and determining the feedback result of the manual auditing module as the auditing result.
4. The method according to claim 1, wherein the process by which the primary analysis network performs large-class analysis and screening on the input audio and video data to be audited according to the large-class violation labels, to obtain a primary analysis result and to generate the violating audio and video data sets corresponding to the respective large-class violation labels, comprises: the depthwise convolution network of the primary analysis network performs depthwise convolution on the input audio and video data to be audited according to the parameters corresponding to the large-class violation labels, applying lightweight filtering to obtain an information value for each channel; and the pointwise convolution layer of the primary analysis network constructs new features by linearly combining the channel information values, and determines the violating audio and video data set corresponding to each large-class violation label according to the screening range corresponding to that label.
5. The method according to claim 1, wherein the process by which the secondary analysis network performs fine-class analysis and screening on each piece of violating audio and video data in each violating audio and video data set, to obtain a secondary analysis result and to determine the subclass violation label corresponding to each piece of violating audio and video data, comprises: the residual convolution network of the secondary analysis network performs feature extraction on each piece of violating audio and video data in each violating audio and video data set to obtain its feature information; the pooling processing layer of the secondary analysis network performs dimension-reduction pooling on the feature information of each piece of violating audio and video data to generate its integrated features; and the result output layer of the secondary analysis network performs secondary violation screening based on the integrated features of each piece of violating audio and video data, to obtain the secondary analysis result and to determine the subclass violation label corresponding to each piece of violating audio and video data.
6. The method of claim 1, wherein the process by which the tertiary analysis network extracts implicit features of each piece of violating audio and video data and performs similarity calculation against the features in the blacklist feature library and the whitelist feature library corresponding to the subclass violation label, to obtain a similarity comparison result, comprises: the image feature extraction layer of the tertiary analysis network extracts implicit features of the video pictures in each piece of violating audio and video data to obtain picture general features; the voice feature extraction layer of the tertiary analysis network extracts implicit features of the voice signals in each piece of violating audio and video data to obtain voice general features; and the similarity comparison layer of the tertiary analysis network performs cosine similarity calculation, based on the picture general features and the voice general features, against the features in the blacklist feature library and the whitelist feature library corresponding to the subclass violation label, to obtain the similarity comparison result.
7. The method of claim 1, wherein the process of training the audio and video auditing model comprises: acquiring training audio and video data, wherein the training audio and video data is labeled with corresponding auditing result information; inputting the training audio and video data into a preset initial audio and video auditing model to obtain the auditing result of the training audio and video data output by the initial audio and video auditing model; training the initial audio and video auditing model with the objective that the auditing result it outputs for the training audio and video data approaches the auditing result information labeled on that data; and when the initial audio and video auditing model meets the preset training conditions, taking the trained initial audio and video auditing model as the audio and video auditing model.
8. An audio/video auditing device, comprising: an audio and video acquisition module, used for acquiring audio and video data to be audited; a model determining module, used for determining an audio and video auditing model, wherein the audio and video auditing model comprises a primary analysis network, a secondary analysis network, a tertiary analysis network and a result processing network; the primary analysis network is used for performing large-class analysis and screening on the input audio and video data to be audited according to the large-class violation labels, to obtain a primary analysis result and to generate the violating audio and video data sets corresponding to the respective large-class violation labels; the secondary analysis network is used for performing fine-class analysis and screening on each piece of violating audio and video data in each violating audio and video data set, to obtain a secondary analysis result and to determine the subclass violation label corresponding to each piece of violating audio and video data; the tertiary analysis network is used for extracting implicit features of each piece of violating audio and video data and performing similarity calculation against the features in the blacklist feature library and the whitelist feature library corresponding to the subclass violation label, to obtain a similarity comparison result; the result processing network is used for determining the auditing result of the audio and video data to be audited based on the primary analysis result, the secondary analysis result and the similarity comparison result; the primary analysis network consists of a depthwise convolution network and a pointwise convolution layer that adopt an inverted residual structure; the secondary analysis network consists of a residual convolution network, a pooling processing layer and a result output layer; and the tertiary analysis network consists of an image feature extraction layer, a voice feature extraction layer and a similarity comparison layer; and an auditing result module, used for inputting the audio and video data to be audited into the audio and video auditing model to obtain the auditing result of the audio and video data to be audited that is output by the audio and video auditing model.
9. An audio/video auditing device is characterized by comprising a memory and a processor; the memory is used for storing programs; the processor is configured to execute the program to implement the steps of the audio/video auditing method according to any one of claims 1-7.
10. A readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the audio video auditing method of any of claims 1-7.
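
Illustrative sketch (not part of the claims). The similarity comparison of claim 6 and the decision rules of claims 2 and 3 can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the single combined feature vector, the default threshold of 0.9, and checking the blacklist before the whitelist are all hypothetical choices made for illustration; the claims only state that cosine similarity is used and that a preset threshold is compared.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(feature: Sequence[float],
                   library: Sequence[Sequence[float]]) -> float:
    """Highest cosine similarity between the feature and any library entry."""
    return max((cosine_similarity(feature, ref) for ref in library), default=0.0)


def audit_decision(primary_violation: bool,
                   secondary_violation: bool,
                   feature: Sequence[float],
                   blacklist: Sequence[Sequence[float]],
                   whitelist: Sequence[Sequence[float]],
                   threshold: float = 0.9) -> str:
    """Return "pass", "reject" or "manual_review" for one piece of data.

    `feature` stands in for the picture and voice general features
    (assumed here to be combined into one vector). The threshold value
    and the blacklist-first ordering are assumptions of this sketch.
    """
    # Claim 2: non-violating at the primary and/or secondary level passes.
    if not (primary_violation and secondary_violation):
        return "pass"
    # Claim 2: both levels violating and blacklist similarity above threshold.
    if max_similarity(feature, blacklist) > threshold:
        return "reject"
    # Claim 2: both levels violating and whitelist similarity above threshold.
    if max_similarity(feature, whitelist) > threshold:
        return "pass"
    # Claim 3: neither library matches, so defer to the manual auditing module.
    return "manual_review"
```

For example, with a blacklist of `[[1.0, 0.0]]` and a whitelist of `[[0.0, 1.0]]`, a doubly flagged item with feature `[1.0, 1.0]` matches neither library above the threshold and would be routed to manual review in this sketch.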

Description

Audio and video auditing method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of audit identification, and more particularly to an audio/video auditing method, device, equipment and readable storage medium.

Background

With the rapid development of social entertainment, video publishing, live streaming and similar fields, the number of users and the volume of audio and video uploaded to platforms have grown greatly, and tens of thousands to hundreds of thousands of hours of audio and video content are produced every day. Network environments are now strictly regulated, and auditing uploaded audio and video data is an important task for every major platform. A platform is obligated to screen and audit the audio and video data uploaded by its users, to judge whether a work is illegal, vulgar or prohibited from upload, and to ensure that the works users upload are wholesome.

To meet regulatory requirements, a platform operator generally has to set up an auditing department. Videos uploaded by users each day are sent to that department for cross auditing, and can be shown to other users through the platform only after they pass. Manual supervision, however, carries a huge labor cost and low auditing efficiency; it cannot cope with large volumes of audio and video data and cannot meet the needs of the platform's subsequent development.
Disclosure of Invention

In view of the above, the application provides an audio/video auditing method, device, equipment and readable storage medium. An audio/video auditing model formed by three levels of auditing networks audits the audio and video data to be audited, which saves machine auditing cost, and the features in the blacklist feature library and the whitelist feature library corresponding to the subclass violation labels are used so that the machine can audit audio and video data automatically, which reduces the pressure and cost of manual auditing and improves auditing efficiency.

An audio/video auditing method, comprising: acquiring audio and video data to be audited; determining an audio and video auditing model, wherein the audio and video auditing model comprises a primary analysis network, a secondary analysis network, a tertiary analysis network and a result processing network; the primary analysis network is used for performing large-class analysis and screening on the input audio and video data to be audited according to the large-class violation labels, to obtain a primary analysis result and to generate the violating audio and video data sets corresponding to the respective large-class violation labels; the secondary analysis network is used for performing fine-class analysis and screening on each piece of violating audio and video data in each violating audio and video data set, to obtain a secondary analysis result and to determine the subclass violation label corresponding to each piece of violating audio and video data; the tertiary analysis network is used for extracting implicit features of each piece of violating audio and video data and performing similarity calculation against the features in the blacklist feature library and the whitelist feature library corresponding to the subclass violation label, to obtain a similarity comparison result; and the result processing network is used for determining the auditing result of the audio and video data to be audited based on the primary analysis result, the secondary analysis result and the similarity comparison result; and inputting the audio and video data to be audited into the audio and video auditing model to obtain the auditing result of the audio and video data to be audited that is output by the audio and video auditing model.

Preferably, determining the auditing result of the audio and video data to be audited based on the primary analysis result, the secondary analysis result and the similarity comparison result includes: determining that each piece of the audio and video data to be audited whose primary analysis result and/or secondary analysis result is non-violating passes the audit; determining that each piece of audio and video data whose primary analysis result and secondary analysis result are both violating, and whose similarity comparison result shows a feature similarity exceeding a preset threshold in the blacklist feature library, fails the audit; and determining that each piece of audio and video data whose primary analysis result and secondary analysis result are both violating, and whose similarity comparison result shows a feature similarity exceeding a preset t