CN-120853078-B - Deep fake content identification method based on deep learning


Abstract

The invention relates to the technical field of data processing, and in particular to a deep fake content identification method based on deep learning. The method comprises the following steps: collecting data; extracting data; determining first temporary frames; determining second temporary frames; determining temporary suspicious frames; determining abnormal video frames; adjusting a matching degree threshold; and identifying a fake video and issuing an early warning. The method collects the voice, mouth images and key point coordinates of a target in a news video; uses a deep learning model to extract audio feature vectors, the maximum mouth opening degree, pixel values and mouth shape feature vectors; determines abnormal frames by combining a preset matching degree threshold, coordinate differences, the maximum mouth opening degree and the pixel values; and, by dynamically adjusting the threshold, accurately identifies forgery and issues an early warning.

Inventors

XIAO HONGTAO
FAN CHUANQING
WANG LILI
CHEN JUN
LI MIAN

Assignees

天津市国瑞数码安全系统股份有限公司

Dates

Publication Date: 20260512
Application Date: 20250703
Priority Date: 20250327

Claims (8)

1. A deep fake content identification method based on deep learning, characterized by comprising the following steps: acquiring the voice and mouth images of a target and the coordinates of a plurality of key points in frames to be identified of a news video collected in a preset acquisition period; extracting feature vectors of the voice by using a preset deep learning model to form audio feature vectors, and extracting the maximum opening degree, pixel values and mouth shape feature vectors of the mouth images by using the preset deep learning model; determining a plurality of first temporary frames according to the audio feature vector, the mouth shape feature vector and a preset matching degree threshold; determining a plurality of second temporary frames according to the coordinates of each first temporary frame and of the frame to be identified next to the first temporary frame; determining a plurality of temporary suspicious frames according to the maximum opening degree and the pixel values of a preset number of consecutive second temporary frames; determining a plurality of abnormal video frames according to the audio feature vectors and the maximum opening degrees of the temporary suspicious frames and of all the frames to be identified; adjusting the preset matching degree threshold according to the timestamp of each abnormal video frame to form an adjusted matching degree threshold; and issuing an early warning based on the number of abnormal video frames determined with the adjusted matching degree threshold; wherein determining a plurality of first temporary frames according to the audio feature vector, the mouth shape feature vector and the preset matching degree threshold comprises: calculating the Euclidean distance between the audio feature vector and the mouth shape feature vector to form a matching degree; and when the matching degree is smaller than the preset matching degree threshold, judging the frame to be identified to be a first temporary frame, so as to determine the plurality of first temporary frames; and wherein determining a plurality of second temporary frames according to the coordinates of each first temporary frame and of the frame to be identified next to the first temporary frame comprises: calculating an inverse cosine function of the coordinates to form a coordinate included angle; calculating the difference between the coordinate included angle of each first temporary frame and the coordinate included angle of the frame to be identified to form an included angle change value; and judging the first temporary frame to be a second temporary frame according to the included angle change value and a preset included angle change threshold, so as to determine the plurality of second temporary frames.
2. The deep fake content identification method based on deep learning according to claim 1, wherein judging the first temporary frame to be a second temporary frame according to the included angle change value and the preset included angle change threshold, so as to determine the plurality of second temporary frames, comprises: when the included angle change value is larger than the preset included angle change threshold, judging the first temporary frame to be a second temporary frame, so as to determine the plurality of second temporary frames.
3. The deep fake content identification method based on deep learning according to claim 2, wherein determining a plurality of temporary suspicious frames according to the maximum opening degree and the pixel values of a preset number of consecutive second temporary frames comprises: calculating the standard deviation of the maximum opening degree to form a maximum opening degree fluctuation value; calculating the standard deviation of the pixel values to form a pixel value fluctuation value; and determining a plurality of temporary suspicious frames according to the maximum opening degree fluctuation value and the pixel value fluctuation value.
4. The deep fake content identification method based on deep learning according to claim 3, wherein determining a plurality of temporary suspicious frames according to the maximum opening degree fluctuation value and the pixel value fluctuation value comprises: drawing a change curve of all the maximum opening degree fluctuation values in the preset number of second temporary frames to form a maximum opening degree change curve; drawing a change curve of all the pixel value fluctuation values in the preset number of second temporary frames to form a pixel value change curve; calculating the cosine similarity of the maximum opening degree change curve and the pixel value change curve to form a change synchronization degree; and when the change synchronization degree is larger than a preset synchronization degree threshold, judging the second temporary frame to be a temporary suspicious frame, so as to determine the plurality of temporary suspicious frames.
5. The deep fake content identification method based on deep learning according to claim 4, wherein determining a plurality of abnormal video frames according to the audio feature vectors and the maximum opening degrees of the temporary suspicious frames and of all the frames to be identified comprises: identifying each frame to be identified that has the same audio feature vector as a temporary suspicious frame to form a plurality of comparison identification frames; calculating the relative deviation between the maximum opening degree of each temporary suspicious frame and the maximum opening degree of each comparison identification frame to form a first opening and closing deviation; when the first opening and closing deviation is smaller than a preset first deviation threshold, judging the temporary suspicious frame and the frame to be identified to be abnormal video frames; when the first opening and closing deviation is larger than or equal to the preset first deviation threshold, calculating the deviation between the maximum opening degree of each abnormal video frame and the maximum opening degree of the frame to be identified next to the abnormal video frame to form a second opening and closing deviation; when the second opening and closing deviation is smaller than a preset second deviation threshold, judging the temporary suspicious frame to be an abnormal video frame and the frames to be identified next to the abnormal video frame all to be abnormal video frames; and when the second opening and closing deviation is larger than or equal to the preset second deviation threshold, judging the temporary suspicious frame to be an abnormal video frame.
6. The deep fake content identification method based on deep learning according to claim 5, wherein adjusting the preset matching degree threshold according to the timestamp of each abnormal video frame to form an adjusted matching degree threshold comprises: collecting the timestamps of all the abnormal video frames to form a one-dimensional data set; clustering all the timestamps, and counting the number of timestamps contained in each cluster to form a cluster value; and adjusting the preset matching degree threshold according to the cluster value and a preset cluster value threshold to form the adjusted matching degree threshold.
7. The deep fake content identification method based on deep learning according to claim 6, wherein adjusting the preset matching degree threshold according to the cluster value and the preset cluster value threshold to form the adjusted matching degree threshold comprises: when the cluster value is larger than the preset cluster value threshold, calculating the relative deviation between the cluster value and the preset cluster value threshold to form a cluster value deviation; and when the cluster value deviation is larger than a preset cluster value deviation threshold, reducing the preset matching degree threshold according to the cluster value deviation and a preset adjustment coefficient to form the adjusted matching degree threshold.
8. The deep fake content identification method based on deep learning according to claim 7, wherein issuing an early warning based on the number of abnormal video frames determined with the adjusted matching degree threshold comprises: calculating the ratio of the number of abnormal video frames to the number of all frames to be identified to form an overall proportion; and when the overall proportion is larger than a preset proportion threshold, judging the news video to be a fake video and issuing an early warning.
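The threshold adjustment and warning logic of claims 6 to 8 can be illustrated with a minimal Python sketch. The claims do not name a clustering algorithm, say which cluster's count is compared with the cluster value threshold, or give the reduction formula, so everything specific here is an assumption made for illustration: the gap-based clustering and its `max_gap` parameter, the multiplicative reduction `threshold * (1 - coeff * deviation)`, and all function and parameter names.

```python
from typing import List


def cluster_timestamps(timestamps: List[float], max_gap: float) -> List[List[float]]:
    """Group 1-D timestamps into clusters.

    Assumption: a gap larger than max_gap between consecutive sorted
    timestamps starts a new cluster. The claims only say "clustering".
    """
    clusters: List[List[float]] = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def adjust_matching_threshold(
    match_threshold: float,
    cluster_value: int,
    cluster_value_threshold: int,
    deviation_threshold: float,
    coeff: float,
) -> float:
    """Reduce the matching degree threshold when a cluster is unusually large."""
    if cluster_value <= cluster_value_threshold:
        return match_threshold
    # Relative deviation between the cluster value and its threshold (claim 7).
    deviation = (cluster_value - cluster_value_threshold) / cluster_value_threshold
    if deviation <= deviation_threshold:
        return match_threshold
    # Assumed reduction rule; the claims only say the threshold is reduced
    # "according to the cluster value deviation and a preset adjustment coefficient".
    return match_threshold * max(0.0, 1.0 - coeff * deviation)


def is_fake_video(num_abnormal: int, num_frames: int, ratio_threshold: float) -> bool:
    """Claim 8: warn when the share of abnormal frames exceeds the threshold."""
    return num_frames > 0 and num_abnormal / num_frames > ratio_threshold


if __name__ == "__main__":
    stamps = [0.1, 0.2, 0.3, 5.0, 5.1, 9.0]
    clusters = cluster_timestamps(stamps, max_gap=0.5)
    largest = max(len(c) for c in clusters)  # assumption: use the largest cluster
    print(adjust_matching_threshold(0.8, largest, 2, 0.2, 0.5))
    print(is_fake_video(num_abnormal=3, num_frames=6, ratio_threshold=0.4))
```

A density-based or k-means clustering could replace the gap rule without changing the rest of the sketch, since only the per-cluster counts are used afterwards.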

Description

Deep fake content identification method based on deep learning

Technical Field

The invention relates to the technical field of data processing, and in particular to a deep fake content identification method based on deep learning.

Background

In today's information age, video content has become an important carrier for information dissemination, especially in the fields of news, social media and surveillance. However, the rapid development of video forgery and tampering techniques makes the spread of false information more covert and more difficult to identify. How to effectively detect and prevent video forgery has therefore become an urgent need for maintaining the authenticity of information and social stability.
Patent document with publication number CN113158818A discloses a method, device and equipment for identifying fake video. The method comprises: obtaining a video to be identified; obtaining an image set corresponding to the video to be identified, wherein the image set comprises a plurality of images; obtaining texture features and optical flow similarity of the image set; and identifying whether the video to be identified is a fake video according to the texture features and the optical flow similarity.
This fake video identification method therefore has the following problems: texture feature extraction may be inaccurate under illumination changes and complex backgrounds; optical flow similarity calculation places extremely high precision requirements on motion estimation, so small errors easily lead to misjudgment; the ability to detect local tampering in a video frame is weak, making subtle forgery traces difficult to capture; data imbalance is prominent, since the scarcity of abnormal samples makes model training difficult and limits generalization; anomaly detection relies on a single modality, evaluating only the texture features and optical flow similarity of images, so the detection results are not accurate enough; and the high complexity of optical flow calculation makes the method inefficient for high-resolution video or real-time detection, so that it is difficult to meet the quick-response requirements of practical applications.

Disclosure of Invention

To this end, the invention provides a deep fake content identification method based on deep learning which, through multi-modal feature analysis and dynamic threshold adjustment, overcomes the problem of low forgery identification accuracy caused by over-reliance on single-modality image data.
In order to achieve the above object, the present invention provides a deep fake content identification method based on deep learning, comprising: acquiring the voice and mouth images of a target and the coordinates of a plurality of key points in frames to be identified of a news video collected in a preset acquisition period; extracting feature vectors of the voice by using a preset deep learning model to form audio feature vectors, and extracting the maximum opening degree, pixel values and mouth shape feature vectors of the mouth images by using the preset deep learning model; determining a plurality of first temporary frames according to the audio feature vector, the mouth shape feature vector and a preset matching degree threshold; determining a plurality of second temporary frames according to the coordinates of each first temporary frame and of the frame to be identified next to the first temporary frame; determining a plurality of temporary suspicious frames according to the maximum opening degree and the pixel values of a preset number of consecutive second temporary frames; determining a plurality of abnormal video frames according to the audio feature vectors and the maximum opening degrees of the temporary suspicious frames and of all the frames to be identified; adjusting the preset matching degree threshold according to the timestamp of each abnormal video frame to form an adjusted matching degree threshold; and issuing an early warning based on the number of abnormal video frames determined with the adjusted matching degree threshold.
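As a rough illustration of the first two screening steps listed above, the following Python sketch computes the matching degree and the included angle change in the way claims 1 and 2 describe them. Several points are assumptions rather than statements from the patent: the audio and mouth shape feature vectors are taken to have the same length, the "coordinate included angle" is interpreted as the angle at one key point between the rays to two other key points, and all function and parameter names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def matching_degree(audio_vec: Sequence[float], mouth_vec: Sequence[float]) -> float:
    """Euclidean distance between the audio and mouth shape feature vectors."""
    if len(audio_vec) != len(mouth_vec):
        raise ValueError("feature vectors must have the same length (assumption)")
    return math.dist(audio_vec, mouth_vec)


def is_first_temporary_frame(
    audio_vec: Sequence[float], mouth_vec: Sequence[float], match_threshold: float
) -> bool:
    """Follows the claim wording: matching degree below the threshold flags the frame."""
    return matching_degree(audio_vec, mouth_vec) < match_threshold


def included_angle(vertex: Point, a: Point, b: Point) -> float:
    """Angle in radians at `vertex` between the rays to `a` and `b`, via arccos.

    Assumed interpretation of "inverse cosine function of the coordinates".
    """
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("key points must not coincide with the vertex")
    cos_value = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cos_value)))


def is_second_temporary_frame(
    angle_current: float, angle_next: float, angle_change_threshold: float
) -> bool:
    """Claim 2: flag the frame when the angle change exceeds the threshold."""
    return abs(angle_next - angle_current) > angle_change_threshold


if __name__ == "__main__":
    print(matching_degree([0.0, 0.0], [3.0, 4.0]))
    print(included_angle((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
```

Taking the absolute value of the angle difference is a further assumption; the claims speak only of a "difference value".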
Further, determining a plurality of first temporary frames according to the audio feature vector, the mouth shape feature vector and a preset matching degree threshold value comprises: calculating Euclidean distance between the audio feature vector and the mouth shape feature vector to form matching degree; and when the matching degree is smaller than the preset matching degree threshold value, judging the frame to be identified as a first temporary frame so as to determine a plurality of first temporary frames. Further, determining a plurality of second temporary frames according to the coordinates of each first temporary frame and a frame to be identified next to the first temporary frame, including: calculating an inverse cosine function of the coordinates to form a coordinate included angle; calculating the difference value of t