CN-121997098-A - Cross-modal consistency verification-based false information intelligent detection and tracing method
Abstract
The invention discloses a false information intelligent detection and tracing method based on cross-modal consistency verification, which specifically comprises the following steps: S1, multi-modal content feature extraction; S2, cross-modal consistency verification; S3, deepfake detection; S4, propagation anomaly detection; S5, knowledge-enhanced fact checking; S6, information tracing and mutation tracking; S7, comprehensive authenticity assessment; and S8, a model training strategy in which a multi-task learning framework jointly optimizes the parameters of each module. By fusing multi-source information such as video content, text semantics, the propagation network and an external knowledge base, the invention constructs a multi-dimensional consistency verification system, realizing automatic identification, authenticity assessment and information tracing of various kinds of false information.
Inventors
- LI YIPENG
- ZHAN LINGLI
- MA CHAO
- ZHOU HONGWEI
Assignees
- 江苏润和软件股份有限公司
Dates
- Publication Date: 20260508
- Application Date: 20260126
Claims (10)
- 1. A false information intelligent detection and tracing method based on cross-modal consistency verification, characterized by comprising the following steps: S1, designing dedicated feature extractors for the three modalities of video, image and text, and extracting content-level semantic and structural features, thereby realizing multi-modal content feature extraction; S2, based on the extracted multi-modal content features, designing a cross-modal semantic alignment and consistency measurement mechanism for detecting the consistency of cross-modal combinations such as video content with its text description and an image with its title; S3, based on the extracted video content features, performing deepfake detection: detecting technical-level forgery traces, analyzing semantic-level logical consistency, and constructing a multi-dimensional forgery evidence chain that complements the cross-modal consistency verification so as to jointly identify various kinds of content-level false information; S4, after content-level detection is completed, further analyzing the structural features and temporal patterns of the information propagation network to identify abnormal propagation behaviors; S5, based on the extracted text semantic features and the original text content, performing knowledge-enhanced fact checking: aligning the extracted entities, events and relations with an external knowledge base, verifying factual statements, and forming a multi-dimensional verification system together with content detection and propagation analysis; S6, based on the extracted video features, performing information tracing and mutation tracking: tracing information sources and mutation processes through content fingerprinting, similarity matching and propagation-tree analysis, thereby providing tracing evidence for false-information judgment; S7, integrating the above outputs to perform a comprehensive authenticity assessment: normalizing the physical consistency loss and the propagation anomaly index, and obtaining a final score through weighted fusion.
- 2. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S1 specifically comprises: S11, video content feature extraction: for the input video, a temporal convolutional network combined with a vision Transformer extracts multi-scale spatio-temporal features; the input video is denoted V = {v_1, v_2, ..., v_T}, where v_t represents the t-th frame image, T is the total number of video frames, and H and W are the image height and width, respectively; first, a visual encoder extracts the spatial features of each frame; the visual encoder is implemented as a deep convolutional neural network that extracts deep semantic features of an image through multi-layer convolution and pooling operations: f_t = E_v(v_t), where f_t is the spatial feature of the t-th frame and its dimension is the visual feature dimension d_v; next, the temporal dynamics of the video are captured with a temporal convolutional network, which applies 3D convolution kernels along the time dimension to fuse the spatial features of adjacent frames and capture the temporal evolution of actions and scenes: h_t = TCN(f_{t-k}, ..., f_t, ..., f_{t+k}), where k is the window size of the temporal convolution and h_t is the feature vector fused with temporal information; finally, global video features are aggregated through an attention mechanism; the attention mechanism adopts soft attention, adaptively aggregating the features of all frames by learning an importance weight for each frame: a_t = softmax(w^T tanh(W_a h_t)), F_v = Σ_t a_t h_t, where W_a and w are learnable weight parameters, a_t is the attention score of the t-th frame, and F_v is the aggregated video content feature vector; S12, text semantic feature extraction: for the input text, a pre-trained language model extracts deep semantic features; let the word sequence obtained after tokenization be {w_1, w_2, ..., w_N}, where N is the number of words; the text encoder is implemented as a pre-trained language model of the Transformer architecture, which generates for each word an embedding vector containing context information through self-attention and positional encoding: {e_1, ..., e_N} = E_t(w_1, ..., w_N), where e_i is the context embedding vector of the i-th word and its dimension is the text feature dimension d_t; an average-pooling operation averages the embedding vectors of all words to obtain the global semantic representation F_t = (1/N) Σ_i e_i of the whole text; S13, image content feature extraction: for still images, spatial features are extracted with the same visual encoder as for video: F_i = E_v(I), where I is the input image and F_i is the image content feature vector.
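The soft-attention aggregation of step S11 can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `attention_pool`, the shapes, and the random parameters are assumptions, not part of the claim.

```python
import numpy as np

def attention_pool(frame_feats: np.ndarray, W_a: np.ndarray, w: np.ndarray):
    """Aggregate per-frame features h_t into one video vector via soft attention.

    frame_feats: (T, d) matrix of fused per-frame features
    W_a:         (d, d) learnable weight matrix (randomly drawn here)
    w:           (d,)   learnable scoring vector
    """
    scores = np.tanh(frame_feats @ W_a.T) @ w        # (T,) raw attention scores
    scores = scores - scores.max()                   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # softmax over the T frames
    pooled = alpha @ frame_feats                     # weighted sum: F_v = sum_t a_t h_t
    return pooled, alpha

# Toy usage: T=5 frames with feature dimension d=8
rng = np.random.default_rng(0)
g, alpha = attention_pool(rng.normal(size=(5, 8)),
                          rng.normal(size=(8, 8)),
                          rng.normal(size=(8,)))
```

The softmax guarantees that the per-frame weights are non-negative and sum to one, so the pooled vector stays in the convex hull of the frame features.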
- 3. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S2 specifically comprises: S21, cross-modal semantic alignment: for a video-text pair, the video feature F_v and the text feature F_t are projected into a common semantic space: z_v = W_v F_v, z_t = W_t F_t, where W_v and W_t are learnable projection matrices, the dimension of the unified semantic space is d, and z_v and z_t are the projected feature vectors; S22, consistency measurement: the similarity of the cross-modal features in the semantic space is computed as the consistency measure: C = (z_v · z_t) / (||z_v||_2 ||z_t||_2), where ||·||_2 denotes the L2 norm of a vector and C is the cross-modal consistency score; a larger value indicates that the video content is more consistent with the textual description; when C falls below a preset threshold, the data is judged to exhibit cross-modal inconsistency and may be false information.
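The projection-and-cosine check of steps S21 and S22 amounts to a cosine similarity in the shared space. A minimal sketch, assuming linear projections and identity matrices for the toy usage (the name `consistency_score` is illustrative):

```python
import numpy as np

def consistency_score(F_v, F_t, W_v, W_t):
    """Project video/text features into a common space and return cosine similarity C."""
    z_v, z_t = W_v @ F_v, W_t @ F_t                  # projections into the shared space
    return float(z_v @ z_t / (np.linalg.norm(z_v) * np.linalg.norm(z_t)))

# With identity projections: identical features are perfectly consistent (C = 1),
# orthogonal features are maximally inconsistent (C = 0)
I = np.eye(3)
c_same = consistency_score(np.array([1., 2., 3.]), np.array([1., 2., 3.]), I, I)
c_orth = consistency_score(np.array([1., 0., 0.]), np.array([0., 1., 0.]), I, I)
```

In practice W_v and W_t would be learned jointly (e.g. with a contrastive objective), and the decision threshold tuned on validation data.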
- 4. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S3 specifically comprises: S31, technical-level forgery detection: frequency-domain analysis and a deep neural network detect the technical traces of deep forgery; for a video frame v_t, frequency-domain features are extracted: F_freq = FFT(v_t), where FFT(·) denotes the fast Fourier transform and F_freq is the frequency-domain representation; frequency-domain anomalies are detected by a trained classifier, implemented as a deep convolutional neural network: the concatenated frequency-domain and spatial features are input to the classifier, and the forgery probability is output through several convolution layers and fully connected layers: p_forge = D(F_freq, f_t), where D is the deepfake classifier that fuses the frequency-domain feature F_freq with the spatial feature f_t, processes them with a multi-layer neural network, and finally outputs, via a Sigmoid activation, the technical-level forgery probability p_forge; S32, semantic-level logic verification: the physical consistency and temporal consistency of the video content are analyzed, with a corresponding computation for each physical-consistency check: illumination consistency is measured by analyzing whether the illumination intensity and direction are consistent across different regions of a frame; shadow consistency by detecting whether the direction, length and shape of shadows obey physical laws; reflection consistency by analyzing whether the reflection angle and intensity of reflective surfaces obey optical principles; motion consistency by analyzing the smoothness and continuity of motion between adjacent frames; and lip synchronization by detecting the temporal alignment of the speech signal with lip movements; the physical consistency loss is defined as: L_phys = Σ_{k=1}^{K} w_k d_k(V), where K is the number of physical-consistency check items, w_k is the weight coefficient of the k-th check satisfying Σ_k w_k = 1, d_k(·) is the inconsistency measure function of the k-th item, and V is the input video; when L_phys exceeds a preset threshold, physical inconsistency is determined to exist and the content may be forged.
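The frequency-domain feature of S31 and the weighted physical-consistency loss of S32 can be sketched as follows. This is a minimal NumPy illustration; the log-magnitude/`fftshift` presentation and the uniform weights are assumptions, and real inconsistency measures d_k would come from the checks listed above.

```python
import numpy as np

def frequency_features(frame: np.ndarray) -> np.ndarray:
    """Log-magnitude 2D FFT of a grayscale frame, centered with fftshift.

    GAN/diffusion forgeries often leave periodic artifacts visible in this domain."""
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(frame))))

def physical_consistency_loss(inconsistencies, weights):
    """L_phys = sum_k w_k * d_k, with the weights normalized so they sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ np.asarray(inconsistencies, dtype=float))

frame = np.random.default_rng(1).random((16, 16))
F = frequency_features(frame)
# Five checks (illumination, shadow, reflection, motion, lip-sync) with equal weights
L = physical_consistency_loss([0.1, 0.4, 0.0, 0.2, 0.3], [1, 1, 1, 1, 1])
```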
- 5. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S4 specifically comprises: S41, propagation network construction: the information propagation process is modeled as a directed graph G = (V, E), where V is the node set, E is the edge set, and each node v_i carries a feature vector x_i; S42, abnormal propagation pattern identification: a graph neural network analyzes the topology of the propagation network to identify abnormal patterns; the graph neural network is implemented as a graph convolutional network (GCN) or a graph attention network (GAT), which aggregates the features of neighbor nodes through message passing, learns node representation vectors, and then uses these representations for anomaly detection; the bot-account ratio r_bot is obtained by identifying robot accounts with a classifier over user features and computing the ratio of robot accounts to the total number of accounts; the temporal concentration c_time is obtained by counting the forwarding-time distribution and computing the proportion of forwards occurring within a short time window relative to the total forwards; the topology anomaly degree a_topo is obtained by analyzing the topological structure of the propagation network for abnormal patterns such as star structures and abnormal clustering, with an anomaly score computed by an anomaly-detection algorithm; the propagation anomaly index is defined as: A = α r_bot + β c_time + γ a_topo, where r_bot is the bot-account ratio, c_time is the temporal concentration, a_topo is the topology anomaly degree, and α, β, γ are weight coefficients satisfying α + β + γ = 1; when A exceeds a threshold, abnormal propagation is determined to exist, possibly indicating human manipulation.
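The propagation anomaly index of S42 is a convex combination of three indicators in [0, 1]. A sketch, with the weight values and the threshold 0.6 chosen purely for illustration:

```python
def propagation_anomaly_index(r_bot, c_time, a_topo,
                              alpha=0.4, beta=0.3, gamma=0.3, tau=0.6):
    """A = alpha*r_bot + beta*c_time + gamma*a_topo; flag if A exceeds threshold tau.

    All three inputs are assumed to lie in [0, 1], so A also lies in [0, 1]."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9    # weights must sum to 1
    A = alpha * r_bot + beta * c_time + gamma * a_topo
    return A, A > tau

# A cascade dominated by bots, bursty in time, with a star-like topology
A, flagged = propagation_anomaly_index(r_bot=0.8, c_time=0.9, a_topo=0.7)
```

Because the weights sum to one, the index stays interpretable on the same [0, 1] scale as its components, which simplifies the later normalization in step S7.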
- 6. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S5 specifically comprises: S51, entity and relation extraction: entity and relation sets are extracted from the text and video; named entity recognition (NER) with a pre-trained sequence labeling model identifies person names, place names, organization names, times and event entities; relation extraction with a pre-trained relation-classification model identifies relations between entities; visual entity recognition with object-detection and image-classification models identifies objects, scenes and persons in the video; video relation extraction identifies relations between visual entities by analyzing the spatial and temporal relationships between entities across the frame sequence; this finally yields an entity set E = {e_1, ..., e_m} and a relation set R = {r_1, ..., r_n}, where m is the number of extracted entities and n is the number of extracted relations; S52, knowledge-base alignment and verification: the extracted entities and relations are aligned with a knowledge graph, and the fact credibility is computed: C_fact = (|E ∩ E_kg| + |R ∩ R_kg|) / (|E| + |R|), where E_kg and R_kg are the entity and relation sets of the knowledge graph, |·| denotes set size, and C_fact is the fact credibility score; when C_fact falls below a threshold, a factual error is determined to exist.
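The overlap-based credibility score of S52 reduces to set intersections. A minimal sketch with a toy knowledge graph (the entity and relation values are made up for illustration):

```python
def fact_credibility(entities, relations, kg_entities, kg_relations):
    """C_fact = (|E ∩ E_kg| + |R ∩ R_kg|) / (|E| + |R|).

    Returns the fraction of extracted entities and relations that can be
    confirmed against the knowledge graph; 0.0 if nothing was extracted."""
    E, R = set(entities), set(relations)
    matched = len(E & set(kg_entities)) + len(R & set(kg_relations))
    total = len(E) + len(R)
    return matched / total if total else 0.0

# "Atlantis" is absent from the KG, so 3 of 4 extracted items are confirmed
c = fact_credibility(
    entities=["Paris", "France", "Atlantis"],
    relations=[("Paris", "capital_of", "France")],
    kg_entities={"Paris", "France"},
    kg_relations={("Paris", "capital_of", "France")},
)
```

A production system would align entities by ID rather than surface string (entity linking) before computing the overlap, since "Paris" alone is ambiguous.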
- 7. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S6 specifically comprises: S61, content fingerprint generation: robust content fingerprints are generated for videos and texts; for a video, a key-frame extraction algorithm selects representative key frames, their spatial features are extracted, and a visual fingerprint is generated with perceptual hashing or deep hashing; perceptual hashing maps image features to a fixed-length binary code, yielding fingerprints robust to slight changes, while deep hashing trains a deep neural network to learn a hash function that maps high-dimensional features to low-dimensional hash codes; the video fingerprint fuses the features of several key frames and is generated by the hash function: h_v = H(f_{k_1}, ..., f_{k_M}), where k_j is a key-frame index, H(·) is the hash function, and h_v is the video fingerprint vector; S62, similarity matching and tracing: similar content is searched for in a historical database; an approximate nearest neighbor search algorithm rapidly retrieves similar fingerprints, after which the precise similarity is computed; the similarity computation uses cosine similarity, measuring the angle between two fingerprint vectors in the vector space: sim(h_q, h_d) = (h_q · h_d) / (||h_q||_2 ||h_d||_2), where h_q is the fingerprint vector of the query content, h_d is the fingerprint vector of historical content in the database, ||·||_2 denotes the L2 norm, and sim is the similarity score; when the similarity exceeds a preset threshold, a potential source has been found; the earliest release time is then determined and the information source located through propagation-tree analysis: a directed information-propagation graph is constructed, all possible source nodes are reached by backtracking from the current node, the release times of the candidate source nodes are compared, and the node with the earliest release time is selected as the information source; the propagation tree is traversed with depth-first search (DFS) or breadth-first search (BFS), and the earliest releasing node is determined in combination with timestamp information.
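The matching and backtracking of S62 can be sketched as two small functions: cosine similarity between fingerprint vectors, and a DFS over reversed propagation edges that picks the reachable node with the earliest timestamp. The edge list, timestamps and node names below are invented for illustration.

```python
import numpy as np

def fingerprint_similarity(h_q, h_d):
    """Cosine similarity between a query fingerprint and a stored fingerprint."""
    return float(h_q @ h_d / (np.linalg.norm(h_q) * np.linalg.norm(h_d)))

def earliest_source(edges, timestamps, current):
    """Walk the propagation graph backwards from `current` via DFS and return
    the reachable node with the earliest release timestamp."""
    parents = {}
    for u, v in edges:                    # edge u -> v means v forwarded from u
        parents.setdefault(v, []).append(u)
    best, seen, stack = current, {current}, [current]
    while stack:
        node = stack.pop()
        if timestamps[node] < timestamps[best]:
            best = node
        for p in parents.get(node, []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return best

v = np.array([1.0, 2.0, 3.0])
sim = fingerprint_similarity(v, v)        # identical fingerprints -> similarity 1

edges = [("A", "B"), ("A", "C"), ("C", "D")]
ts = {"A": 100, "B": 110, "C": 105, "D": 120}
src = earliest_source(edges, ts, "D")     # backtracks D -> C -> A
```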
- 8. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 1, wherein step S7 specifically comprises: the comprehensive authenticity score is computed by weighted fusion of the module outputs: S = w_1 C + w_2 (1 - p_forge) + w_3 (1 - L_phys / L_max) + w_4 (1 - A / A_max) + w_5 C_fact, where w_1, ..., w_5 are the fusion weight coefficients of the respective outputs, satisfying Σ_i w_i = 1, and L_max and A_max are the maximum normalization factors of the physical consistency loss and the propagation anomaly index; C is the cross-modal consistency score, p_forge is the technical-level forgery probability, L_phys is the physical consistency loss, A is the propagation anomaly index, and C_fact is the fact credibility score; S is the comprehensive authenticity score, with larger values indicating more trustworthy information; when S falls below a preset threshold, the information is judged to be false.
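The S7 fusion combines five normalized signals into one score. A sketch under stated assumptions: the forgery probability, physical loss and anomaly index are inverted so that larger always means more trustworthy, and the weight values, normalization factors and clamping are illustrative choices, not prescribed by the claim.

```python
def authenticity_score(C, p_forge, L_phys, A, C_fact,
                       weights=(0.25, 0.2, 0.2, 0.15, 0.2),
                       L_max=1.0, A_max=1.0):
    """Weighted fusion of the five module outputs into one score S in [0, 1]."""
    w = [x / sum(weights) for x in weights]          # normalize weights to sum to 1
    terms = [
        C,                                           # cross-modal consistency
        1.0 - p_forge,                               # technical-level authenticity
        1.0 - min(L_phys / L_max, 1.0),              # normalized physical consistency
        1.0 - min(A / A_max, 1.0),                   # normalized propagation normality
        C_fact,                                      # fact credibility
    ]
    return sum(wi * t for wi, t in zip(w, terms))

s_good = authenticity_score(C=0.9, p_forge=0.1, L_phys=0.1, A=0.1, C_fact=0.9)
s_bad = authenticity_score(C=0.2, p_forge=0.9, L_phys=0.8, A=0.9, C_fact=0.1)
```

Clamping each normalized term to [0, 1] keeps a single extreme indicator from pushing the fused score outside its interpretable range.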
- 9. The false information intelligent detection and tracing method based on cross-modal consistency verification according to claim 1, characterized by further comprising step S8, a model training strategy in which a multi-task learning framework jointly optimizes the parameters of each module.
- 10. The false information intelligent detection and tracing method based on cross-modal consistency verification as claimed in claim 9, wherein step S8 specifically comprises: the total loss function is defined as: L_total = λ_1 L_cons + λ_2 L_forge + λ_3 L_prop + λ_4 L_fact + λ_5 L_trace, where L_cons is the consistency verification loss, L_forge is the forgery detection loss, L_prop is the propagation anomaly detection loss, L_fact is the fact checking loss, L_trace is the tracing loss, and λ_1, ..., λ_5 are the weight coefficients of the respective losses.
Description
Cross-modal consistency verification-based false information intelligent detection and tracing method
Technical Field
The invention relates to the technical fields of artificial intelligence, multimedia information processing and information security, and in particular to a cross-modal consistency verification-based false information intelligent detection and tracing method, applicable to scenarios such as content auditing on social media platforms, fact checking for news media, public opinion monitoring and early warning, and judicial evidence collection.
Background
With the rapid growth of the internet and social media, the spread of false information has become a serious social problem. False information includes not only traditional textual rumors but also many other forms, such as deepfake videos, spliced or doctored images, and content quoted out of context. False information spreads quickly and has a wide range of influence, causing serious harm to individuals, enterprises and society. Current false information detection technology mainly faces the following problems:
1. Existing methods mostly target a single modality (e.g., detecting only text or only images) and cannot effectively handle multi-modal false information such as video-text combinations and image-title combinations. For example, cross-modally inconsistent false information, such as a real video with a false title or a normal image with misleading text, is difficult to identify with single-modality methods.
2. Deepfake detection capability is insufficient. With the development of generative adversarial networks (GANs) and diffusion models, the quality of deepfake videos and images keeps improving, and conventional detection methods struggle to identify high-quality forged content. Existing methods rely mainly on technical-level artifact detection and lack semantic-level logical consistency verification.
3. False information often spreads through abnormal propagation patterns, such as batch forwarding by robot accounts and coordinated rumor fabrication and spreading by paid posters. Existing methods focus mainly on the content itself, analyze the structural features and temporal patterns of the propagation network insufficiently, and cannot identify artificially manipulated propagation behaviors.
4. Fact checking is inefficient. Traditional fact checking relies on manual verification, which is slow and costly and cannot meet the real-time checking demands of massive information volumes. Existing automated fact-checking methods are mainly based on keyword matching and simple rules and cannot understand complex semantics and context.
5. Information tracing is difficult. False information frequently mutates during propagation through editing, re-dubbing, title changes and the like, making the original source hard to trace. Existing tracing methods are mainly based on simple text similarity or image hashing and cannot handle complex content variation and cross-platform propagation.
6. Multi-source information fusion is insufficient. False information detection needs to comprehensively utilize content features, propagation features, user features, external knowledge bases and other information sources, but existing methods lack an effective multi-source fusion mechanism and collaborative verification among the information sources.
Therefore, an intelligent method is needed that fuses multi-modal content, the propagation network and external knowledge to realize automatic detection, authenticity assessment and information tracing of false information through cross-modal consistency verification and multi-dimensional analysis. The invention solves these technical problems by constructing a cross-modal consistency verification model, a propagation anomaly detection model and a knowledge-enhanced reasoning model, providing an efficient and reliable technical solution for false information governance.
Disclosure of Invention
The invention provides a cross-modal consistency verification-based false information intelligent detection and tracing method, which constructs a multi-dimensional consistency verification system by fusing multi-source information such as video content, text semantics, the propagation network and an external knowledge base, realizing automatic identification, authenticity assessment and information tracing of various kinds of false information. The specific scheme is as follows: A false information intelligent detection and tracing method based on cross-modal consistency verification specifically comprises the following steps: S1, extracting multi-mode content features, namely