CN-121980010-A - False information detection method based on LLM dynamic characterization and semantic syntax double-channel fusion


Abstract

The invention relates to the technical field of natural language processing and discloses a false information detection method based on LLM dynamic characterization and semantic-syntactic dual-channel fusion. It mainly addresses inaccurate semantic recognition of false information and poor judgment precision caused by missing logic modeling. In the technical scheme, an LLM is used to extract dynamic feature vectors; a dual-channel heterogeneous graph containing sequence edges and syntactic dependency edges is constructed with the help of dependency parsing; a graph attention network evolves the features along syntactic paths; logic conflict saliency is calculated to identify logic distortion points; global topological features are aggregated through hierarchical multi-scale pooling; and a reinforcement learning agent traces back the logic path, so that a judgment result and an evidence chain are output. The invention improves detection accuracy and interpretability and is suitable for fields such as public opinion monitoring and news fact checking.

Inventors

Jiao Tianshuo
HU QIAO

Assignees

Hunan University

Dates

Publication Date: 20260505
Application Date: 20260123

Claims (8)

1. A false information detection method based on LLM dynamic characterization and semantic-syntactic dual-channel fusion, characterized by comprising the following steps: Step 1, context-aware semantic coding, namely extracting hidden layer states of a text sequence through a pre-trained large language model (LLM) to generate context-sensitive dynamic initial feature vectors; Step 2, constructing a semantic-syntactic dual-channel graph, namely extracting the word-order relations and deep syntactic dependency relations of the text and constructing a dual-channel heterogeneous graph containing sequence edges and syntactic edges; Step 3, dynamic feature mapping and calibration, namely mapping the initial feature vectors to graph nodes and performing redundancy filtering and dimension calibration on the initial node values by using an attention mechanism; Step 4, syntax-guided feature evolution, namely carrying out multi-layer feature propagation along syntactic dependency paths by using a graph neural network (GNN) operator to capture long-range logic constraint features; Step 5, logic conflict saliency measurement, namely quantifying the degree of mismatch between vocabulary use and grammatical logic in the text by calculating the difference in semantic flow among nodes; Step 6, hierarchical feature pooling, namely hierarchically aggregating the graph structure by combining node centrality and attention weights to generate a global spatio-temporal representation of the full text; and Step 7, violation risk prediction and explanation, namely inputting the global representation into a discriminator, outputting a false information probability score, and identifying false-logic vulnerability points based on feature contribution.
2. The method according to claim 1, wherein step 1 specifically comprises: Step 1.1, performing full-attention calculation on the input text by using an encoder of the Transformer architecture to capture the deep semantics of each word in its specific context; and Step 1.2, extracting the last hidden layer vector as a dynamic embedding, so as to address the semantic drift caused by polysemous words in false information.
3. The method according to claim 2, wherein step 2 specifically comprises: Step 2.1, identifying the dependency relations in the text by using a syntactic parsing tool and constructing a syntax tree reflecting the logical skeleton; and Step 2.2, building a heterogeneous graph in the DGL framework with words as nodes, defining sequence edges to preserve the word order and dependency edges to preserve the logical structure.
4. The method according to claim 3, wherein step 3 specifically comprises: Step 3.1, designing a variance-driven dynamic screening operator and calculating the mean and variance of each dimension of the feature vectors; and Step 3.2, weighting the original LLM vectors with a Sigmoid gating mechanism to suppress background noise irrelevant to false features.
5. The method according to claim 4, wherein step 4 specifically comprises: Step 4.1, performing iterative aggregation on the dual-channel graph by using a graph attention network (GAT) and dynamically assigning propagation weights to different syntactic relations; and Step 4.2, preserving the original semantics through residual connections, so that "semantics-structure" deep fusion is achieved over multiple iterations.
6. The method according to claim 5, wherein step 5 specifically comprises: Step 5.1, constructing a logic consistency scoring matrix and comparing the feature differences of each node under its sequence neighborhood and its syntactic neighborhood; and Step 5.2, identifying logic distortion points, namely node combinations on the syntactic skeleton whose semantics are strongly inconsistent.
7. The method according to claim 6, wherein step 6 specifically comprises: Step 6.1, syntactic centrality weight calculation, namely calculating the degree centrality and betweenness centrality of each node in the logic chain according to the topology of the syntactic dependency graph to generate an initial structural importance score; Step 6.2, hierarchical attention aggregation, namely following the hierarchy of the syntax tree, aggregating leaf node features toward the core predicate node by using an adaptive graph enhancement operator to construct multi-scale subgraph representations; Step 6.3, global relation matrix mapping, namely generating a cross-channel weight matrix by using a global attention mechanism and applying dynamic weight calibration to key logic nodes (such as the subject and the core verb); and Step 6.4, spatial-semantic joint pooling, namely compressing the full-graph features by combining max pooling with attention aggregation, eliminating redundant noise, and extracting a high-dimensional global feature vector reflecting the logical consistency of the full graph.
8. The method according to claim 7, wherein step 7 specifically comprises: Step 7.1, generating a final sentence embedding vector with a global weighted aggregation algorithm that uses a cross-attention mechanism; and Step 7.2, backtracking the key logic path with a reinforcement learning agent to output the false-information judgment result and the corresponding evidence path.
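As an illustration only, the variance-driven Sigmoid gating of claim 4 and the neighborhood comparison of claim 6 might be sketched as follows. The claims give no formulas, so everything concrete here is an assumption: the standardized-variance gate, the use of cosine distance between mean neighbor features, and the function names `variance_gate` and `conflict_saliency` are all hypothetical and are not taken from the patent.

```python
import math


def variance_gate(features, eps=1e-8):
    """Claim 4 sketch: scale each feature dimension by a sigmoid gate
    driven by that dimension's variance across nodes (assumed formula)."""
    n, dim = len(features), len(features[0])
    means = [sum(v[d] for v in features) / n for d in range(dim)]
    variances = [sum((v[d] - means[d]) ** 2 for v in features) / n
                 for d in range(dim)]
    # Standardise the variances so an average dimension gets a gate of 0.5.
    var_mean = sum(variances) / dim
    var_std = math.sqrt(sum((x - var_mean) ** 2 for x in variances) / dim)
    gate = [1.0 / (1.0 + math.exp(-(x - var_mean) / (var_std + eps)))
            for x in variances]
    gated = [[x * g for x, g in zip(v, gate)] for v in features]
    return gated, gate


def _neighbours(n, edges):
    """Undirected neighbour sets for a list of (src, dst) pairs."""
    nb = [set() for _ in range(n)]
    for s, d in edges:
        nb[s].add(d)
        nb[d].add(s)
    return nb


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def conflict_saliency(features, seq_edges, dep_edges):
    """Claim 6 sketch: per-node score 1 - cos(mean sequence-neighbour
    feature, mean syntactic-neighbour feature); 0.0 if a neighbourhood
    is empty. Higher scores mark candidate logic distortion points."""
    n = len(features)
    seq_nb, dep_nb = _neighbours(n, seq_edges), _neighbours(n, dep_edges)
    scores = []
    for i in range(n):
        if not seq_nb[i] or not dep_nb[i]:
            scores.append(0.0)
            continue
        s = _mean([features[j] for j in sorted(seq_nb[i])])
        d = _mean([features[j] for j in sorted(dep_nb[i])])
        scores.append(1.0 - _cosine(s, d))
    return scores
```

In this sketch a node scores near zero when its word-order context and its syntactic context carry similar features, and scores higher when the two channels disagree; a threshold on the score would be one hypothetical way to pick the distortion points of Step 5.2.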

Description

False information detection method based on LLM dynamic characterization and semantic syntax double-channel fusion

Technical Field

The embodiments of the present disclosure relate to the technical field of natural language processing, and in particular to a false information detection method based on LLM dynamic characterization and semantic-syntactic dual-channel fusion.

Background

With the rapid iteration of Internet technology and the wide adoption of social media platforms, the spread of false information (fake news) is characterized by explosive growth, complex propagation paths, and strong concealment. Carefully disguised false statements, rumors, and misleading news not only seriously infringe the legitimate rights and interests of individuals, but also pose a serious threat to social trust, the order of financial markets, and even national public security. Building an automatic detection system that identifies false information accurately and efficiently has therefore become a key technical problem in artificial intelligence and network security governance.

However, existing deep-learning-based false information detection methods still face many challenges when dealing with sophisticated logical disguise. First, conventional techniques rely heavily on static word vectors or simple sequence encoding, and their feature extraction lacks awareness of context-dependent semantics. In false information, rumor-mongers often distort facts through polysemy or subtle semantic drift, and static representations struggle to capture how a word's meaning is distorted in a specific fabricated context. Second, existing models often ignore the deep structural logic of language, namely syntactic dependencies. False information is often highly deceptive at the level of local wording, but its causal chain tends to break along long-span syntactic skeletons.
The reference document notes that, in complex prediction tasks, it is difficult to capture the causal links between key elements through text-sequence encoding alone. Although traditional recurrent neural networks and Transformer models can process word order, they have difficulty explicitly modeling core logic constraints such as subject-predicate relations, so their recognition rate is low when deception hides the truth behind long, complex sentences. In addition, the prior art often lacks focus in the feature aggregation stage and cannot effectively distinguish core logic nodes from redundant background noise, so key evidence of falsity is easily diluted by the mass of surrounding information.

To address these technical bottlenecks, the invention provides a detection scheme based on LLM dynamic characterization and semantic-syntactic dual-channel fusion. The invention uses the deep encoding capability of a large language model to generate a context-sensitive dynamic feature vector for each word, so that the model can identify subtle shifts of word sense in false contexts. On this basis, the invention constructs a "semantic-syntactic" dual-channel topological graph structure that couples the order of the words with explicit syntactic dependency paths, so that the graph neural network can evolve and propagate features in a topological space with logic constraints. Drawing on the ideas of multi-hop reasoning and hierarchical attention mechanisms in the reference document, the scheme locates logic loopholes in the text through multi-scale feature pooling and logic consistency verification. This technical path improves the accuracy of false information discrimination, provides a structural basis for explaining the identified false patterns, and has high application value.
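A minimal sketch of the "semantic-syntactic" dual-channel graph described above is given here, assuming the dependency parser's output is available as a list of head indices. The function name `build_dual_channel_graph`, the `heads` format, and the edge-type names `seq` and `dep` are hypothetical choices for illustration, not the patent's implementation. The returned dictionary is keyed by (source type, edge type, destination type) triples, which is believed to match the input accepted by DGL's `dgl.heterograph`; DGL itself is not imported, so the sketch runs on plain Python.

```python
def build_dual_channel_graph(tokens, heads):
    """Build sequence and dependency edge lists over word nodes.

    tokens: list of words (node i is tokens[i]).
    heads:  heads[i] is the index of the syntactic head of token i,
            or -1 for the root (hypothetical parser output format).
    """
    if len(tokens) != len(heads):
        raise ValueError("tokens and heads must have the same length")

    # Channel 1: sequence edges i -> i+1 preserve the word order.
    seq_src = list(range(len(tokens) - 1))
    seq_dst = list(range(1, len(tokens)))

    # Channel 2: dependency edges head -> dependent preserve the
    # logical skeleton of the sentence.
    dep_src, dep_dst = [], []
    for child, head in enumerate(heads):
        if head >= 0:
            dep_src.append(head)
            dep_dst.append(child)

    return {
        ("word", "seq", "word"): (seq_src, seq_dst),
        ("word", "dep", "word"): (dep_src, dep_dst),
    }


tokens = ["Officials", "deny", "the", "rumor"]
heads = [1, -1, 3, 1]  # hand-written parse for illustration; "deny" is the root
graph = build_dual_channel_graph(tokens, heads)
```

Keeping the two edge types separate, rather than merging them into one adjacency list, lets a downstream graph attention layer assign different weights to word-order neighbors and syntactic neighbors.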
Disclosure of Invention

In view of the above, the embodiments of the present disclosure provide a false information detection method based on LLM dynamic characterization and semantic-syntactic dual-channel fusion, which at least partially solves the problems in the prior art of inaccurate semantic recognition of false information and poor judgment efficiency and accuracy caused by missing logic structure modeling.

The embodiments of the present disclosure provide a false information detection method based on LLM dynamic characterization and semantic-syntactic dual-channel fusion, which comprises the following steps: Step 1, context-aware semantic coding; Step 2, constructing a semantic-syntactic dual-channel graph; Step 3, dynamic feature mapping and calibration; Step 4, syntax-guided feature evolution; Step 5, logic conflict saliency measurement; Step 6, hierarchical feature pooling; and Step 7, violation risk prediction and explanation.

According to a specific implementation manner of the embodiment of the