CN-122021652-A - Multi-modal image-text semantic conflict detection and correction method based on contrastive learning
Abstract
The invention discloses a multi-modal image-text semantic conflict detection and correction method based on contrastive learning, and relates to the technical field of data processing. By introducing a modal suppression coupling identification mechanism into the semantic fidelity fracture propagation chain, the method explicitly models the semantic contribution imbalance that is otherwise hidden in the multi-modal contrastive fusion process. Semantic analysis therefore no longer depends only on expansion and comparison at the structural level, but can also characterize the degree of dominance and the attenuation trend of semantic nodes from different sources during fusion, in terms of semantic energy distribution and path conduction relations.
Inventors
- CHEN YING
- HONG XIAOMEI
- HUANG YOUJUN
- GAO LICHAO
- JIANG HONGBO
Assignees
- 厦门身份宝网络科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-14
Claims (9)
- 1. A multi-modal image-text semantic conflict detection and correction method based on contrastive learning, characterized by comprising the following steps: acquiring multi-modal image-text data, and performing data parsing based on a contrastive-learning-driven natural language processing technique to construct a semantic fidelity fracture propagation chain; reconstructing the semantic fidelity fracture propagation chain through modal suppression coupling identification; obtaining an offset propagation sub-chain based on pseudo-semantic consistent segment discrimination and chain reinforcement of the reconstructed semantic fidelity fracture propagation chain; extracting features of the offset propagation sub-chain to obtain a time offset chain segment, and performing event consistency discrimination and chain structure update on the semantic nodes to obtain a mapping classification result, wherein the mapping classification result comprises an asynchronous chain segment and a real conflict chain segment; if a semantic node belongs to an asynchronous chain segment, extracting and correcting the information quantity of the semantic node, and identifying an editable domain after differential processing; and performing a semantic write-back operation based on the editable domain.
- 2. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 1, wherein acquiring multi-modal image-text data and performing data parsing based on a contrastive-learning-driven natural language processing technique to construct a semantic fidelity fracture propagation chain comprises: based on the natural language processing technique, performing semantic decomposition on the text description in the multi-modal image-text data to obtain text nodes, performing structural parsing on the non-text description by means of structural semantic modeling and index coding to obtain structural nodes, and mapping the text nodes and the structural nodes into the same semantic vector space to obtain a joint semantic primitive set; based on the joint semantic primitive set, identifying semantic node pairs through contrastive-learning semantic matching, performing path expansion analysis in combination with a connected component algorithm, constructing a semantic association domain, and generating joint position coordinates; based on the semantic association domain, identifying an energy anomaly domain and generating a fracture candidate region set through semantic energy distribution modeling and gradient change analysis; and based on the fracture candidate region set, constructing the semantic fidelity fracture propagation chain through path tracking and structural connection, so as to represent the contribution distribution, path conduction relation, and conflict signal attenuation state of semantic nodes of different modalities in the contrastive fusion process.
- 3. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 2, wherein identifying an energy anomaly domain and generating a fracture candidate region set through semantic energy distribution modeling and gradient change analysis based on the semantic association domain comprises: combining the text nodes and the structural nodes to obtain semantic nodes; extracting the vector response intensity of each semantic node and taking the vector response intensity as a semantic energy value; based on the semantic association domain, calculating the energy change between adjacent semantic nodes to obtain a gradient sequence, and comparing each element of the gradient sequence with a preset gradient threshold to obtain broken nodes and generate the energy anomaly domain; and verifying the continuity of the energy anomaly domain through the joint position coordinates to generate the fracture candidate region set.
- 4. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 3, wherein reconstructing the semantic fidelity fracture propagation chain through modal suppression coupling identification comprises: based on the semantic fidelity fracture propagation chain, constructing a chain-level semantic contribution distribution matrix through intra-chain semantic energy statistics and distribution analysis; based on the chain-level semantic contribution distribution matrix, obtaining a modal suppression index by calculating the degree of contribution deviation; and based on the suppression propagation sub-chain and the modal suppression index, obtaining the reconstructed semantic fidelity fracture propagation chain through nonlinear semantic stretching and path structure rearrangement.
- 5. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 4, wherein obtaining the offset propagation sub-chain based on pseudo-semantic consistent segment discrimination and chain reinforcement of the reconstructed semantic fidelity fracture propagation chain comprises: performing semantic difference analysis on adjacent semantic nodes in the reconstructed semantic fidelity fracture propagation chain to obtain a difference sequence between nodes, and using the joint position coordinates as a continuity constraint to obtain a difference segmentation set; based on the difference segmentation set, obtaining a pseudo-semantic consistent segment set comprising a plurality of pseudo-semantic consistent segments through comparative analysis of local consistency and global consistency; extracting the node sub-chain corresponding to each pseudo-semantic consistent segment in the pseudo-semantic consistent segment set, and taking the node sub-chain as an initial propagation path; performing a neighborhood search on the initial propagation path with energy continuity and path reachability as search conditions, and obtaining a propagation sub-chain after path expansion; and identifying, according to the propagation sub-chain, the relative position of each semantic node within the corresponding pseudo-semantic consistent segment to obtain the semantic node weights and the offset propagation sub-chain.
- 6. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 5, wherein extracting features of the offset propagation sub-chain to obtain a time offset chain segment, and performing event consistency discrimination and chain structure update on the semantic nodes to obtain a mapping classification result comprises: extracting the time semantic units of the semantic nodes in the offset propagation sub-chain, and obtaining a time semantic sequence after node mapping; extracting stage descriptors, and performing stage segmentation and node labeling on the time semantic sequence according to the stage descriptors to construct a stage matrix; based on the stage matrix, performing time difference calculation and path mapping analysis on each semantic node to obtain the time offset chain segment; and based on the time offset chain segment, performing event consistency discrimination and chain structure update on the semantic nodes to obtain the mapping classification result.
- 7. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 6, wherein performing event consistency discrimination and chain structure update on the semantic nodes based on the time offset chain segment to obtain a mapping classification result comprises: setting classification constraint conditions, and comprehensively judging the time offset chain segment according to the classification constraint conditions, the classification constraint conditions comprising a path continuity judgment and a stage order consistency judgment; if a semantic node is continuous on the path structure of the reconstructed semantic fidelity fracture propagation chain and its stage order conforms to single-event progression logic, marking the semantic node as an asynchronous chain segment and performing correction processing; and combining the asynchronous chain segments and the real conflict chain segments to obtain a classification result, and mapping the classification result back into the reconstructed semantic fidelity fracture propagation chain to obtain the mapping classification result.
- 8. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 7, wherein, if a semantic node belongs to an asynchronous chain segment, extracting and correcting the information quantity of the semantic node and identifying an editable domain after differential processing comprises: receiving a fidelity constraint instruction, extracting the node probability of each semantic node in a real conflict chain segment, taking the product of the modal suppression index and the semantic node weight as a correction term, correcting the node probability, and calculating the information quantity of each semantic node based on information theory; and differencing the information quantities of adjacent semantic nodes to obtain a differential sequence, wherein the differential sequence comprises a plurality of information density variations, comparing each information density variation with a preset variation threshold, and, if the information density variation exceeds the preset variation threshold, marking it as an editable domain.
- 9. The multi-modal image-text semantic conflict detection and correction method based on contrastive learning according to claim 8, wherein performing the semantic write-back operation comprises: obtaining a node position index and a coordinate mapping relation based on the editable domain and the joint position coordinates; locating the front and rear boundary nodes of the editable domain based on the node position index, and extracting the semantic vectors and stages of the boundary nodes as front and rear anchor points to generate new semantic vectors; based on the new semantic vectors, correcting the stages of the semantic nodes in the editable domain to obtain new stages, so that the new stages are continuous and smooth in time order and the time offset is repaired; and combining the new semantic vectors with the new stages to form a reconstruction result, and writing the reconstruction result back to the original semantic fidelity fracture propagation chain structure node by node according to the node position index to complete the semantic write-back operation.
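The energy-gradient step recited in claim 3 can be illustrated with a minimal Python sketch. The claims do not define "vector response intensity" or the continuity test on the joint position coordinates, so the sketch assumes the L2 norm of each node vector as the semantic energy value and treats the joint position coordinates as scalar positions that must not be more than `max_gap` apart. The names `semantic_energy`, `fracture_candidates`, and `max_gap` are hypothetical and do not come from the source.

```python
import math


def semantic_energy(vector):
    """Semantic energy of one node (assumption: L2 norm of its vector)."""
    return math.sqrt(sum(x * x for x in vector))


def fracture_candidates(vectors, positions, gradient_threshold, max_gap=1):
    """Return the gradient sequence and the fracture candidate regions.

    vectors            -- one semantic vector per node, in chain order
    positions          -- one scalar position per node (assumed form of the
                          joint position coordinates)
    gradient_threshold -- preset gradient threshold
    max_gap            -- assumed continuity rule: broken nodes whose positions
                          differ by at most max_gap belong to the same region
    """
    energies = [semantic_energy(v) for v in vectors]

    # Gradient sequence: energy change between adjacent semantic nodes.
    gradients = [energies[i + 1] - energies[i] for i in range(len(energies) - 1)]

    # A node is treated as broken when the change leading into it
    # exceeds the threshold in magnitude.
    broken = [i + 1 for i, g in enumerate(gradients) if abs(g) > gradient_threshold]

    # Continuity check: group broken nodes with contiguous positions.
    regions = []
    for idx in broken:
        if regions and positions[idx] - positions[regions[-1][-1]] <= max_gap:
            regions[-1].append(idx)
        else:
            regions.append([idx])
    return gradients, regions


if __name__ == "__main__":
    demo_vectors = [[1, 0], [3, 4], [1, 0], [1, 0], [3, 4]]
    demo_positions = [0, 1, 2, 3, 4]
    print(fracture_candidates(demo_vectors, demo_positions, 2.0))
```

Using the absolute gradient treats both a sharp rise and a sharp drop in energy as a break; a signed threshold would be an equally plausible reading of the claim.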
Description
Multi-modal image-text semantic conflict detection and correction method based on contrastive learning
Technical Field
The invention relates to the technical field of data processing, and in particular to a multi-modal image-text semantic conflict detection and correction method based on contrastive learning.
Background
With the wide application of multi-modal data in intelligent search, content generation, and human-computer interaction systems, multi-modal data has gradually become an important form of information expression. In practical applications, multi-modal data usually exists in structured or semi-structured form and contains multi-level semantic relations and complex context dependencies. Semantic modeling and structural analysis are therefore needed to convert image-text information into computable semantic representations, and correlation analysis and consistency judgment are carried out in a unified semantic space so as to identify and locate semantic conflicts.
In the prior art, multi-modal semantics are generally judged by similarity calculation. Although this allows semantic recognition to be carried out quickly, it does not describe the propagation relations inside a semantic structure. In particular, during multi-modal fusion, information from the strong modality can suppress information from the weak modality, and as contrastive learning is continuously optimized, the semantic representation keeps converging towards the dominant distribution, so that ambiguous information is gradually compressed and tends to become indistinguishable during path transmission. In the consistency judgment stage, because the judgment basis focuses on overall similarity, the compressed information can hardly form effective discriminative features and is therefore implicitly classified into a consistent region, forming a semantic state that is internally offset but externally stable. This state is often not explicitly corrected in the subsequent correction process; instead, it is further solidified by simplifying the expression or reducing the information complexity, which continuously lowers the semantic carrying capacity of the weak modality. As the process iterates, the influence of the dominant modality is continuously strengthened and the expression of the weak modality gradually degrades, finally forming a hidden, self-reinforcing evolution path in which semantic offset keeps accumulating at the structural level and is difficult to perceive or correct in time.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-modal image-text semantic conflict detection and correction method based on contrastive learning, which solves the problems described in the background.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a multi-modal image-text semantic conflict detection and correction method based on contrastive learning, comprising the following steps: acquiring multi-modal image-text data, and performing data parsing based on a contrastive-learning-driven natural language processing technique to construct a semantic fidelity fracture propagation chain; reconstructing the semantic fidelity fracture propagation chain through modal suppression coupling identification; obtaining an offset propagation sub-chain based on pseudo-semantic consistent segment discrimination and chain reinforcement of the reconstructed semantic fidelity fracture propagation chain; extracting features of the offset propagation sub-chain to obtain a time offset chain segment, and performing event consistency discrimination and chain structure update on the semantic nodes to obtain a mapping classification result, wherein the mapping classification result comprises an asynchronous chain segment and a real conflict chain segment; if a semantic node belongs to an asynchronous chain segment, extracting and correcting the information quantity of the semantic node, and identifying an editable domain after differential processing; and performing a semantic write-back operation based on the editable domain.
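The information-quantity and differencing step above, detailed in claim 8, can be sketched in Python as follows. The source states only that the product of the modal suppression index and the semantic node weight serves as a correction term and that the information quantity is calculated from information theory. How the correction is applied is not specified, so the sketch assumes the term is added to the node probability and the result is clamped to (0, 1], and it assumes the information quantity is the self-information -log2(p). The name `editable_domains` and the choice to report each editable domain as a pair of adjacent node indices are likewise hypothetical.

```python
import math


def editable_domains(node_probs, node_weights, suppression_index, variation_threshold):
    """Return information quantities, their differences, and editable domains.

    node_probs          -- node probability of each semantic node, in chain order
    node_weights        -- semantic node weight of each node
    suppression_index   -- modal suppression index of the chain
    variation_threshold -- preset variation threshold
    """
    infos = []
    for p, w in zip(node_probs, node_weights):
        correction = suppression_index * w  # correction term named in claim 8
        # Assumption: additive correction, clamped so the logarithm is defined.
        corrected = min(1.0, max(1e-12, p + correction))
        # Assumption: information quantity is the self-information in bits.
        infos.append(-math.log2(corrected))

    # Differential sequence: information density variation between neighbors.
    diffs = [infos[i + 1] - infos[i] for i in range(len(infos) - 1)]

    # Adjacent pairs whose variation exceeds the threshold are marked editable.
    editable = [(i, i + 1) for i, d in enumerate(diffs) if abs(d) > variation_threshold]
    return infos, diffs, editable


if __name__ == "__main__":
    print(editable_domains([0.5, 0.5, 0.125, 0.5], [0.0, 0.0, 0.0, 0.0], 0.0, 1.0))
```

With a nonzero suppression index, a node with a large weight has its probability raised before the logarithm is taken, which lowers its information quantity and can remove it from the editable set. Whether the method intends this direction of correction is not stated in the source.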
The scheme of the invention provides at least the following beneficial effects: by introducing a modal suppression coupling identification mechanism into the semantic fidelity fracture propagation chain, the scheme explicitly models the semantic contribution imbalance that is otherwise hidden in the multi-modal contrastive fusion process, so that semantic analysis no longer depends only on expansion and comparison at the structural level, but can also characterize the degree of dominance and the attenuation trend of semantic nodes from different sources during fusion in terms of semantic energy distribution and path conduction relations; on this basis, conflict identification is extended to signal dominance discrimination from structural