CN-122020526-A - Method and system for breaking information island by data integration
Abstract
The invention discloses a method and a system for breaking information islands by data integration. The method comprises the following steps: S1, acquiring multi-modal raw data from a plurality of information islands, the multi-modal raw data comprising at least two of text data, sensing data and image data; S2, preprocessing the multi-modal raw data to obtain standardized multi-modal data; and S3, constructing a causal-graph-based multi-modal collaborative learning engine that integrates a multi-modal feature fusion unit, a causal inference unit, a weight self-adaptation unit and a threshold dynamic calibration unit through a modular framework. The method and system use a dual-stage feature fusion strategy that combines intra-modal attention and cross-modal cross-attention with adaptive weights determined by the entropy weight method, so that the complementary features of all modalities are fully exploited and the overall feature representation capability and robustness are improved.
Inventors
- MA ZONGHUA
- SHU ZHIHUA
- WEI ZHENG
- WANG FANG
- WU YONGJUN
- BAO JUNCHENG
- ZHANG DEJUN
Assignees
- 重庆市工程管理有限公司 (Chongqing Engineering Management Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20260121
Claims (10)
- 1. A method for breaking information islands by data integration, characterized by comprising the following steps: S1, acquiring multi-modal raw data from a plurality of information islands, wherein the multi-modal raw data comprises at least two of text data, sensing data and image data; S2, preprocessing the multi-modal raw data to obtain standardized multi-modal data; S3, constructing a causal-graph-based multi-modal collaborative learning engine, wherein the collaborative learning engine integrates a multi-modal feature fusion unit, a causal inference unit, a weight self-adaptation unit and a threshold dynamic calibration unit through a modular framework, the feature fusion unit assigns a feature fusion weight to each modality according to a preset weight determination rule, and the causal inference unit determines the interaction paths among the data of different modalities according to a threshold determination rule; S4, generating a structured knowledge graph based on the interaction paths and the feature-level fusion result, forming a closed loop of data integration, knowledge generation and decision optimization; S5, outputting a data integration result by using the structured knowledge graph to support decision making.
- 2. The method for breaking information islands by data integration according to claim 1, wherein acquiring the multi-modal raw data from the plurality of information islands in S1 comprises: accessing the data sources of each information island through a distributed data acquisition protocol, and classifying the acquired raw data according to a preset data type classification rule to obtain a data set for each modality; performing an integrity check on the data of each modality by setting a data integrity threshold and removing samples in which the proportion of missing key fields or the proportion of invalid data exceeds a preset proportion; the invalid data comprises at least one of abnormal jump values in the sensing data, meaningless character sequences in the text data, and all-black or all-white images in the image data.
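Claim 2 does not fix how the missing-field and invalid-data proportions are computed. The following is a minimal sketch under stated assumptions: each sensing sample is a dict with hypothetical key fields and a `readings` list, an "abnormal jump" is a step between consecutive readings larger than a hypothetical `max_jump`, and a blank image is an 8-bit image whose pixels are all 0 or all 255. None of these names come from the patent.

```python
from typing import Dict, List, Sequence

import numpy as np


def missing_ratio(sample: Dict, key_fields: Sequence[str]) -> float:
    """Fraction of key fields that are absent or None."""
    missing = sum(1 for f in key_fields if sample.get(f) is None)
    return missing / len(key_fields)


def jump_ratio(readings: Sequence[float], max_jump: float) -> float:
    """Fraction of consecutive steps whose magnitude exceeds max_jump (assumed 'abnormal jump')."""
    r = np.asarray(readings, dtype=float)
    if r.size < 2:
        return 0.0
    return float(np.mean(np.abs(np.diff(r)) > max_jump))


def is_blank_image(img) -> bool:
    """True for an all-black (0) or all-white (255) 8-bit image."""
    a = np.asarray(img)
    return bool(np.all(a == 0) or np.all(a == 255))


def integrity_check(samples: List[Dict], key_fields: Sequence[str],
                    max_jump: float, max_missing: float, max_invalid: float) -> List[Dict]:
    """Drop samples whose missing-key-field or invalid-reading proportion exceeds its limit."""
    kept = []
    for s in samples:
        if missing_ratio(s, key_fields) > max_missing:
            continue
        readings = s.get("readings")
        if readings is not None and jump_ratio(readings, max_jump) > max_invalid:
            continue
        kept.append(s)
    return kept
```

Text and image samples would plug their own validity tests (for example `is_blank_image`) into the same filtering loop.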
- 3. The method for breaking information islands by data integration according to claim 1, wherein preprocessing the multi-modal raw data in S2 comprises: performing word segmentation, stop-word removal and word vector mapping on the text data in sequence, retaining only words that satisfy a preset word frequency condition for word vector construction, and converting them into text feature vectors of fixed dimension; removing abnormal values from the sensing data using an outlier removal strategy, and mapping the remaining data to a preset numerical interval through normalization to obtain standardized sensing feature vectors; performing size normalization and pixel value normalization on the image data, removing images whose sharpness is below a preset threshold, and extracting deep features of the images to obtain image feature vectors; and aligning the feature vectors of the modalities according to the sample association relationships, and retaining sample pairs whose association degree satisfies a preset condition, to form a standardized multi-modal data set.
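Claim 3 names an "outlier removal strategy" and a "word frequency condition" without specifying either. The sketch below assumes Tukey's IQR fences for sensing outliers, min-max scaling to a preset interval `[lo, hi]`, and a minimum-count vocabulary filter over documents that have already been word-segmented. All three are illustrative stand-ins, not the choices disclosed in the patent.

```python
from collections import Counter
from typing import Iterable, List, Set

import numpy as np


def remove_outliers_iqr(x, k: float = 1.5) -> np.ndarray:
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]


def minmax_scale(x, lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """Map values linearly onto [lo, hi]; a constant series maps to lo."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, lo)
    return lo + (x - x.min()) * (hi - lo) / span


def preprocess_sensing(x, lo: float = 0.0, hi: float = 1.0, k: float = 1.5) -> np.ndarray:
    """Outlier removal followed by normalization of the remaining readings."""
    return minmax_scale(remove_outliers_iqr(x, k), lo, hi)


def build_vocab(docs: Iterable[List[str]], stopwords: Set[str], min_freq: int = 2) -> List[str]:
    """Vocabulary of non-stop-words occurring at least min_freq times (docs are pre-tokenized)."""
    counts = Counter(tok for doc in docs for tok in doc if tok not in stopwords)
    return sorted(tok for tok, c in counts.items() if c >= min_freq)
```

For example, `preprocess_sensing([10, 11, 12, 13, 14, 100])` drops the reading of 100 and scales the rest to `[0, 0.25, 0.5, 0.75, 1]`.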
- 4. The method for breaking information islands by data integration according to claim 1, wherein constructing the causal-graph-based multi-modal collaborative learning engine in S3 comprises constructing an engine architecture as follows: a layered modular architecture is adopted, divided from top to bottom into an input layer, a core processing layer and an output layer; the input layer is provided with a multi-modal data adaptation interface that supports batch access and format conversion of multiple data types, and receives high-concurrency data stably through a data buffer pool; the core processing layer integrates the multi-modal feature fusion unit, the causal inference unit, the weight self-adaptation unit and the threshold dynamic calibration unit, which exchange data and cooperate through a message queue; the output layer provides three output interfaces, for cross-modal fusion features, the causal graph and weight configuration parameters, and connects seamlessly with the knowledge graph construction module.
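Claim 4 only states that the core units cooperate through a message queue. The following is a minimal single-process sketch that assumes a topic-based FIFO queue. The `MessageBus` class, the topic names and the placeholder unit logic are hypothetical, and the input-layer buffer pool is not modeled. A production deployment would more likely use an external message broker.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Dict, List


class MessageBus:
    """Topic-based FIFO queue connecting the core processing units (illustrative only)."""

    def __init__(self) -> None:
        self._queue: deque = deque()
        self._handlers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        self._queue.append((topic, payload))

    def run(self) -> None:
        # Deliver messages in FIFO order; handlers may publish follow-up messages.
        while self._queue:
            topic, payload = self._queue.popleft()
            for handler in self._handlers[topic]:
                handler(payload)


def build_engine(outputs: Dict[str, Any]) -> MessageBus:
    bus = MessageBus()

    def fusion_unit(features: Dict[str, List[float]]) -> None:
        # Placeholder fusion: element-wise mean of equal-length per-modality vectors.
        vectors = list(features.values())
        fused = [sum(col) / len(vectors) for col in zip(*vectors)]
        bus.publish("fused", fused)

    def causal_unit(fused: List[float]) -> None:
        # Placeholder "causal" step: keep indices of fused features above 0.5.
        bus.publish("paths", [i for i, v in enumerate(fused) if v > 0.5])

    def collect(key: str) -> Callable[[Any], None]:
        # Output layer: store each result under its interface name.
        def handler(payload: Any) -> None:
            outputs[key] = payload
        return handler

    bus.subscribe("features", fusion_unit)
    bus.subscribe("fused", collect("fused"))
    bus.subscribe("fused", causal_unit)
    bus.subscribe("paths", collect("paths"))
    return bus
```

Because units interact only through topics, the weight self-adaptation and threshold calibration units could be subscribed to the same bus without modifying the fusion or causal units.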
- 5. The method for breaking information islands by data integration according to claim 4, wherein constructing the multi-modal feature fusion unit comprises: adopting a dual-stage fusion strategy, in which the first stage strengthens single-modality feature representation through an intra-modal attention mechanism and the second stage achieves deep interaction between the features of different modalities through a cross-modal cross-attention mechanism; designing an adaptive weight determination rule based on the entropy weight method, in which the information entropy and difference coefficient of each modality's features are calculated, the fusion weight of each modality is obtained by normalization, and the weights are updated dynamically in real time as the data distribution changes; and combining the fusion weights with the attention mechanism to generate cross-modal fusion features.
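Claim 5 combines entropy-weighted modality weights with two attention stages. The following is a minimal NumPy sketch built on three assumptions. First, each modality is summarized per sample by a non-negative scalar score (for example a feature norm) for the entropy weight calculation. Second, all modalities have already been projected to a shared feature dimension. Third, attention uses identity projections instead of learned query/key/value matrices. The function names are hypothetical.

```python
import numpy as np


def entropy_weights(scores) -> np.ndarray:
    """Entropy weight method: columns are modalities, rows are samples."""
    x = np.asarray(scores, dtype=float)
    m, k = x.shape
    spread = x.max(axis=0) - x.min(axis=0)
    spread[spread == 0] = 1.0                      # constant column -> all zeros after scaling
    z = (x - x.min(axis=0)) / spread + 1e-12      # min-max scale; epsilon avoids log(0)
    p = z / z.sum(axis=0)                          # share of each sample per modality
    e = -(p * np.log(p)).sum(axis=0) / np.log(m)  # normalized information entropy in [0, 1]
    d = np.clip(1.0 - e, 0.0, None)                # difference coefficient
    return d / d.sum() if d.sum() > 0 else np.full(k, 1.0 / k)


def softmax(a: np.ndarray) -> np.ndarray:
    a = a - a.max(axis=-1, keepdims=True)
    ea = np.exp(a)
    return ea / ea.sum(axis=-1, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention with identity projections."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def dual_stage_fuse(feats, weights) -> np.ndarray:
    """feats: modality -> (tokens, d) array; weights: one weight per modality, same order."""
    names = list(feats)
    # Stage 1: intra-modal self-attention strengthens each modality on its own.
    intra = {n: attention(feats[n], feats[n], feats[n]) for n in names}
    pooled = []
    for n in names:
        # Stage 2: queries from modality n attend to keys/values of all other modalities.
        others = np.vstack([intra[o] for o in names if o != n])
        pooled.append(attention(intra[n], others, others).mean(axis=0))
    # Weighted sum of the pooled cross-modal representations.
    return sum(w * p for w, p in zip(weights, pooled))
```

In the entropy weight method, a modality whose scores vary more across samples has lower entropy and therefore receives a larger weight. Recomputing `entropy_weights` over a sliding window of recent samples is one simple way to approximate the claimed real-time update as the data distribution changes.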
- 6. The method for breaking information islands by data integration according to claim 4, wherein constructing the causal inference unit comprises: constructing an initial causal framework of the multi-modal features based on a Bayesian network, and preliminarily screening potential causal association pairs through a mutual information test; adopting a score-based causal discovery algorithm that iteratively optimizes the causal structure with a preset information criterion as the objective function; setting a causal significance threshold, retaining the causal relationships that satisfy the threshold condition, and generating an initial causal graph; and calculating the average treatment effect of each causal path by propensity score matching, normalizing the effects to obtain causal path influence weights, and retaining the paths whose influence weights satisfy a preset condition as core interaction paths to generate the final causal graph.
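The last step of claim 6 weights causal paths by their average treatment effect (ATE), estimated with propensity score matching. The following is a minimal NumPy sketch for one path. It assumes a binary "treatment" (for instance, the cause feature above some cut-off), a logistic propensity model fitted by plain batch gradient descent, and 1-nearest-neighbour matching with replacement. The Bayesian-network construction, mutual-information screening and score-based structure search from the earlier steps are not shown, and all function names are hypothetical.

```python
from typing import Dict

import numpy as np


def propensity_scores(X: np.ndarray, t: np.ndarray, lr: float = 0.1, steps: int = 2000) -> np.ndarray:
    """Fit P(t=1 | X) with logistic regression via batch gradient descent."""
    A = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        w -= lr * A.T @ (p - t) / len(t)
    return 1.0 / (1.0 + np.exp(-A @ w))


def psm_ate(X, t, y) -> float:
    """ATE via 1-NN propensity matching (with replacement) in both directions."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(t), -1)
    ps = propensity_scores(X, t)
    treated, control = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
    effects = np.empty(len(t))
    for i in treated:   # impute the missing control outcome of each treated unit
        j = control[np.argmin(np.abs(ps[control] - ps[i]))]
        effects[i] = y[i] - y[j]
    for i in control:   # impute the missing treated outcome of each control unit
        j = treated[np.argmin(np.abs(ps[treated] - ps[i]))]
        effects[i] = y[j] - y[i]
    return float(effects.mean())


def path_weights(effects: Dict[str, float], min_weight: float) -> Dict[str, float]:
    """Normalize |ATE| per path to weights and keep paths at or above min_weight."""
    mags = np.abs(np.array(list(effects.values()), dtype=float))
    weights = mags / mags.sum()
    return {p: float(w) for p, w in zip(effects, weights) if w >= min_weight}
```

Taking absolute values before normalizing treats strong negative effects as influential paths. Whether the claimed normalization uses signed or unsigned effects is not stated, so this is a design assumption.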
- 7. The method for breaking information islands by data integration according to claim 4, wherein constructing the weight self-adaptation unit comprises: adjusting the fusion weight of each modality based on feedback from the model prediction error by setting an error threshold; when the prediction error corresponding to a modality's features is higher than the error threshold, the weight of that modality is reduced; when the error is lower than a preset proportion of the error threshold, the weight of that modality is increased proportionally; the weight adjustment does not exceed a preset range around the initial weight; and setting a lower weight limit for each modality to avoid excessive suppression of any single modality.
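Claim 7 leaves the size of each adjustment open. The following minimal sketch makes these assumptions: a multiplicative decrease of `step` when a modality's error exceeds the threshold; a proportional increase, scaled by how far the error sits below `low_ratio` times the threshold; a band of plus or minus `max_change` around the initial weight; and a hard `floor`. All parameter names and default values are hypothetical.

```python
from typing import Dict


def adapt_weights(weights: Dict[str, float], init_weights: Dict[str, float],
                  errors: Dict[str, float], err_threshold: float,
                  low_ratio: float = 0.5, step: float = 0.1,
                  max_change: float = 0.3, floor: float = 0.05) -> Dict[str, float]:
    """One feedback round of error-driven modality weight adjustment."""
    low = low_ratio * err_threshold
    new = {}
    for m, w in weights.items():
        e, w0 = errors[m], init_weights[m]
        if e > err_threshold:
            w *= 1.0 - step                       # error too high: shrink weight
        elif e < low:
            w *= 1.0 + step * (low - e) / low     # error well below threshold: grow proportionally
        # keep the weight within +/- max_change of its initial value, then apply the floor
        w = min(max(w, w0 * (1.0 - max_change)), w0 * (1.0 + max_change))
        new[m] = max(w, floor)
    return new
```

The returned weights are not renormalized. If the fusion stage requires them to sum to 1, normalizing afterwards can move a weight slightly outside its band, so the order of clamping and normalization is a design choice.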
- 8. The method for breaking information islands by data integration according to claim 4, wherein constructing the threshold dynamic calibration unit comprises: initializing reference values of the various thresholds based on domain knowledge and statistics of historical data; adopting an online learning algorithm with the model fusion effect as the optimization objective, and updating the thresholds after each batch of data is processed; and setting a value range for each threshold so that the thresholds are adjusted only within a reasonable interval.
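Claim 8 does not name the online learning algorithm. One simple possibility, sketched here as an assumption, is projected gradient ascent: after each batch, the gradient of a batch-level fusion-quality score with respect to the threshold is estimated by a central finite difference, and the updated value is clipped to the allowed range. `ThresholdCalibrator`, `score_fn` and the example objectives in the tests are hypothetical.

```python
from typing import Callable


class ThresholdCalibrator:
    """Per-batch online update of one threshold, clipped to [lo, hi]."""

    def __init__(self, init: float, lo: float, hi: float, lr: float = 0.05, eps: float = 0.01) -> None:
        if not lo <= init <= hi:
            raise ValueError("initial threshold must lie within [lo, hi]")
        self.value, self.lo, self.hi, self.lr, self.eps = init, lo, hi, lr, eps

    def update(self, score_fn: Callable[[float], float]) -> float:
        # Central finite-difference estimate of d(score)/d(threshold) on the current batch.
        grad = (score_fn(self.value + self.eps) - score_fn(self.value - self.eps)) / (2 * self.eps)
        # Gradient ascent step, projected back into the allowed range.
        self.value = min(max(self.value + self.lr * grad, self.lo), self.hi)
        return self.value
```

Setting `init` from a quantile of historical values (for example with `statistics.quantiles`) would be one way to reflect the historical-statistics initialization in the claim.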
- 9. The method for breaking information islands by data integration according to claim 1, wherein generating the structured knowledge graph and forming the closed loop in S4 comprises: associating the core interaction paths with the cross-modal fusion features, defining the entity nodes and relation edges of the knowledge graph, setting a relation confidence threshold, and retaining only the relation edges whose confidence satisfies the condition; storing the structured knowledge graph in a graph database, establishing a mapping index between entity nodes and feature data, setting a knowledge update weight threshold, and triggering a dynamic update of the knowledge graph when the change that newly added data causes in the influence weight of an existing causal relationship exceeds the threshold; constructing a decision reasoning rule base based on the structured knowledge graph, setting rule trigger weights and trigger thresholds, and generating a decision suggestion when the cumulative weight of a rule's satisfied conditions exceeds its trigger threshold; and feeding the decision suggestions back to the data acquisition stage, and adjusting the data acquisition scope and priority based on decision contribution weights to achieve closed-loop iteration.
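Three of the threshold checks in claim 9 can be sketched in plain Python: edge filtering by relation confidence, the update trigger on influence-weight change, and cumulative-weight rule firing. The `Edge` and `Rule` types, the example entities and all thresholds are hypothetical. Graph database storage, the entity-feature index and the feedback to data acquisition are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Set


class Edge(NamedTuple):
    head: str
    relation: str
    tail: str
    confidence: float


@dataclass
class Rule:
    conditions: Dict[str, float]   # condition (fact) -> trigger weight
    trigger_threshold: float
    advice: str


def filter_edges(edges: List[Edge], min_confidence: float) -> List[Edge]:
    """Keep only relation edges whose confidence meets the threshold."""
    return [e for e in edges if e.confidence >= min_confidence]


def needs_update(old: Dict[str, float], new: Dict[str, float], threshold: float) -> bool:
    """Trigger a graph update when any causal influence weight shifts by more than threshold."""
    return any(abs(new.get(k, 0.0) - old.get(k, 0.0)) > threshold for k in old.keys() | new.keys())


def fire_rules(rules: List[Rule], facts: Set[str]) -> List[str]:
    """Emit advice for each rule whose satisfied-condition weights sum past its threshold."""
    advice = []
    for rule in rules:
        score = sum(w for cond, w in rule.conditions.items() if cond in facts)
        if score > rule.trigger_threshold:
            advice.append(rule.advice)
    return advice
```

Treating a causal relationship that is missing from the old weights as having weight 0 means a newly discovered strong path also triggers an update. This is an assumption, not something the claim specifies.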
- 10. A system for breaking information islands by data integration, configured to perform the method for breaking information islands by data integration according to any one of claims 1 to 9, comprising: a data acquisition module, a data preprocessing module, a collaborative learning engine module, a knowledge graph construction module and a decision support module; the data acquisition module is configured to acquire multi-modal raw data from a plurality of information islands; the data preprocessing module is configured to preprocess the multi-modal raw data to obtain standardized multi-modal data; the collaborative learning engine module is configured to construct a causal-graph-based multi-modal collaborative learning engine, wherein the collaborative learning engine integrates a multi-modal feature fusion unit, a causal inference unit, a weight self-adaptation unit and a threshold dynamic calibration unit through a modular framework, the feature fusion unit assigns a feature fusion weight to each modality according to a preset weight determination rule, and the causal inference unit determines the interaction paths among the data of different modalities according to a threshold determination rule; the knowledge graph construction module is configured to generate a structured knowledge graph based on the interaction paths and the feature-level fusion result and to update it dynamically; and the decision support module is configured to output a data integration result by using the structured knowledge graph to support decision making.
Description
Method and system for breaking information islands by data integration
Technical Field
The invention relates to the field of data processing, and in particular to a method and a system for breaking information islands by data integration.
Background
In the current wave of digitalization, the volume of data in all industries and fields is growing explosively. These data are widely distributed across different systems, platforms and organizations, forming many independent information islands. Each information island holds rich multi-modal data covering text data, sensing data, image data and other types. However, because efficient data interaction and integration mechanisms are lacking, these data remain confined to their own small domains, and cross-system, cross-domain sharing and fusion cannot be achieved. This not only leaves large amounts of valuable data idle and wasted, but also keeps the overall utilization efficiency of the data extremely low, so that the value of the data is hard to realize fully. Meanwhile, comprehensive and accurate information is key to scientific and reasonable decision making. Because of information islands, however, a decision maker can obtain only partial, one-sided data and cannot take all relevant information into account. In addition, data of different modalities have their own characteristics and forms of expression, and traditional data processing methods face many difficulties with multi-modal data, making effective fusion and collaborative analysis of data from different modalities hard to achieve.
In addition, existing methods are clearly deficient in mining the causal relationships among data: they cannot accurately identify the causal influence paths among data of different modalities and cannot provide in-depth, actionable information for decision making, which seriously affects the quality and effectiveness of decisions. There is therefore an urgent need for a method that can effectively break information islands and achieve the integration and collaborative use of multi-modal data.
Disclosure of Invention
The invention aims to provide a method and a system for breaking information islands by data integration that solve the following problems: a decision maker can obtain only partial, one-sided data and cannot take all relevant information into account; and data of different modalities have their own characteristics and forms of expression, so that traditional data processing methods have difficulty processing multi-modal data and achieving effective fusion and collaborative analysis across modalities.
The invention achieves this aim through the following technical solution: a method for breaking information islands by data integration, comprising the following steps: S1, acquiring multi-modal raw data from a plurality of information islands, wherein the multi-modal raw data comprises at least two of text data, sensing data and image data; S2, preprocessing the multi-modal raw data to obtain standardized multi-modal data; S3, constructing a causal-graph-based multi-modal collaborative learning engine, wherein the collaborative learning engine integrates a multi-modal feature fusion unit, a causal inference unit, a weight self-adaptation unit and a threshold dynamic calibration unit through a modular framework, the feature fusion unit assigns a feature fusion weight to each modality according to a preset weight determination rule, and the causal inference unit determines the interaction paths among the data of different modalities according to a threshold determination rule; S4, generating a structured knowledge graph based on the interaction paths and the feature-level fusion result, forming a closed loop of data integration, knowledge generation and decision optimization; S5, outputting a data integration result by using the structured knowledge graph to support decision making.
Further, acquiring the multi-modal raw data from the plurality of information islands in S1 comprises: accessing the data sources of each information island through a distributed data acquisition protocol, and classifying the acquired raw data according to a preset data type classification rule to obtain a data set for each modality; performing an integrity check on the data of each modality by setting a data integrity threshold and removing samples in which the proportion of missing key fields or the proportion of invalid data exceeds a preset proportion; the invalid data comprises at least one of abnormal jump values in the sensing data, meaningless character sequences in the text data, and all-black or all-white images in the image data. Further, preprocessing the multi-modal raw data in S2 comprises: sequentially executing word segmentation