CN-121979925-A - Event tracing data grid risk analysis system

Abstract

The application discloses a data grid risk analysis system for event tracing. The system comprises a Web content acquisition module, a parsing module, a graph construction module and an infection risk analysis module. The Web content acquisition module acquires Web content from the World Wide Web. The parsing module converts the Web content into a structured information unit containing a source content identifier and calculates an inherent information risk value for that unit. The graph construction module dynamically constructs an information traceability graph from the structured information units. The infection risk analysis module performs reverse tracing of information propagation paths on the information traceability graph. By extracting source relations and quantifying risk from unstructured Web content, the system addresses the technical problems of a missing data basis and broken logic chains when tracing infection risk information in a Web environment, and enables reliable and efficient tracing of risk determinations.

Inventors

  • NING SHIJIE
  • PENG FANG
  • CHEN CHAORAN
  • LI ZHENHUA

Assignees

  • 湖南创赋医疗科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-19

Claims (10)

  1. A data grid risk analysis system for event tracing, comprising: a data storage module; a Web content acquisition module configured to acquire Web content from the World Wide Web; a parsing module configured to parse the Web content into a structured information unit, wherein the structured information unit comprises a unique content identifier of a current information unit, a topic entity identifier associated with the current information unit, and a source content identifier determined by parsing hyperlinks in the Web content; a graph construction module communicatively connected to the data storage module and configured to create, for each structured information unit generated by the parsing module, an information node corresponding to the current information unit in the data storage module, and to create, according to the source content identifier in the structured information unit, a directed edge representing a source or citation relation between the information node and a source information node corresponding to the source content identifier, so as to dynamically construct an information traceability graph; and an infection risk analysis module communicatively connected to the data storage module and configured, when an analysis request for a target information unit is received, to locate a corresponding target information node in the information traceability graph based on the content identifier of the target information unit, and to recursively traverse in reverse from the target information node along all directed edges pointing to it, so as to identify one or more information propagation paths leading to the generation of the target information unit.
  2. The system of claim 1, wherein the graph construction module is further configured to update, in a topic entity information index, the mapping between the topic entity identifier and a reference to the information node.
  3. The system of claim 1, wherein the infection risk analysis module, prior to performing the recursive reverse traversal, first locates the target information node directly in the information traceability graph by the content identifier of the target information unit and uses it as the starting point for the reverse traversal.
  4. The system of claim 1 or 3, wherein the recursive reverse traversal employs a depth-first traversal algorithm.
  5. The system of claim 1, wherein each information node comprises an inherent information risk value, and wherein the infection risk analysis module is further configured to calculate a composite risk assessment score for the information propagation path based on the inherent information risk values of all information nodes on the information propagation path.
  6. The system of claim 5, wherein the parsing module is further configured to calculate the inherent information risk value by a preset multi-factor weighting model, wherein the multi-factor weighting model includes at least a source risk factor based on the authority of the publishing source of the Web content and a type risk factor based on the content type of the Web content.
  7. The system of claim 5, wherein the infection risk analysis module, when calculating the composite risk assessment score, applies a preset risk contribution attenuation factor to attenuate the weight of the inherent information risk value of an information node according to the source distance between that information node and the target information node on the information propagation path.
  8. The system of claim 5, wherein the infection risk analysis module is further configured to: for each information node created in the information traceability graph, calculate an accumulated risk potential value based on the inherent information risk value of the information node and the accumulated risk potential values of one or more source information nodes of the information node; periodically, or when a preset condition is triggered, extract an entity subgraph formed by all information nodes associated with a given topic entity identifier together with their source or citation directed edges, and calculate, based on preset topological indices of the entity subgraph, a topological vulnerability score quantifying the structural risk of the information ecosystem around the topic entity; and calculate, based on the maximum accumulated risk potential value among the information nodes associated with the topic entity and the topological vulnerability score of the topic entity, a predictive risk index for predicting future misjudgment of infection risk information, and generate a risk early-warning signal when the index exceeds a preset risk threshold.
  9. The system of claim 1, wherein the infection risk analysis module is further configured to perform a blocking deduction: mapping the information traceability graph into a risk flow network, wherein a starting node of the information propagation path or a preset high-risk node is taken as a source, one or more preset key protection nodes are taken as sinks, and the capacity of each edge is determined based on the propagation relationship among the information nodes; running a max-flow min-cut algorithm to calculate a minimum cut set separating the source from the sinks; and identifying the directed edges in the minimum cut set, or the information nodes pointed to by those directed edges, as key bridging nodes for blocking risk propagation.
  10. The system of claim 9, wherein the infection risk analysis module is further configured to perform a counterfactual simulation: virtually removing the key bridging nodes from the information traceability graph, recalculating the accumulated risk potential values of the remaining nodes in the network, and calculating the difference in total system risk before and after the virtual removal, so as to select the node with the highest expected risk reduction per unit intervention cost and thereby generate an intervention suggestion.
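
The blocking deduction of claim 9 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `min_cut_edges`, the edge direction (from a source node to the node that cites it), and the unit capacities in the example are assumptions made for illustration, since the claims only state that capacities follow from the propagation relationship. The sketch uses the Edmonds-Karp variant of max-flow and reads the minimum cut off the final residual graph.

```python
from collections import deque


def min_cut_edges(edges, sources, sinks):
    """Return the original directed edges that form a minimum cut.

    edges: list of (u, v, capacity) tuples; the direction u -> v is assumed
    to follow risk propagation (hypothetical convention, not from the claims).
    """
    edges = list(edges)
    if set(sources) & set(sinks):
        raise ValueError("a node cannot be both a source and a sink")
    S, T = object(), object()  # super source / super sink sentinels
    INF = float("inf")
    cap = {}  # residual capacities: cap[u][v]

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    for u, v, c in edges:
        add(u, v, c)
    for s in sources:
        add(S, s, INF)
    for t in sinks:
        add(t, T, INF)

    def bfs():
        # Breadth-first search over edges with remaining residual capacity.
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if T not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck, v = INF, T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the super source define the source side.
    reachable = set(parent)
    return [(u, v) for u, v, c in edges
            if c > 0 and u in reachable and v not in reachable]


# Hypothetical propagation graph with unit capacities.
example = [
    ("rumor", "blogA", 1), ("rumor", "blogB", 1),
    ("blogA", "portal", 1), ("blogB", "portal", 1),
    ("portal", "agency", 1),
]
print(min_cut_edges(example, ["rumor"], ["agency"]))  # [('portal', 'agency')]
```

In this example every path from the high-risk node to the protected node passes through the single edge from `portal` to `agency`, so that edge (or the node it points to) would be the candidate key bridging node in the sense of claim 9.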

Description

Event tracing data grid risk analysis system

Technical Field

The invention relates to the technical field of data processing, and in particular to a data grid risk analysis system for event tracing.

Background

In today's World Wide Web environment, information, particularly infection risk information concerning public health, is created and disseminated at unprecedented speed by medical institutions at all levels, research centers, and news media. This rapid propagation, however, also lets outdated information, misinterpreted data, and conclusions based on preliminary studies spread widely. Once an inaccurate infection risk determination has circulated widely among the public, correcting it after the fact often has limited effect. A core technical challenge is how to quickly and accurately trace such a risk determination back to its original data source and to reconstruct its complete propagation path through the network information ecosystem. Traditional methods based on keyword search or source authority evaluation cope poorly with the citation, reposting, and cross-platform evolution that information undergoes as it propagates. Tracing is therefore time-consuming and labor-intensive and cannot yield a deterministic propagation chain, which makes it difficult to evaluate the reliability of a specific infection risk determination in a well-grounded, systematic way.

Disclosure of Invention

The application aims to provide a data grid risk analysis system for event tracing, so as to solve the technical problems of low efficiency and insufficient accuracy in the prior art when tracing and analyzing infection risk information in a Web environment.
The application provides a data grid risk analysis system for event tracing, which comprises a data storage module, a Web content acquisition module, a parsing module, a graph construction module and an infection risk analysis module. The Web content acquisition module is configured to acquire Web content from the World Wide Web. The parsing module is configured to parse the Web content into a structured information unit, which comprises a unique content identifier of the current information unit, a topic entity identifier associated with the current information unit, and a source content identifier determined by parsing hyperlinks in the Web content. The graph construction module is communicatively connected to the data storage module and is configured to create, for each structured information unit generated by the parsing module, an information node corresponding to the current information unit in the data storage module, and to create, according to the source content identifier in the structured information unit, a directed edge representing a source or citation relation between the information node and the source information node corresponding to the source content identifier, so as to dynamically construct an information traceability graph. The infection risk analysis module is communicatively connected to the data storage module and is configured, when an analysis request for a target information unit is received, to locate the corresponding target information node in the information traceability graph based on the content identifier of the target information unit, and to recursively traverse in reverse from the target information node along all directed edges pointing to it, so as to identify one or more information propagation paths leading to the generation of the target information unit. Optionally, the graph construction module is further configured to update, in a topic entity information index, the mapping between the topic entity identifier and a reference to the information node.
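
The graph construction and reverse tracing described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the names `InfoUnit`, `ProvenanceGraph`, `add_unit` and `trace_back` are hypothetical, an in-memory dictionary stands in for the data storage module, and edges are assumed to point from a source node to the node that cites it, so that reverse tracing follows incoming edges.

```python
from dataclasses import dataclass, field


@dataclass
class InfoUnit:
    """Structured information unit (field names are assumptions)."""
    content_id: str                 # unique content identifier
    topic_id: str                   # topic entity identifier
    source_ids: list[str] = field(default_factory=list)  # from parsed hyperlinks


class ProvenanceGraph:
    """In-memory stand-in for the information traceability graph."""

    def __init__(self) -> None:
        self.nodes: dict[str, InfoUnit] = {}
        # incoming[x] lists the source nodes whose edges point to x.
        self.incoming: dict[str, list[str]] = {}
        # Topic entity information index: topic id -> node references.
        self.topic_index: dict[str, set[str]] = {}

    def add_unit(self, unit: InfoUnit) -> None:
        """Create the node, its source edges, and the topic index entry."""
        self.nodes[unit.content_id] = unit
        self.incoming.setdefault(unit.content_id, [])
        for src in unit.source_ids:
            # Placeholder entry for sources that have not been parsed yet.
            self.incoming.setdefault(src, [])
            if src not in self.incoming[unit.content_id]:
                self.incoming[unit.content_id].append(src)
        self.topic_index.setdefault(unit.topic_id, set()).add(unit.content_id)

    def trace_back(self, target_id: str) -> list[list[str]]:
        """Depth-first reverse traversal from the target node.

        Returns every path from the target back to a node with no
        further untraversed sources.
        """
        if target_id not in self.incoming:
            raise KeyError(target_id)
        paths: list[list[str]] = []

        def dfs(node: str, path: list[str]) -> None:
            # Skip sources already on this path so citation cycles terminate.
            sources = [s for s in self.incoming[node] if s not in path]
            if not sources:
                paths.append(path)
                return
            for s in sources:
                dfs(s, path + [s])

        dfs(target_id, [target_id])
        return paths


g = ProvenanceGraph()
g.add_unit(InfoUnit("study", "flu"))
g.add_unit(InfoUnit("news", "flu", ["study"]))
g.add_unit(InfoUnit("blog", "flu", ["news", "study"]))
print(g.trace_back("blog"))  # [['blog', 'news', 'study'], ['blog', 'study']]
```

Looking up the target by its content identifier is a dictionary access here, which mirrors the direct location of the target node before traversal. Each returned path lists nodes from the target back toward an original source, so reversing a path gives the propagation order.
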
Optionally, before performing the recursive reverse traversal, the infection risk analysis module first locates the target information node directly in the information traceability graph through the content identifier of the target information unit, and uses the target information node as the starting point of the reverse traversal. Optionally, the recursive reverse traversal employs a depth-first traversal algorithm. Optionally, each information node comprises an inherent information risk value, and the infection risk analysis module is further configured to calculate a composite risk assessment score for the information propagation path based on the inherent information risk values of all information nodes on the information propagation path. Optionally, the parsing module is further configured to calculate the inherent information risk value through a preset multi-factor weighting model, where the multi-factor weighting model includes at least a source risk factor based on the authority of the publishing source of the Web content and a type risk factor based on the content type of the Web content. Optionally, when calculating the composite risk assessment score, the infection risk analysis module applies a preset risk contribution attenuation factor to attenuate the weight of the inherent information risk value of an information node according to the source distance between the information node and the ta