
CN-122019165-A - Heterogeneous data stream oriented financial event processing and resource scheduling system and method


Abstract

The invention relates to the technical field of data processing and discloses a heterogeneous data stream oriented system and method for real-time financial event processing and resource scheduling. The basic principle of the scheme is as follows. First, entities in multi-source heterogeneous financial data are mapped onto a statistical manifold that uses the Fisher information matrix as its metric, so that the statistical characteristics of the data are represented more faithfully. Second, a geometry-aware dynamic graph neural network is constructed on the statistical manifold; information aggregation is performed in the tangent space of the manifold through logarithmic and exponential maps, so that relation learning preserves the geometric structure. In addition, a hybrid intelligent scheduler is integrated, which dynamically allocates computing resources by monitoring data value and system load and combining rules with an online learning model. The core technical effect of the scheme is that complex financial events can be analyzed in depth and with high precision while the processing efficiency and operational stability of the overall system are maintained in high-throughput, low-latency scenarios.
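For a concrete sense of the metric: for the univariate Gaussian family N(μ, σ²) parameterized by (μ, σ), the Fisher information matrix is diag(1/σ², 2/σ²), so the Riemannian line element is ds² = (dμ² + 2dσ²)/σ². The Python sketch below is an illustration only and is not taken from the patent; it computes the matrix as the expected outer product of the score using Gauss-Hermite quadrature (exact here because the integrand is a polynomial of degree 4) and compares it with the closed form. The function name `gaussian_fisher_information` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_fisher_information(mu: float, sigma: float, n_nodes: int = 10) -> np.ndarray:
    """Fisher information of N(mu, sigma^2) w.r.t. (mu, sigma), computed as E[score score^T]."""
    # Probabilists' Gauss-Hermite rule: integrates f(z) * exp(-z^2 / 2) exactly
    # for polynomials f of degree <= 2 * n_nodes - 1.
    z, w = hermegauss(n_nodes)
    x = mu + sigma * z                   # quadrature nodes mapped onto N(mu, sigma^2)
    weights = w / np.sqrt(2.0 * np.pi)   # normalize so the weights sum to 1

    # Score vector: gradient of log p(x | mu, sigma) with respect to (mu, sigma).
    d_mu = (x - mu) / sigma**2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
    score = np.stack([d_mu, d_sigma])    # shape (2, n_nodes)

    # Expected outer product of the score = Fisher information matrix.
    return (score * weights) @ score.T


sigma = 0.5
fim = gaussian_fisher_information(mu=1.0, sigma=sigma)
closed_form = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
print(np.round(fim, 6))
print(np.allclose(fim, closed_form))
```

Every entry scales as 1/σ², so the same shift in μ counts for more when σ is small; this is the sense in which the metric encodes the uncertainty of the data.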

Inventors

  • HONG FEI
  • ZHU XIAOKANG
  • TAO JIN

Assignees

  • Hangzhou Longqi Technology Co., Ltd. (杭州龙旗科技有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (10)

  1. A financial event real-time processing and resource scheduling system oriented to heterogeneous data streams, characterized by comprising: a data access module, configured to receive and parse multi-source heterogeneous real-time financial data streams and extract entity data therefrom; a manifold mapping module, connected to the data access module and configured to map the entity data to coordinate points on a statistical manifold, wherein the statistical manifold is a parameter space that takes the Fisher information matrix as its Riemannian metric; a geometric graph network module, connected to the manifold mapping module and configured to perform geometry-aware dynamic node embedding learning on the coordinate points on the statistical manifold based on a financial association graph; a hybrid intelligent scheduler, connected to each of the data access module, the manifold mapping module and the geometric graph network module, and configured to monitor the value of the data stream processed by the data access module and the system computing load, and to dynamically allocate computing resources to the manifold mapping module and the geometric graph network module based on preset rules and an online learning model; and an event judging module, connected to the geometric graph network module and configured to identify and output financial event signals according to the output dynamic node embeddings.
  2. The system of claim 1, wherein the manifold mapping module comprises: a probability distribution estimation unit, configured to estimate the corresponding probability distribution type and parameters according to the characteristics of the entity data; and a manifold coordinate mapping unit, configured to map, according to the probability distribution type and parameters, the entity to the corresponding point on the statistical manifold characterized by those distribution parameters.
  3. The system of claim 2, wherein the probability distribution estimation unit is specifically configured to: adaptively select one of a Gaussian distribution family, a Dirichlet distribution family and a Student's t distribution family for modeling according to the entity data type; and estimate, based on the selected distribution family, the parameter vector of the generating distribution from the entity data.
  4. The system of claim 1, wherein the geometric graph network module comprises: a geometric message passing unit, configured to perform a Riemannian-geometry-based graph convolution on the statistical manifold, specifically by mapping the manifold coordinate points of neighboring nodes to tangent vectors in the tangent space of the target node through the logarithmic map, aggregating and transforming the tangent vectors in that tangent space, and mapping the result back to the statistical manifold through the exponential map to obtain the updated node coordinate point; and a manifold parameter updating unit, configured to update the trainable parameters of the geometric graph network module using a Riemannian optimization algorithm.
  5. The system of claim 4, wherein the geometric message passing unit is configured to compute, for a target node $i$, its manifold coordinate point at layer $l+1$ by the following function:
     $$\mathbf{x}_i^{(l+1)} = \exp_{\mathbf{x}_i^{(l)}}\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \log_{\mathbf{x}_i^{(l)}}\big(\mathbf{x}_j^{(l)}\big)\Big)$$
     wherein $\mathbf{x}_i^{(l)}$ and $\mathbf{x}_j^{(l)}$ respectively denote the coordinate points of the target node $i$ and its neighbor node $j$ at layer $l$ on the statistical manifold; $\mathcal{N}(i)$ denotes the neighbor set of node $i$; $\log_{\mathbf{x}_i^{(l)}}(\mathbf{x}_j^{(l)})$ denotes the logarithmic map from the base point $\mathbf{x}_i^{(l)}$ to the point $\mathbf{x}_j^{(l)}$, whose result lies in the tangent space at $\mathbf{x}_i^{(l)}$; $\alpha_{ij}$ denotes the message contribution weight of node $j$ to node $i$, which is either a learnable attention weight or a fixed normalization coefficient based on the graph structure; and $\exp_{\mathbf{x}_i^{(l)}}(\cdot)$ denotes the exponential map at the base point $\mathbf{x}_i^{(l)}$, which maps vectors in the tangent space back to the statistical manifold.
  6. The system of claim 1, wherein the hybrid intelligent scheduler comprises: a system state monitoring unit, configured to collect in real time the utilization of system computing resources, the task queue state of each module, and a value index estimated for the data stream processed by the data access module; a resource scheduling policy unit, connected to the system state monitoring unit, which integrates a rule engine and a contextual multi-armed bandit model and is configured to generate resource allocation decisions based on the collected information; and a resource dynamic execution unit, connected to the resource scheduling policy unit and configured to execute the resource allocation decisions and dynamically adjust the computing resources allocated to the manifold mapping module and the geometric graph network module.
  7. The system of claim 6, wherein the contextual multi-armed bandit model performs online learning with the instantaneous system state and the feature vector of the task to be scheduled as context inputs, with the objective of maximizing a long-term system performance index, the long-term performance index being a weighted function of the number of high-value events successfully processed per unit time and the corresponding resource consumption.
  8. The system of claim 6 or 7, wherein the resource scheduling policy unit is configured to execute a scheduling policy comprising the following logic: when a high-value data stream is identified and the system load exceeds a set threshold, generating a decision to preferentially guarantee the coordinate mapping tasks of the manifold mapping module and the forward inference tasks of the geometric graph network module; and when the system load is below a set threshold, generating a decision that triggers an incremental learning task of the geometric graph network module.
  9. A financial event real-time processing and resource scheduling method oriented to heterogeneous data streams, characterized by comprising the following steps: S1, receiving and parsing a multi-source heterogeneous real-time financial data stream and extracting entity data therefrom; S2, mapping the entity data to coordinate points on a statistical manifold that takes the Fisher information matrix as its Riemannian metric; S3, monitoring the value of the data stream processed in step S1 and the system computing load, and dynamically allocating computing resources based on preset rules and an online learning model; S4, using the allocated resources, performing geometry-aware dynamic node embedding learning on the coordinate points on the statistical manifold based on a pre-constructed financial association graph; and S5, identifying and outputting financial event signals according to the dynamic node embeddings learned in step S4.
  10. The method of claim 9, wherein the node embedding learning on the coordinate points on the statistical manifold in step S4 comprises performing geometric message passing on the statistical manifold by: mapping the manifold coordinate points of the neighbor nodes of a target node to the tangent space at the target node through the logarithmic map to obtain the corresponding tangent vectors; aggregating and linearly transforming the tangent vectors in the tangent space; and mapping the transformed result back to the statistical manifold through the exponential map, taking it as the updated coordinate point of the target node.
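Claims 2 and 3 leave both the family-selection rule and the estimator open. One plausible reading is sketched below in Python with NumPy. The selection heuristics (rows lying on the probability simplex lead to Dirichlet, high excess kurtosis leads to Student's t, otherwise Gaussian), the kurtosis threshold, and the method-of-moments estimators are all assumptions made for illustration, not the procedure specified by the patent; the function `fit_entity_distribution` is hypothetical.

```python
import numpy as np


def fit_entity_distribution(data, kurtosis_threshold: float = 1.0):
    """Pick a distribution family for one entity's data and estimate its parameters.

    Hypothetical heuristics: 2-D rows on the probability simplex -> Dirichlet;
    heavy-tailed 1-D data -> Student's t; otherwise Gaussian.
    """
    x = np.asarray(data, dtype=float)

    if x.ndim == 2 and np.all(x >= 0) and np.allclose(x.sum(axis=1), 1.0):
        # Dirichlet, method of moments: E[p_k] = a_k / a0 and Var[p_1] = m_1 (1 - m_1) / (a0 + 1).
        m = x.mean(axis=0)
        v1 = x[:, 0].var()
        a0 = m[0] * (1.0 - m[0]) / v1 - 1.0
        return "dirichlet", m * a0

    x = x.ravel()
    mu, var = x.mean(), x.var()
    excess_kurtosis = np.mean((x - mu) ** 4) / var**2 - 3.0

    if excess_kurtosis > kurtosis_threshold:
        # Student's t, crude moment match: excess kurtosis = 6 / (nu - 4) for nu > 4,
        # and Var = scale^2 * nu / (nu - 2).
        nu = 4.0 + 6.0 / excess_kurtosis
        scale = np.sqrt(var * (nu - 2.0) / nu)
        return "student_t", np.array([mu, scale, nu])

    # Gaussian maximum-likelihood estimate: (mean, standard deviation).
    return "gaussian", np.array([mu, np.sqrt(var)])


rng = np.random.default_rng(0)
print(fit_entity_distribution(rng.normal(0.0, 0.02, size=5000))[0])
print(fit_entity_distribution(rng.standard_t(df=3, size=20000))[0])
print(fit_entity_distribution(rng.dirichlet([2.0, 3.0, 5.0], size=20000)))
```

The returned parameter vector is what a manifold coordinate mapping unit would treat as the entity's coordinate point; a production system would more likely use maximum-likelihood fits than these moment matches.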
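The update rule of claim 5, and the procedure of claims 4 and 10, needs a concrete manifold with known logarithmic and exponential maps, which the patent does not spell out. The Python sketch below therefore substitutes a convenient case: categorical distributions, whose Fisher-Rao geometry is isometric, via p ↦ 2√p, to part of a sphere of radius 2, where both maps have closed forms. It uses fixed uniform weights α_ij = 1/|N(i)| (the "fixed normalization coefficient" option) and omits the learnable linear transform; the function names and the toy graph are hypothetical.

```python
import numpy as np

RADIUS = 2.0  # p -> 2 * sqrt(p) maps the simplex (Fisher metric) onto a sphere of radius 2


def to_sphere(p):
    return RADIUS * np.sqrt(np.asarray(p, dtype=float))


def to_simplex(y):
    return (y / RADIUS) ** 2  # entries are >= 0 and sum to 1 for any y on the sphere


def log_map(x, y):
    """Tangent vector at x pointing toward y, with length equal to the geodesic distance."""
    u, v = x / RADIUS, y / RADIUS
    cos_theta = np.clip(u @ v, -1.0, 1.0)
    direction = v - cos_theta * u  # component of v orthogonal to u
    norm = np.linalg.norm(direction)
    if norm < 1e-12:
        return np.zeros_like(x)
    return RADIUS * np.arccos(cos_theta) * direction / norm


def exp_map(x, t):
    """Follow the geodesic from x with initial velocity t for unit time."""
    speed = np.linalg.norm(t)
    if speed < 1e-12:
        return x.copy()
    angle = speed / RADIUS
    return np.cos(angle) * x + RADIUS * np.sin(angle) * t / speed


def message_passing_step(points, neighbors):
    """One layer: x_i <- exp_{x_i}( sum_j alpha_ij * log_{x_i}(x_j) ), alpha_ij = 1/|N(i)|."""
    updated = []
    for i, x_i in enumerate(points):
        nbrs = neighbors[i]
        if not nbrs:
            updated.append(x_i.copy())
            continue
        alpha = 1.0 / len(nbrs)
        tangent = sum(alpha * log_map(x_i, points[j]) for j in nbrs)
        updated.append(exp_map(x_i, tangent))
    return updated


# Toy financial association graph: three entities, each a 3-outcome categorical distribution.
dists = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
graph = {0: [1], 1: [0, 2], 2: [1]}
points = [to_sphere(p) for p in dists]
for y in message_passing_step(points, graph):
    print(np.round(to_simplex(y), 3))
```

Because aggregation happens in the tangent space and the result is pushed back through the exponential map, every updated point is again a valid probability distribution. Applying the same scheme to the Gaussian or Dirichlet families would require replacing `log_map` and `exp_map` with the corresponding maps for those families.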
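Claims 6 to 8 combine a rule engine with a contextual multi-armed bandit but do not name the bandit algorithm, the context features, or the reward weights. The Python sketch below assumes disjoint LinUCB as the bandit, a three-element context, three illustrative allocation profiles as arms, and a claim 7 style reward w_events × (high-value events per second) − w_cost × (resource cost). The thresholds, weights, arm names, class name, and the synthetic environment are hypothetical. The claim 8 rules are checked first, and the bandit decides only when no rule fires.

```python
import numpy as np

ARMS = ["mapping_heavy", "balanced", "gnn_heavy"]  # hypothetical allocation profiles


class HybridScheduler:
    """Rule engine first (claim 8 logic); a disjoint LinUCB bandit decides otherwise."""

    def __init__(self, n_features, high_load=0.8, low_load=0.3, alpha=1.0):
        self.high_load, self.low_load, self.alpha = high_load, low_load, alpha
        # One ridge-regression model (A, b) per arm, as in disjoint LinUCB.
        self.A = [np.eye(n_features) for _ in ARMS]
        self.b = [np.zeros(n_features) for _ in ARMS]

    def decide(self, context, load, high_value):
        # Rule engine: the claim 8 conditions take precedence.
        if high_value and load > self.high_load:
            return "prioritize_mapping_and_inference"
        if load < self.low_load:
            return "trigger_incremental_learning"
        # Bandit: estimated reward plus an upper-confidence exploration bonus.
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ context + self.alpha * np.sqrt(context @ A_inv @ context))
        return ARMS[int(np.argmax(scores))]

    def update(self, arm, context, reward):
        """Online learning step; rule-engine decisions are not bandit arms and are skipped."""
        if arm in ARMS:
            k = ARMS.index(arm)
            self.A[k] += np.outer(context, context)
            self.b[k] += reward * context


def performance_index(high_value_events_per_s, resource_cost, w_events=1.0, w_cost=0.5):
    """Weighted reward in the spirit of claim 7; the weights are assumptions."""
    return w_events * high_value_events_per_s - w_cost * resource_cost


# Synthetic environment with made-up numbers: "gnn_heavy" processes the most events.
rng = np.random.default_rng(0)
events = {"mapping_heavy": 0.4, "balanced": 0.5, "gnn_heavy": 0.7}
sched = HybridScheduler(n_features=3)
pulls = {arm: 0 for arm in ARMS}
for _ in range(1000):
    load = rng.uniform(0.35, 0.75)  # mid-range load, so no rule fires
    context = np.array([1.0, load, rng.uniform()])  # [bias, load, queue fill ratio]
    arm = sched.decide(context, load, high_value=False)
    pulls[arm] += 1
    r = performance_index(events[arm] + rng.normal(0.0, 0.05), resource_cost=0.2)
    sched.update(arm, context, r)

print(pulls)
print(sched.decide(np.array([1.0, 0.9, 0.5]), load=0.9, high_value=True))
print(sched.decide(np.array([1.0, 0.1, 0.5]), load=0.1, high_value=False))
```

In this toy run the bandit should concentrate most of its pulls on `gnn_heavy`, while the two rule branches return fixed decisions regardless of what the bandit has learned. LinUCB is used only because it is a standard, compact contextual bandit; the claims would equally admit other variants such as Thompson sampling.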

Description

Heterogeneous data stream oriented financial event processing and resource scheduling system and method

Technical Field

The invention relates to the technical field of data processing, and in particular to a heterogeneous data stream oriented financial event processing and resource scheduling system and method.

Background

As financial markets become increasingly digitized, real-time event analysis and decision making based on multi-source heterogeneous data, such as market quotations, news, announcements and social media text, has become central to quantitative investment and risk management. Technology is evolving from processing single, structured data sources toward complex event-driven analysis that fuses multi-modal, unstructured data. Researchers have widely applied natural language processing, time series analysis, graph neural networks and other techniques in an attempt to automatically identify events in massive data streams and mine their associated effects in order to capture alpha signals in the market.

However, the prior art still faces significant bottlenecks in achieving these goals. First, at the data representation level, mainstream methods model entities in Euclidean space, ignoring the inherent uncertainty and complex statistical dependence of financial data; this biases the measurement of event influence and makes it difficult to describe nonlinear propagation patterns accurately. Second, at the graph learning level, traditional graph neural networks aggregate node features directly in Euclidean space; when applied to data with a specific statistical geometry, this violates the data's inherent manifold constraints and causes information distortion.
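The Euclidean-space problem described above can be made concrete with univariate Gaussians. Under the Fisher metric ds² = (dμ² + 2dσ²)/σ², the geodesic (Fisher-Rao) distance has the known closed form d = √2 · arccosh(1 + ((μ₁ − μ₂)²/2 + (σ₁ − σ₂)²)/(2σ₁σ₂)). The Python sketch below, an illustration rather than anything specified in the patent, contrasts it with the naive Euclidean distance between (μ, σ) parameter vectors; both helper names are hypothetical.

```python
import math


def euclidean_param_distance(mu1, sigma1, mu2, sigma2):
    """Naive distance that treats (mu, sigma) as a point in flat Euclidean space."""
    return math.hypot(mu1 - mu2, sigma1 - sigma2)


def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2) under the Fisher metric.

    Substituting mu = sqrt(2) * u turns the metric into 2 * (du^2 + dsigma^2) / sigma^2,
    a scaled Poincare half-plane, which yields this closed form.
    """
    arg = 1.0 + ((mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2) / (2.0 * sigma1 * sigma2)
    return math.sqrt(2.0) * math.acosh(arg)


# The same unit shift in the mean, at low and at high volatility.
calm = (0.0, 0.1, 1.0, 0.1)
noisy = (0.0, 10.0, 1.0, 10.0)
print(euclidean_param_distance(*calm), euclidean_param_distance(*noisy))
print(fisher_rao_distance(*calm), fisher_rao_distance(*noisy))
```

Both pairs are exactly one unit apart in flat parameter space, but the Fisher-Rao distance is roughly 5.6 for the low-volatility pair versus about 0.1 for the high-volatility pair: a one-unit move in the mean is far more significant when uncertainty is small, which a Euclidean representation cannot express.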
Finally, in terms of system efficiency, complex models are computationally expensive, while existing resource scheduling strategies are mostly static or based on simple rules and cannot adapt to the real-time value density of the data stream or to the system state. As a result, processing latency is high when high-value events occur, computing resources sit idle when the system is lightly loaded, and overall resource utilization and system stability are insufficient.

Disclosure of Invention

To solve the problems in the prior art, the invention provides a real-time financial event processing and resource scheduling scheme that enables in-depth analysis of financial events while ensuring processing efficiency and operational stability.

According to a first aspect of the present invention, a heterogeneous data stream oriented financial event real-time processing and resource scheduling system is provided, comprising: a data access module, configured to receive and parse multi-source heterogeneous real-time financial data streams and extract entity data therefrom; a manifold mapping module, connected to the data access module and configured to map the entity data to coordinate points on a statistical manifold, wherein the statistical manifold is a parameter space that takes the Fisher information matrix as its Riemannian metric; a geometric graph network module, connected to the manifold mapping module and configured to perform geometry-aware dynamic node embedding learning on the coordinate points on the statistical manifold based on a financial association graph; a hybrid intelligent scheduler, connected to each of the data access module, the manifold mapping module and the geometric graph network module, and configured to monitor the value of the data stream processed by the data access module and the system computing load, and to dynamically allocate computing resources to the manifold mapping module and the geometric graph network module based on preset rules and an online learning model; and an event judging module, connected to the geometric graph network module and configured to identify and output financial event signals according to the output dynamic node embeddings.

According to some embodiments, in the system of the first aspect of the present invention, the manifold mapping module includes: a probability distribution estimation unit, configured to estimate the corresponding probability distribution type and parameters according to the characteristics of the entity data; and a manifold coordinate mapping unit, configured to map, according to the probability distribution type and parameters, the entity to the corresponding point on the statistical manifold characterized by those distribution parameters.

According to some embodiments, in the system of the first aspect of the present invention, the probability distribution estimation unit is specifically configured to: adaptively select one of a Gaussian distribution family, a Dirichlet distributio