CN-121998684-A - Intelligent decision data processing method and system


Abstract

The invention relates to an intelligent decision data processing method and system in the technical field of computer information processing and artificial intelligence. The method comprises: constructing a causal graph; training a double machine learning model; acquiring a candidate strategy set; performing counterfactual inference with the double machine learning model for each candidate strategy in the candidate strategy set; constructing a comprehensive risk function and a multi-objective utility function; fusing the comprehensive risk function, as a penalty term, with the multi-objective utility function to construct an objective function; and solving for the strategy in the candidate strategy set that maximizes the objective function and determining that strategy as the globally optimal target strategy. Compared with the prior art, the method can significantly improve prejudgment capability.
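The final selection step of the abstract can be illustrated with a minimal Python sketch. It assumes that the utility and risk values of each candidate strategy have already been computed, and it uses subtraction of a weighted risk as the fusion, which is the form given in claim 6. The names `Candidate`, `objective`, `select_target_strategy`, and `risk_aversion`, as well as the example numbers, are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one candidate strategy's scores."""
    name: str
    utility: float  # value of the multi-objective utility function
    risk: float     # value of the comprehensive risk function


def objective(c: Candidate, risk_aversion: float) -> float:
    """Utility minus the risk penalty term (risk weighted by risk aversion)."""
    return c.utility - risk_aversion * c.risk


def select_target_strategy(candidates, risk_aversion=1.0):
    """Return the candidate that maximizes the objective over the set."""
    if not candidates:
        raise ValueError("candidate strategy set is empty")
    return max(candidates, key=lambda c: objective(c, risk_aversion))


# Illustrative numbers only (assumed, not taken from the patent).
candidates = [
    Candidate("conservative", utility=1.0, risk=0.1),
    Candidate("balanced", utility=1.5, risk=0.4),
    Candidate("aggressive", utility=2.0, risk=1.5),
]
best = select_target_strategy(candidates, risk_aversion=1.0)
print(best.name)
```

In this sketch a larger `risk_aversion` shifts the choice toward lower-risk candidates, while a value of zero reduces the selection to plain utility maximization.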

Inventors

DING ZIJIAN
DING JUNWEI
CHEN DEPIN

Assignees

钛动科技股份有限公司

Dates

Publication Date: 20260508
Application Date: 20260207

Claims (10)

1. An intelligent decision data processing method, comprising: constructing a causal graph comprising confounding nodes containing contextual features, intervention nodes containing strategies, and result nodes containing observation indexes, and training a double machine learning model, wherein the double machine learning model comprises a result predictor for fitting the mapping relation between the confounding nodes and the result nodes and an intervention predictor for fitting the propensity relation between the confounding nodes and the intervention nodes; acquiring a candidate strategy set and, for each candidate strategy in the candidate strategy set, performing counterfactual inference using the double machine learning model to obtain a potential result expectation of the candidate strategy; constructing a comprehensive risk function, wherein the comprehensive risk function comprises at least a profit fluctuation index, and the profit fluctuation index is calculated based on the distribution of the difference between the potential result expectation and the reference result output by the result predictor; constructing a multi-objective utility function, wherein the multi-objective utility function is a weighted combination of an instant gain efficiency calculated based on the potential result expectation and a long-term potential value; and fusing the comprehensive risk function, as a penalty term, with the multi-objective utility function to construct an objective function, solving for the strategy in the candidate strategy set that maximizes the objective function, and determining that strategy as the globally optimal target strategy.
2. The method of claim 1, wherein obtaining the potential result expectation of the candidate strategy comprises: outputting, by the result predictor, a reference result based on the current context features; outputting, by the intervention predictor, a historical tendency strategy based on the current context features; calculating a result residual between the actual observed result and the reference result output by the result predictor, and an action residual between the actual intervention action and the historical tendency strategy output by the intervention predictor; and mapping the deviation of the current candidate strategy from the historical tendency strategy into a result gain using a causal effect parameter, and adding the result gain to the reference result, thereby synthesizing the potential result expectation.
3. The method of claim 1, wherein the profit fluctuation index is calculated by: calculating the difference between the potential result expectation and the reference result to obtain a strategy incremental profit; constructing a probability density function of the strategy incremental profit; solving for an upper integration limit such that the definite integral of the probability density function from minus infinity to the upper integration limit equals a preset tail probability threshold; and determining the upper integration limit as the profit fluctuation index.
4. The method of claim 3, wherein the comprehensive risk function further comprises a distribution stability index calculated as the divergence between a predicted system state distribution after the candidate strategy is executed and a historical reference distribution, the predicted system state distribution being the probability distribution of state indexes predicted by the system after the candidate strategy is executed, and the historical reference distribution being the probability distribution of the state indexes in a historical steady state.
5. The method of claim 1, wherein constructing the multi-objective utility function comprises: calculating a difference ratio between the potential result expectation and the execution cost of the candidate strategy as the instant gain efficiency; and taking the potential result expectation as the first term of a value sequence over a future time window and, in combination with a time discount factor, cumulatively summing the predicted values of the future time steps as the long-term potential value.
6. The method of claim 1, wherein constructing the objective function comprises: introducing a preset risk aversion coefficient for adjusting the sensitivity of the decision process to risk; weighting the comprehensive risk function by the risk aversion coefficient to obtain a risk penalty term; and calculating the difference between the multi-objective utility function and the risk penalty term to construct the objective function.
7. The method of claim 1, wherein, after determining the globally optimal target strategy, the method further comprises: determining whether the target strategy triggers a prohibited condition in a preset security constraint rule base; and, if a prohibited condition is triggered and the corresponding rule penalty weight exceeds a preset risk tolerance threshold, determining that the target strategy is non-compliant.
8. The method of claim 1, wherein, after determining the globally optimal target strategy, the method further comprises: detecting whether the profile tag of the current target object contains a vulnerable feature while detecting whether the target strategy contains an exploitative attribute tag; detecting whether the age attribute of the current target object falls in the minor category while detecting whether the content classification of the target strategy involves a restricted-level attribute; and, if either situation is triggered, determining that the ethical verification is not passed.
9. The method of claim 7 or 8, further comprising: in response to the target strategy being non-compliant or failing the ethical verification, searching for a substitute strategy that has the minimum Euclidean distance to the target strategy and satisfies the compliance condition, as the target strategy to be finally executed.
10. An intelligent decision data processing system, comprising a memory for storing a computer program and a processor for executing the computer program to implement the method of any one of claims 1 to 9.
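The residual-based deduction of claim 2 and the tail-quantile index of claim 3 can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the claimed implementation: both predictors are plain linear least-squares fits, the causal effect parameter is estimated by regressing the result residual on the action residual (the usual partialling-out step in double machine learning, which the claims do not spell out), cross-fitting is omitted for brevity, and the distribution of the incremental profit is approximated empirically across contexts instead of through an explicit probability density function. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def estimate_causal_effect(X, t, y):
    """Assumed estimator: regress result residuals on action residuals."""
    result_predictor = fit_linear(X, y)        # context -> reference result
    intervention_predictor = fit_linear(X, t)  # context -> historical tendency
    y_res = y - result_predictor(X)            # result residual
    t_res = t - intervention_predictor(X)      # action residual
    theta = float(t_res @ y_res / (t_res @ t_res))
    return result_predictor, intervention_predictor, theta


def potential_result(X, candidate, result_predictor, intervention_predictor, theta):
    """Reference result plus theta times the deviation from the tendency."""
    deviation = candidate - intervention_predictor(X)
    return result_predictor(X) + theta * deviation


def profit_fluctuation_index(incremental_profit, tail_prob=0.05):
    """Empirical quantile q such that P(incremental profit <= q) is about tail_prob."""
    return float(np.quantile(incremental_profit, tail_prob))


# Synthetic data with a known causal effect of 2.0 (illustration only).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
t = X @ np.array([0.5, -0.3]) + rng.normal(size=n)
y = 2.0 * t + X @ np.array([1.0, 1.0]) + rng.normal(size=n)

m, e, theta = estimate_causal_effect(X, t, y)
expected = potential_result(X, 1.0, m, e, theta)  # candidate strategy value 1.0
incremental = expected - m(X)                     # strategy incremental profit
index = profit_fluctuation_index(incremental, tail_prob=0.05)
print(round(theta, 2), round(index, 2))
```

Under these assumptions the index plays the role of a lower-tail quantile of the incremental profit, so a smaller value signals a heavier downside for the candidate strategy.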

Description

Intelligent decision data processing method and system

Technical Field

The application relates to the technical field of computer information processing and artificial intelligence, and in particular to an intelligent decision data processing method and system.

Background

In current big data processing and intelligent decision systems, a computer system needs to continuously collect massive time-series data and interaction records from distributed network nodes (such as mobile terminals, server logs, and third-party API interfaces) and to predict and intervene in future states of the system through an algorithm model. Existing data processing architectures typically rely on statistical dashboards or rule-based automated scripts that clean and aggregate historical data to trigger corresponding control instructions (e.g., raising an early warning when the monitored traffic value falls below a set value). To optimize the decision parameters of the system, the prior art also often adopts online comparison tests (such as A/B tests), in which the effects of different parameter configurations are verified through small-scale traffic allocation.

However, while existing data processing techniques are mature in structured data storage and offline analysis, significant technical bottlenecks remain in processing highly dynamic, non-stationary real-time data streams and in building complex decision logic:

First, in a real network environment, the collected time-series signals (e.g., click streams and interaction frequencies) tend to be accompanied by a large amount of random noise and aperiodic fluctuations. Existing monitoring systems mostly use a moving average or a static statistical threshold to judge the signal state.
Such methods struggle to separate effective signal components from background noise in the time-frequency domain, so the system cannot respond at the millisecond or second level in the initial stage of an abrupt change in data distribution and can confirm a trend only after enough data has accumulated, which causes serious lag in signal perception.

Second, due to the layering and isolation of the network architecture, the interaction data of users on different applications or terminals is typically stored in separate databases (i.e., data islands). Existing data association techniques rely primarily on matching a single identifier. As a result, the system cannot concatenate scattered, fragmented interactions into a complete sequential logic chain and thus cannot support complex cross-domain intent reasoning.

Third, current mainstream models are mostly fitted on statistical correlations in historical observations. Such models are essentially incapable of handling causal intervention problems. When an intervention strategy that has never occurred in the historical database (for example, a brand-new parameter configuration or resource allocation scheme) needs to be evaluated, the potential influence of the intervention on the system state cannot be simulated in a virtual environment, so the system can only rely on trial and error when facing an unknown scenario and lacks prejudgment capability.

Fourth, in existing automated generation or decision-making systems, verification of compliance, security, or specific rules typically uses a post-processing architecture, i.e., candidate results are generated by the model and then screened through filters.
This mechanism not only wastes computational resources on generating a large number of invalid results but also, because the generation process itself lacks the guidance of constraint terms, can easily output instructions that do not conform to preset rules when the filter fails or times out, creating a risk to system operation.

Disclosure of Invention

To solve at least the technical problem that existing decision methods lack prejudgment capability, an intelligent decision data processing method and system are provided.

According to a first aspect of the invention, an intelligent decision data processing method is provided, comprising: constructing a causal graph comprising confounding nodes containing contextual features, intervention nodes containing strategies, and result nodes containing observation indexes; training a double machine learning model, wherein the double machine learning model comprises a result predictor for fitting the mapping relation between the confounding nodes and the result nodes and an intervention predictor for fitting the propensity relation between the confounding nodes and the intervention nodes; acquiring a candidate strategy set; performing counterfactual inference using the double machine learning model for each candidate strategy in the candidate strategy set; constructing a comprehensive risk function, wherein the comprehensive risk function comprises at