CN-122022790-A - Dynamic payment scheduling method and system based on reinforcement learning and data mining technology

Abstract

The invention discloses a dynamic payment scheduling method and system based on reinforcement learning and data mining technology. The method comprises the following steps: first, constructing a structured payment request set; second, executing structural intervention operations to generate a composite payment request set; third, constructing a payment request evolution graph; fourth, constructing an improved PSR model that introduces a topology compression mapping mechanism, and constructing a state primitive set; fifth, constructing state primitive combinations to generate a prediction vector set; sixth, constructing a scheduling intention forest; seventh, executing strategy backtracking and path pruning operations in the scheduling intention forest, screening an optimal intention branch, and determining a target scheduling action set; eighth, executing the corresponding payment operations according to the target scheduling action set, recording the payment execution state and execution time, and outputting a dynamic payment scheduling result. By combining the improved PSR model with the scheduling intention forest, the invention realizes dynamic payment scheduling.

Inventors

WANG YAN
Ji Xianglian
Feng Tianzuo
XUAN LINGMIN
MA ZHICHENG
PAN TONG
LV WENLING
FENG SHUANG

Assignees

Shuifa Group Co., Ltd. (水发集团有限公司)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-26

Claims (10)

1. A dynamic payment scheduling method based on reinforcement learning and data mining technology, characterized by comprising the following steps: step one, collecting original payment request data and constructing a structured payment request set; step two, executing structural intervention operations on the structured payment request set, wherein the structural intervention operations comprise splitting, freezing, alternative path construction and aggregation of payment requests, to generate a composite payment request set; step three, taking the payment requests in the composite payment request set as nodes and establishing directed edge connections to construct a payment request evolution graph; step four, constructing an improved PSR model, wherein the improved PSR model introduces a topology compression mapping mechanism that uses the Weisfeiler-Lehman graph isomorphism compression method to compress and map the payment request evolution graph into a group of structure pattern IDs, and constructing a state primitive set; step five, constructing state primitive combinations based on the state primitive set, and mapping the state primitive combinations to generate a prediction vector set; step six, constructing a scheduling intention forest based on the prediction vector set; step seven, executing strategy backtracking and path pruning operations in the scheduling intention forest, screening an optimal intention branch, and determining a target scheduling action set; and step eight, executing the corresponding payment operations according to the target scheduling action set, recording the payment execution state and execution time, and outputting a dynamic payment scheduling result.
2. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step one specifically comprises: collecting original payment request data from a financial management system, a contract payment system and a third-party payment interface, wherein the original payment request data comprises the following fields: payment request number, payment amount, payment account, collection account, request submission time, payment deadline and request business type; and executing field alignment processing on the original payment request data, unifying the field naming, format types and time representations of the source data, and constructing a structured payment request set with consistent fields.
3. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step two specifically comprises: for each payment request in the structured payment request set, executing a structural intervention operation based on the payment amount, payment account, collection account, request submission time, payment deadline and request business type fields: when the payment amount of a single payment request exceeds the amount that the same payment account can pay in a single execution within the current scheduling cycle, splitting the payment request into a plurality of sub-payment requests in time order, wherein each sub-payment request keeps the original payment request number and is given an additional sub-sequence number identifier; when the payment deadline of a payment request is later than the end time of the current scheduling cycle, executing freezing processing on the payment request, marking it as frozen, and temporarily excluding it from payment execution in the current scheduling cycle; when multiple payment channels exist between the payment account and the collection account of a payment request, executing alternative path construction processing on the payment request, generating for the same payment request at least one alternative payment request with a different payment account or a different payment channel, and recording an alternative relation identifier; when a plurality of payment requests have the same collection account and their payment deadlines fall within the same scheduling cycle, executing aggregation processing, merging the plurality of payment requests into one aggregate payment request, and recording the corresponding set of original payment request numbers in the aggregate payment request; and collecting the payment requests obtained through splitting processing, freezing processing, alternative path construction processing and aggregation processing to form the composite payment request set.
4. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step three specifically comprises: taking each payment request in the composite payment request set as a graph node, and constructing the node set of the payment request evolution graph; establishing directed edge connections between nodes of the node set based on the structural intervention relationships between the payment requests in the composite payment request set, the directed edge connections including split evolution edges, substitution evolution edges and aggregation evolution edges; when a payment request is generated by splitting another payment request, establishing between the corresponding nodes a split evolution edge pointing from the original payment request node to the sub-payment request node, and attaching the sub-sequence number identifier to the split evolution edge; when a payment request is generated from another payment request through alternative path construction processing, establishing between the corresponding nodes a substitution evolution edge pointing from the original payment request node to the alternative payment request node, and recording the alternative relation identifier on the substitution evolution edge; when a payment request is an aggregate payment request generated by aggregation processing of a plurality of payment requests, establishing, from each of the corresponding original payment request nodes, an aggregation evolution edge pointing to the aggregate payment request node; when a payment request is marked as frozen through freezing processing, retaining only the corresponding frozen payment request node in the payment request evolution graph, recording a frozen state identifier on that node, and establishing no directed edge connection between the frozen payment request node and other nodes; for payment requests that have undergone none of splitting processing, freezing processing, alternative path construction processing or aggregation processing, retaining only the corresponding isolated nodes and establishing no directed edge connection; and forming the payment request evolution graph from the node set and the corresponding directed edge connections.
5. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step four specifically comprises: constructing an improved PSR model, wherein the improved PSR model introduces a topology compression mapping mechanism to perform state construction on the payment request evolution graph, and the topology compression mapping mechanism specifically comprises: assigning a node initial structure identifier to each node in the payment request evolution graph, wherein the node initial structure identifier consists of a generation mode identifier and a connection type identifier, the generation mode identifier indicates whether the payment request corresponding to the node was generated by splitting processing, alternative path construction processing, aggregation processing or freezing processing, and the connection type identifier indicates the types of the directed edge connections of the node in the payment request evolution graph; based on the node initial structure identifiers, executing multiple rounds of topology identifier update operations on the payment request evolution graph according to the node identifier iteration rule of the Weisfeiler-Lehman graph isomorphism compression method, wherein each round of the topology identifier update operation comprises: for each node, acquiring the connection type identifiers of all incoming edges pointing to the node and of all outgoing edges pointing from the node to other nodes, to form an incoming edge type sequence and an outgoing edge type sequence respectively; collecting the current-round node structure identifiers of the neighbor nodes connected by the incoming edges and by the outgoing edges respectively, and combining, in order, the incoming edge type sequence, the incoming edge current-round node structure identifier sequence, the outgoing edge type sequence and the outgoing edge current-round node structure identifier sequence to form the neighborhood structure sequence of the node; concatenating the neighborhood structure sequence with the current-round node structure identifier of the node to generate a node combination structure identifier; comparing the node combination structure identifier item by item with the existing node combination structure identifiers, reusing the corresponding existing integer number when an identical node combination structure identifier exists, and otherwise assigning a new consecutive integer number to the node combination structure identifier, the integer number serving as the updated node structure identifier; repeating the topology identifier update operation, and terminating it when the node structure identifiers are unchanged between two adjacent rounds or when the set maximum number of iteration rounds is reached; based on the finally obtained node structure identifiers, grouping the nodes in the payment request evolution graph by structure, merging nodes with the same final node structure identifier into the same structure pattern, and assigning a unique structure pattern ID to each structure pattern; performing topology compression mapping on the payment request evolution graph based on the structure pattern IDs, replacing the nodes in the original payment request evolution graph with the corresponding structure pattern ID nodes, and retaining the incoming-edge-type and outgoing-edge-type connection relations between the structure pattern ID nodes to form a structure pattern graph; and taking the structure pattern IDs and the connection relations in the structure pattern graph as state primitives, constructing the state primitive set, and completing the state space construction of the improved PSR model.
6. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step five specifically comprises: acquiring the state primitive set, wherein the structure pattern IDs in the state primitives are consecutive integer numbers assigned in order of generation during the topology compression mapping; based on the structure pattern graph, selecting from the state primitive set the structure pattern ID nodes that have a direct directed edge connection and combining them into state primitive combinations, wherein each state primitive combination comprises a central structure pattern ID and the adjacent structure pattern IDs directly connected to it through directed edges; numbering the directed edge types in the state primitive combinations by mapping split evolution edges, substitution evolution edges and aggregation evolution edges to preset integer numbers, wherein the split evolution edge is numbered 1, the substitution evolution edge is numbered 2 and the aggregation evolution edge is numbered 3; for each state primitive combination, reading in order the central structure pattern ID, the adjacent structure pattern IDs, the corresponding incoming edge type numbers and the corresponding outgoing edge type numbers, and arranging them in that order into a numerical vector serving as the corresponding prediction vector; and collecting the prediction vectors generated from the different state primitive combinations to form the prediction vector set.
7. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step six specifically comprises: for each prediction vector in the prediction vector set, extracting the central structure pattern ID used when the prediction vector was generated; taking the central structure pattern ID as the root node of a scheduling intention tree, and constructing a corresponding scheduling intention tree for each central structure pattern ID; generating the child nodes of the scheduling intention tree layer by layer according to the order of the adjacent structure pattern IDs recorded in the prediction vector, wherein each child node corresponds to an adjacent structure pattern ID directly related to the central structure pattern ID in the prediction vector; establishing a directed connection relation between the root node and each child node, pointing from the central structure pattern ID to the adjacent structure pattern ID, and recording the corresponding incoming edge type number and outgoing edge type number on the directed connection relation; when the same structure pattern ID appears as an adjacent structure pattern ID in a plurality of prediction vectors, merging the corresponding child nodes under the same parent node while retaining each of the corresponding directed connection relations; continuing to expand the child nodes downward according to the order of the structure pattern IDs in the prediction vector until all structure pattern IDs in the prediction vector have been mapped to nodes, forming a directed tree structure with the central structure pattern ID as the root node and the adjacent structure pattern IDs as branch nodes, thereby obtaining the scheduling intention tree; and forming the scheduling intention forest from the plurality of scheduling intention trees constructed from the different prediction vectors in the prediction vector set.
8. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step seven specifically comprises: taking each leaf node in a scheduling intention tree as a backtracking starting point, backtracking layer by layer to the root node along the directed connection relations from child node to parent node, and recording in order the structure pattern IDs and the corresponding incoming edge type numbers and outgoing edge type numbers passed through during backtracking, until the root node is reached, thereby obtaining an intention path from the root node to the leaf node; executing a path pruning operation, layer by layer in path hierarchy order, on the plurality of intention paths obtained by backtracking within the same scheduling intention tree, wherein the path pruning operation specifically comprises: starting from layer 1 of the intention paths, comparing item by item the structure pattern IDs and edge type numbers of all the intention paths; when two or more intention paths have identical structure pattern IDs and edge type numbers at the current layer, placing these intention paths into the same candidate path group and continuing the comparison within the candidate path group at the next layer; when the intention paths in a candidate path group first differ, retaining only the intention path with the shortest path length and deleting the remaining intention paths; when two or more intention paths share the shortest path length but have different end structure pattern IDs, counting the number of occurrences of each end structure pattern ID in the prediction vector set, retaining only the intention path whose end structure pattern ID occurs most often, and deleting all the remaining intention paths; through the path pruning operation, each scheduling intention tree retains a unique intention path, which serves as the optimal intention branch of that scheduling intention tree; and collecting the optimal intention branches retained in the scheduling intention trees, and mapping each optimal intention branch, in the order of its structure pattern IDs from the root node to the end leaf node, into the corresponding payment request processing actions, thereby forming the target scheduling action set.
9. The dynamic payment scheduling method based on reinforcement learning and data mining technology according to claim 1, wherein step eight specifically comprises: according to the target scheduling action set and the structure pattern ID order corresponding to each scheduling action, executing the corresponding payment operations in sequence on the payment requests in the composite payment request set; when a scheduling action corresponds to splitting processing, initiating a plurality of payment operations in the order of the sub-payment requests; when a scheduling action corresponds to alternative path construction processing, initiating the payment operation through the payment account or payment channel of the selected alternative payment request; when a scheduling action corresponds to aggregation processing, initiating a single merged payment operation for the aggregate payment request; and recording the payment execution state and execution time of each payment operation, and generating the dynamic payment scheduling result.
10. A dynamic payment scheduling system based on reinforcement learning and data mining technology, for performing the dynamic payment scheduling method based on reinforcement learning and data mining technology as claimed in any one of claims 1 to 9, comprising the following modules: an original payment request acquisition module, used for acquiring original payment request data, performing field alignment processing on the original payment request data and constructing a structured payment request set; a structural intervention processing module, used for executing splitting processing, freezing processing, alternative path construction processing and aggregation processing on the payment requests in the structured payment request set to generate a composite payment request set; a payment request evolution graph construction module, used for constructing a payment request evolution graph by taking the payment requests in the composite payment request set as nodes and establishing directed edge connections according to the structural intervention relationships; an improved PSR state construction module, used for applying a topology compression mapping mechanism to the payment request evolution graph, generating structure pattern IDs according to the Weisfeiler-Lehman graph isomorphism compression method, and constructing a state primitive set based on the structure pattern IDs; a prediction vector generation module, used for constructing state primitive combinations based on the state primitive set and mapping the state primitive combinations into a prediction vector set; a scheduling intention forest construction module, used for constructing, based on the prediction vector set, a scheduling intention forest composed of a plurality of scheduling intention trees; a scheduling decision module, used for executing strategy backtracking and path pruning operations in the scheduling intention forest, screening the optimal intention branches and determining a target scheduling action set; and a payment execution and result output module, used for executing the corresponding payment operations according to the target scheduling action set, recording the payment execution state and execution time, and outputting a dynamic payment scheduling result.
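
The topology compression mapping recited in claim 5 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names (`wl_structure_patterns`, `structure_pattern_graph`), the string generation mode labels and the sample request names are all hypothetical. As a simplification, each neighbor is encoded as an (edge type, identifier) pair rather than as four separate sequences, and "unchanged between two adjacent rounds" is tested by comparing how nodes are grouped, since the integer numbers are reassigned every round. Edge type numbers follow claim 6 (split = 1, substitution = 2, aggregation = 3).

```python
from collections import defaultdict


def _partition(labels):
    """Group nodes by identifier so two rounds can be compared."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return {frozenset(group) for group in groups.values()}


def wl_structure_patterns(generation_modes, edges, max_rounds=10):
    """Return {node: structure pattern ID} for a typed directed graph.

    generation_modes: {node: str}, e.g. "orig", "split", "agg", "frozen"
    edges: iterable of (source, target, edge_type); every endpoint must
           also appear in generation_modes (assumption of this sketch).
    """
    in_edges = defaultdict(list)   # node -> [(edge_type, source)]
    out_edges = defaultdict(list)  # node -> [(edge_type, target)]
    for src, dst, etype in edges:
        out_edges[src].append((etype, dst))
        in_edges[dst].append((etype, src))

    # Initial identifier: generation mode plus the node's edge types.
    labels = {
        node: (mode,
               tuple(sorted(t for t, _ in in_edges[node])),
               tuple(sorted(t for t, _ in out_edges[node])))
        for node, mode in generation_modes.items()
    }
    for _ in range(max_rounds):
        table = {}       # combined identifier -> consecutive integer
        new_labels = {}
        for node in sorted(labels):
            # Sorting makes the sequence independent of edge insertion order.
            inc = tuple(sorted((t, labels[n]) for t, n in in_edges[node]))
            out = tuple(sorted((t, labels[n]) for t, n in out_edges[node]))
            combined = (labels[node], inc, out)
            # Reuse the number of an identical identifier, else take the next.
            new_labels[node] = table.setdefault(combined, len(table))
        stable = _partition(new_labels) == _partition(labels)
        labels = new_labels
        if stable:
            break
    return labels


def structure_pattern_graph(pattern_ids, edges):
    """Replace nodes by their pattern IDs, keeping the edge types."""
    return {(pattern_ids[s], t, pattern_ids[d]) for s, d, t in edges}


if __name__ == "__main__":
    modes = {"P1": "orig", "P1-1": "split", "P1-2": "split",
             "P6": "orig", "P6-1": "split", "P6-2": "split",
             "P3": "orig", "P4": "orig", "A1": "agg", "P5": "frozen"}
    edges = [("P1", "P1-1", 1), ("P1", "P1-2", 1),
             ("P6", "P6-1", 1), ("P6", "P6-2", 1),
             ("P3", "A1", 3), ("P4", "A1", 3)]
    ids = wl_structure_patterns(modes, edges)
    print(ids)
    print(sorted(structure_pattern_graph(ids, edges)))
```

In this sample, the two split requests P1 and P6 fall into the same structure pattern, as do their four sub-payment requests, so ten payment request nodes compress to five structure patterns and six edges compress to two pattern-level connections. This is the state space reduction that the topology compression mapping is meant to provide.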

Description

Dynamic payment scheduling method and system based on reinforcement learning and data mining technology

Technical Field

The invention relates to the technical field of payment management and intelligent scheduling, and in particular to a dynamic payment scheduling method and system based on reinforcement learning and data mining technology.

Background

With the continuing advance of enterprise digital transformation and centralized fund management, payment business is increasingly characterized by large request volumes, multiple source systems, complex execution constraints and strict scheduling timeliness requirements. Automated payment scheduling based on information systems is gradually replacing manual review and manual scheduling, and has become an important development direction in the field of financial management and payment systems.

Existing payment scheduling schemes typically rely on fixed rules, static priorities or simple time ordering to process payment requests, and some technologies introduce data analysis or reinforcement learning models to assist decision-making. In practical applications, however, the following problems are common. First, payment requests originate from multiple heterogeneous systems whose field structures, data formats and business semantics lack a unified standard; the prior art mostly handles this through simple mapping or manual configuration, which makes it difficult to form a stable and consistent structured representation. Second, conditions that commonly arise during payment execution, such as over-limit amounts, unavailable payment channels, account constraint conflicts and the concentrated occurrence of multiple payments in the same direction, cause payment requests to be dynamically split, frozen, substituted or merged; traditional scheduling methods lack the ability to model the evolution relations of payment requests and can hardly describe the structural dependencies among requests systematically. Third, most existing schemes that introduce reinforcement learning still take the historical execution sequence as the state input and ignore the topological structure among payment requests; when the payment scale grows or the request structure changes frequently, the state space expands rapidly, learning efficiency drops and decision results become unstable, so that the dynamic scheduling requirements of complex payment scenarios are difficult to meet.

Therefore, how to provide a dynamic payment scheduling method and system based on reinforcement learning and data mining technology is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a dynamic payment scheduling method and system based on reinforcement learning and data mining technology. By introducing a topology compression mapping mechanism, combining an improved PSR model with a scheduling intention forest, and executing strategy backtracking and path pruning operations in the scheduling intention forest, the invention realizes dynamic payment scheduling for complex payment request structures, controls the scale of the state space, improves the stability and interpretability of scheduling decisions, and markedly improves payment execution efficiency and the overall level of system intelligence.

According to an embodiment of the invention, the dynamic payment scheduling method based on reinforcement learning and data mining technology comprises the following steps: step one, collecting original payment request data and constructing a structured payment request set; step two, executing structural intervention operations on the structured payment request set, wherein the structural intervention operations comprise splitting, freezing, alternative path construction and aggregation of payment requests, to generate a composite payment request set; step three, taking the payment requests in the composite payment request set as nodes and establishing directed edge connections to construct a payment request evolution graph; step four, constructing an improved PSR model, wherein the improved PSR model introduces a topology compression mapping mechanism that uses the Weisfeiler-Lehman graph isomorphism compression method to compress and map the payment request evolution graph into a group of structure pattern IDs, and constructing a state primitive set; step five, constructing state primitive combinations based on the state primitive set, and mapping the state primitive combinations to generate a prediction vector set; step six, constructing a scheduling intention forest based on the prediction vector set; step seven, executing strategy backtracking and path pruning operations in the scheduling intention forest, screening an optimal intention branch,