CN-121980371-A - Transaction data automatic labeling method and system based on multi-modal learning


Abstract

The invention discloses a transaction data automatic labeling method and system based on multi-modal learning, and relates to the technical field of data processing. The method comprises: obtaining target transaction data comprising transaction text data and transaction behavior time sequence data; performing vectorization mapping on each to obtain a first modal feature vector and a second modal feature vector; inputting the two feature vectors into a multi-modal labeling model and generating an initial prediction label through feature fusion; calculating contribution degree weights of the modal feature vectors to determine a dominant feature vector; generating a feature suppression mask and performing shielding processing on the dominant feature vector to obtain a disturbance feature vector; inputting the disturbance feature vector and the other unshielded modal feature vector into the model again for secondary reasoning to obtain a verification prediction label; and calculating a semantic offset between the initial prediction label and the verification prediction label, calibrating the confidence accordingly, and outputting a final labeling result. The method and system improve the robustness and accuracy of transaction data labeling.

Inventors

ZHANG FANG
ZHU JIANQUN
ZHU HONGQIANG

Assignees

Jinan Vocational College (济南职业学院)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-23

Claims (10)

1. A transaction data automatic labeling method based on multi-modal learning, the method comprising the steps of: acquiring target transaction data to be labeled, wherein the target transaction data comprises at least transaction text data and transaction behavior time sequence data; performing vectorization mapping on the transaction text data and the transaction behavior time sequence data respectively to obtain a first modal feature vector and a second modal feature vector; inputting the first modal feature vector and the second modal feature vector into a pre-trained multi-modal labeling model, generating a comprehensive feature vector through a feature fusion layer, and outputting an initial prediction label based on the comprehensive feature vector; characterized in that the method further comprises the steps of: calculating contribution degree weights of the first modal feature vector and the second modal feature vector in the process of generating the initial prediction label, and determining the modal feature vector with the highest contribution degree weight as a dominant feature vector; generating a feature suppression mask for the dominant feature vector, and performing shielding processing on the dominant feature vector by using the feature suppression mask to obtain a disturbance feature vector; inputting the disturbance feature vector and the other unshielded modal feature vector into the multi-modal labeling model again, and performing secondary reasoning to obtain a verification prediction label; and calculating a semantic offset between the initial prediction label and the verification prediction label, calibrating the confidence of the initial prediction label according to the semantic offset, and outputting a final labeling result of the target transaction data based on the calibrated confidence.
2. The method according to claim 1, wherein the step of performing vectorization mapping on the transaction text data and the transaction behavior time sequence data respectively comprises: processing the transaction text data with a preset semantic analyzer to extract a keyword set, mapping the keyword set into a dense semantic vector through a word embedding layer of a pre-trained language model, and simultaneously calculating a high-dimensional sparse feature vector of the keyword set; constructing a dual-channel gating projection network, calculating a fusion gating coefficient, and performing weighted fusion of the dense semantic vector and the high-dimensional sparse feature vector by using the gating coefficient to obtain the first modal feature vector; and constructing a time sliding window over the transaction behavior time sequence data, extracting transaction frequency, transaction amount fluctuation and transaction time interval features within the sliding window to construct a time sequence behavior sequence, inputting the time sequence behavior sequence into a deep recurrent neural network for encoding, and extracting a hidden state vector as the second modal feature vector.
3. The method according to claim 1, wherein the step of generating the comprehensive feature vector through the feature fusion layer specifically comprises: constructing a cross feature fusion layer containing a multi-head attention mechanism, mapping the first modal feature vector into a query vector, and mapping the second modal feature vector into a key vector and a value vector; calculating the dot product of the query vector and the key vector, scaling the dot product result by a scaling factor, and generating an attention score matrix through a Softmax function; performing weighted aggregation of the value vectors based on the attention score matrix to generate a cross-modal context vector, wherein the cross-modal context vector characterizes transaction behavior features under the semantic guidance of the transaction text data; and concatenating or adding the cross-modal context vector and the first modal feature vector by using a residual connection structure to obtain the comprehensive feature vector.
4. The method according to claim 1, wherein the step of calculating the contribution degree weights of the first modal feature vector and the second modal feature vector in the process of generating the initial prediction label specifically comprises: acquiring attention distribution coefficients of the feature fusion layer in the multi-modal labeling model; obtaining the attention distribution coefficients corresponding to the first modal feature vector and to the second modal feature vector respectively, and calculating the mean or the maximum of each modality's coefficients to obtain a unit density attention value for each of the two modalities; and normalizing the unit density attention values, and determining each normalized value as the contribution degree weight of the corresponding modal feature vector.
5. The method according to claim 1, further comprising: pre-constructing a transaction knowledge graph, wherein the transaction knowledge graph comprises historical transaction nodes and associated standard class nodes; and, after the first modal feature vector is obtained, calculating the vector similarity between the first modal feature vector and each node in the transaction knowledge graph, and fusing the embedding vectors of the Top-K nodes with the highest similarity into the first modal feature vector.
6. The method according to claim 1, wherein the step of generating a feature suppression mask for the dominant feature vector specifically comprises: identifying key feature dimensions in the dominant feature vector whose activation values exceed a preset activation threshold; and constructing a binary vector with the same dimension as the dominant feature vector, setting the positions corresponding to the key feature dimensions to zero and the remaining positions to non-zero values, to obtain the feature suppression mask.
7. The method according to claim 1, wherein the step of calculating a semantic offset between the initial prediction label and the verification prediction label specifically comprises: acquiring a first probability distribution vector corresponding to the initial prediction label and a second probability distribution vector corresponding to the verification prediction label; and determining the relative entropy between the first probability distribution vector and the second probability distribution vector by using the KL divergence, and taking the relative entropy as the semantic offset.
8. The method according to claim 1, wherein the step of calibrating the confidence of the initial prediction label according to the semantic offset and outputting a final labeling result of the target transaction data based on the calibrated confidence comprises: if the semantic offset is smaller than a preset robustness threshold, judging that the dominant feature vector is reliable, and determining the initial prediction label as the final labeling result; and if the semantic offset is greater than or equal to the preset robustness threshold, judging that the dominant feature vector carries an overfitting risk, reducing the confidence of the initial prediction label, and marking the target transaction data as pending manual review.
9. The method according to claim 8, wherein the method further comprises: collecting the target transaction data whose semantic offset is greater than or equal to the preset robustness threshold, and constructing an adversarial hard sample set; and performing fine-tuning training on the multi-modal labeling model by using the adversarial hard sample set so as to update network parameters of the multi-modal labeling model.
10. A transaction data automatic labeling system based on multi-modal learning, the system comprising: a data acquisition module, used for acquiring target transaction data to be labeled, wherein the target transaction data comprises at least transaction text data and transaction behavior time sequence data; a feature mapping module, used for performing vectorization mapping on the transaction text data and the transaction behavior time sequence data respectively, and specifically configured to generate and fuse a dense semantic vector and a sparse feature vector of the text by using a dual-channel network to obtain a first modal feature vector, and to encode a behavior sequence by using a time sequence encoding network to obtain a second modal feature vector; a multi-modal labeling model, configured with a cross attention feature fusion layer, used for taking the first modal feature vector as a query vector and the second modal feature vector as key-value pairs, performing attention weighting to generate a comprehensive feature vector, and outputting an initial prediction label based on the comprehensive feature vector; a sensitivity analysis module, used for calculating contribution degree weights of the first modal feature vector and the second modal feature vector in the process of generating the initial prediction label, and determining the modal feature vector with the highest contribution degree weight as a dominant feature vector; a disturbance verification module, used for generating a feature suppression mask for the dominant feature vector, performing shielding processing on the dominant feature vector by using the feature suppression mask to obtain a disturbance feature vector, inputting the disturbance feature vector and the other unshielded modal feature vector into the reasoning calculation module again, and performing secondary reasoning to obtain a verification prediction label; and a consistency decision module, used for calculating a semantic offset between the initial prediction label and the verification prediction label, calibrating the confidence of the initial prediction label according to the semantic offset, and outputting a final labeling result of the target transaction data based on the calibrated confidence.
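
The cross-modal fusion recited in claim 3 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: it uses a single attention head rather than the multi-head mechanism of the claim, it takes the "add" option of the residual connection, and the function names, projection matrices (`w_q`, `w_k`, `w_v`), shapes, and random inputs are all assumptions made only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fuse(text_vecs, behavior_vecs, w_q, w_k, w_v):
    """Single-head cross attention: text queries attend over behavior steps.

    text_vecs:     (n_text, d_text)  first modal feature vectors
    behavior_vecs: (n_steps, d_beh)  second modal feature vectors
    Returns the fused features and the attention score matrix.
    """
    q = text_vecs @ w_q          # queries from the text modality
    k = behavior_vecs @ w_k      # keys from the behavior modality
    v = behavior_vecs @ w_v      # values from the behavior modality
    # Scaled dot product followed by Softmax gives the attention scores.
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    context = scores @ v         # cross-modal context vectors
    fused = text_vecs + context  # residual connection (additive variant)
    return fused, scores

# Hypothetical shapes and random data, for illustration only.
rng = np.random.default_rng(0)
d_text, d_beh, d_attn = 8, 6, 8
text_vecs = rng.normal(size=(3, d_text))
behavior_vecs = rng.normal(size=(5, d_beh))
w_q = rng.normal(size=(d_text, d_attn))
w_k = rng.normal(size=(d_beh, d_attn))
w_v = rng.normal(size=(d_beh, d_text))  # project values back to d_text for the residual add

fused, scores = cross_modal_fuse(text_vecs, behavior_vecs, w_q, w_k, w_v)
print(fused.shape, scores.shape)
```

Projecting the value vectors back to the text dimension is one way to make the additive residual well defined; the concatenation option of claim 3 would avoid that constraint at the cost of a wider comprehensive feature vector.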
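
Claims 4 and 6 together describe how the dominant feature vector is chosen and shielded. The sketch below is one possible reading, under stated assumptions: the attention distribution coefficients are already available per modality, "activation value" is taken as the raw (signed) feature value, the non-zero mask entries are set to 1, and shielding is an element-wise product. The function names and sample numbers are hypothetical.

```python
import numpy as np

def contribution_weights(attn_by_modality, reduce="mean"):
    """Aggregate each modality's attention coefficients, then normalize to sum to 1."""
    agg = np.mean if reduce == "mean" else np.max
    raw = {name: float(agg(coeffs)) for name, coeffs in attn_by_modality.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def suppression_mask(dominant_vec, activation_threshold):
    """Binary mask: 0 where the activation exceeds the threshold, 1 elsewhere."""
    return np.where(dominant_vec > activation_threshold, 0.0, 1.0)

# Hypothetical attention coefficients and feature vectors.
attn = {"text": np.array([0.6, 0.8]), "behavior": np.array([0.1, 0.3])}
features = {
    "text": np.array([0.2, 1.5, -0.3, 2.1]),
    "behavior": np.array([0.4, 0.1, 0.9]),
}

weights = contribution_weights(attn)
dominant = max(weights, key=weights.get)   # modality with the highest weight

mask = suppression_mask(features[dominant], activation_threshold=1.0)
disturbed = features[dominant] * mask      # shielded (disturbance) feature vector

print(dominant, mask, disturbed)
```

The disturbance feature vector would then be passed, together with the untouched vector of the other modality, through the same labeling model for the secondary reasoning step.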
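
The consistency check of claims 7 and 8 can be sketched as follows. The claims specify the KL divergence and a robustness threshold but not the direction of the divergence or how much the confidence is reduced, so the direction D(initial || verification), the `penalty` factor, the threshold value, and the returned dictionary layout are assumptions chosen for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def calibrate(p_initial, p_verify, robustness_threshold, penalty=0.5):
    """Compare the two predictions and calibrate the initial confidence.

    penalty is a hypothetical multiplicative reduction; the claims only
    state that the confidence is reduced.
    """
    offset = kl_divergence(p_initial, p_verify)   # semantic offset
    label = int(np.argmax(p_initial))
    confidence = float(np.max(p_initial))
    if offset < robustness_threshold:
        # Prediction survived the masking: accept the initial label.
        return {"label": label, "confidence": confidence,
                "offset": offset, "needs_review": False}
    # Prediction depended heavily on the masked features: lower the
    # confidence and route the record to manual review.
    return {"label": label, "confidence": confidence * penalty,
            "offset": offset, "needs_review": True}

# Hypothetical probability vectors over three label classes.
p_initial = [0.90, 0.05, 0.05]
stable = calibrate(p_initial, [0.88, 0.07, 0.05], robustness_threshold=0.1)
shaky = calibrate(p_initial, [0.20, 0.70, 0.10], robustness_threshold=0.1)
print(stable["needs_review"], shaky["needs_review"])
```

Records for which `needs_review` is true correspond to the samples that claim 9 collects for fine-tuning.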

Description

Transaction data automatic labeling method and system based on multi-modal learning

Technical Field

The invention relates to the technical field of data processing, and in particular to a transaction data automatic labeling method and system based on multi-modal learning.

Background

With the rapid development of financial technology and the growing variety of electronic payment scenarios, commercial banks and third-party payment institutions must process massive volumes of transaction data every day. Accurate business attribute classification and risk label annotation of this data are the foundation for building panoramic customer views, implementing anti-money-laundering monitoring, and carrying out intelligent marketing.

In the prior art, transaction data labeling relies mainly on manual rules or single-dimension machine learning models. One common approach classifies transactions by keyword matching or natural language processing over unstructured text such as transaction summaries and remarks. This works for standardized business descriptions, but it is highly susceptible to non-standard text entry, numerous variants of industry abbreviations, and homophones, and it struggles to identify disguised behavior in which the text content is inconsistent with the actual use of funds. Another approach focuses on numerical time sequence statistics such as transaction amount, frequency, and time interval, and infers the nature of a transaction from behavior rules or statistical models. Although this approach can capture patterns of fund circulation, it lacks semantic information and therefore often cannot distinguish transaction types that have similar behavior patterns but entirely different business substance. For example, normal periodic payments and split-transaction money laundering may be highly similar in their statistical characteristics, resulting in a high false positive rate.

In addition, in an adversarial financial risk-control environment, transaction backgrounds are often deliberately forged to evade supervision, and traditional single-modal analysis methods cannot exploit the complementarity of semantic understanding and behavior pattern recognition.

More importantly, existing deep learning labeling models usually use a single forward inference pass and lack a mechanism for verifying their own decision logic. During training, models tend to develop path dependence on certain strong features, for example ignoring anomalies in the behavior pattern once a particular high-frequency word is detected, or attending only to large fund flows while ignoring whether the text context is plausible. Such a decision mode, lacking interpretability and robustness verification, readily outputs erroneous results with high confidence when faced with hard samples that have adversarial characteristics, and cannot meet the accuracy and security requirements that core financial business places on data processing.

Disclosure of Invention

The invention aims to provide a transaction data automatic labeling method and system based on multi-modal learning, so as to solve the problems identified in the background above.
In a first aspect, the present invention provides a transaction data automatic labeling method based on multi-modal learning, the method comprising the steps of: acquiring target transaction data to be labeled, wherein the target transaction data comprises at least transaction text data and transaction behavior time sequence data; performing vectorization mapping on the transaction text data and the transaction behavior time sequence data respectively to obtain a first modal feature vector and a second modal feature vector; inputting the first modal feature vector and the second modal feature vector into a pre-trained multi-modal labeling model, generating a comprehensive feature vector through a feature fusion layer, and outputting an initial prediction label based on the comprehensive feature vector; characterized in that the method further comprises the steps of: calculating contribution degree weights of the first modal feature vector and the second modal feature vector in the process of generating the initial prediction label, and determining the modal feature vector with the highest contribution degree weight as a dominant feature vector; generating a feature suppression mask for the dominant feature vector, and performing shielding processing on the dominant feature vector by using the feature suppression mask to obtain a disturbance feature vector; inputting the disturbance feature vector and the other unshielded modal feature vector into the multi-modal labeling model again, and performing secondary reasoning to obtain a verification prediction label; calculating the semantic offset between the initial predicti