CN-121980459-A - Big data processing method, device and management system

Abstract

The invention provides a big data processing method, a big data processing device, and a management system. Associated data entries with timestamp marks are acquired from a plurality of different data sources. Based on unique identification information of the entity objects, an entity association network is constructed from the associated data entries, generating a heterogeneous association network comprising multiple types of entity nodes and attributed association edges. Multi-path parallel traversal analysis is performed on the heterogeneous association network to extract the abnormal association path features present in the network. The abnormal association path features are input into a preset deep feature fusion model, which performs cross-dimensional feature recombination and nonlinear mapping to generate an abnormality evaluation feature vector. A comprehensive evaluation result comprising an abnormality degree quantized value and an abnormality type identifier is then output according to the result of matching the abnormality evaluation feature vector against a preset reference feature template set. The invention can improve the practicability and pertinence of big data association analysis.

Inventors

Cheng Yingang
ZHU JIXI
ZHU LIN

Assignees

Guizhou Engine Technology Industry Co., Ltd. (贵州引擎科技产业有限公司)

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (10)

1. A big data processing method, the method comprising:
acquiring associated data entries with timestamp marks from a plurality of different data sources, wherein the associated data entries comprise interaction records and attribute description information among different entity objects;
performing entity association network construction on the associated data entries based on unique identification information of the entity objects to generate a heterogeneous association network comprising multiple types of entity nodes and attributed association edges, wherein the attributed association edges comprise interaction time features and interaction strength features;
performing multi-path parallel traversal analysis on the heterogeneous association network and extracting abnormal association path features present in the network, wherein the abnormal association path features comprise path length features, node type sequence features, and edge attribute change features;
inputting the abnormal association path features into a preset deep feature fusion model, performing cross-dimensional feature recombination and nonlinear mapping, and generating an abnormality evaluation feature vector containing multidimensional abnormality evaluation parameters; and
outputting a comprehensive evaluation result containing an abnormality degree quantized value and an abnormality type identifier according to a matching analysis result of the abnormality evaluation feature vector and a preset reference feature template set.
2. The big data processing method according to claim 1, wherein performing entity association network construction on the associated data entries based on the unique identification information of the entity objects to generate the heterogeneous association network comprising multiple types of entity nodes and attributed association edges comprises:
extracting a time sequence of entity interactions from the associated data entries, classifying interaction records with continuous time intervals into the same cluster, and outputting time-cluster associated data entry groups, wherein each group comprises entity interaction records that are close in time and the corresponding attribute description information;
binding the entity attribute description information in each time-cluster associated data entry group with the corresponding time cluster label, so that the attribute information carries a time cluster stamp, and outputting an attribute-time cluster binding set, wherein each set entry comprises an entity attribute and the corresponding time cluster label;
extracting inter-entity interaction records from the time-cluster associated data entry groups, converting them into temporary edges connecting entity nodes, adding the corresponding time cluster label to each temporary edge, and outputting a time-labelled temporary edge set, wherein each temporary edge is associated with two entity nodes and a time cluster label;
connecting in series, in time-cluster order, the time-labelled temporary edges that belong to the same entity node, so that temporary edges of different time clusters are linked chronologically, and outputting a time-ordered entity connection sequence set, wherein each sequence comprises a cross-time-cluster connection path of an entity node;
performing cross-cluster association on the time-ordered entity connection sequences, merging the connection sequences that relate to the same entity in different time clusters while retaining the cross-time-cluster entity connection relationships, and outputting a cross-cluster merged connection sequence set, wherein each sequence contains cross-cluster entity connection relationships; and
converting the cross-cluster merged connection sequences into a structured network, with entities as nodes and cross-cluster merged connections as edges, adding time cluster and interaction feature attributes to the edges, and outputting the heterogeneous association network comprising multiple types of entity nodes and attributed association edges.
3. The big data processing method according to claim 2, wherein converting the cross-cluster merged connection sequences into a structured network, with entities as nodes and cross-cluster merged connections as edges, adding time cluster and interaction feature attributes to the edges, and outputting the heterogeneous association network comprising multiple types of entity nodes and attributed association edges comprises:
extracting the cross-cluster connection frequency of each entity node from the cross-cluster merged connection sequence set, binding it with the corresponding entity node, and outputting a node-cross-cluster connection frequency binding set, wherein each set entry comprises an entity node identifier and the corresponding cross-cluster connection frequency;
dividing the entity nodes according to the time spans of their cross-cluster connections, collecting entity nodes with similar time spans into the same group, and outputting a time span node group set, wherein each group comprises entity nodes with similar time spans and the corresponding attribute information;
filtering the cross-cluster merged connection sequences of each time span node group, retaining the cross-cluster connections whose connection intervals fall within the time span range, and outputting a time-filtered cross-cluster connection sequence set, wherein each sequence comprises cross-cluster connections that conform to the time span;
associating the time-filtered cross-cluster connection sequences with the corresponding time span node groups, taking the entities in each node group as nodes and the time-filtered connections as edges to generate dynamic sub-networks, and outputting a dynamic sub-network set, wherein each sub-network comprises entity nodes and connection edges with the same time span;
performing spatio-temporal alignment on the sub-networks in the dynamic sub-network set, uniformly mapping the time cluster labels of different sub-networks onto a continuous time axis so that the time dimensions of the sub-networks are aligned, and outputting a spatio-temporally aligned dynamic sub-network set, wherein the time axis of each sub-network is uniformly mapped; and
integrating the spatio-temporally aligned dynamic sub-networks while retaining the entity connection relationships between the sub-networks, adding time cluster and interaction feature attributes to the edges, and outputting the heterogeneous association network comprising multiple types of entity nodes and attributed association edges.
4. The big data processing method according to claim 3, wherein integrating the spatio-temporally aligned dynamic sub-networks while retaining the entity connection relationships between the sub-networks, adding time cluster and interaction feature attributes to the edges, and outputting the heterogeneous association network comprising multiple types of entity nodes and attributed association edges comprises:
extracting, from the spatio-temporally aligned dynamic sub-network set, all entity pairs that are not directly connected but share at least three adjacent entity nodes, taking these entity pairs as objects with a potential connection, and outputting a potential association entity pair set, wherein each entry comprises two entities that are not directly connected but share adjacent nodes;
binding each potential association entity pair with its common adjacent entity nodes, taking the common adjacent nodes as the supporting basis for the potential association, and outputting a potential association-common adjacent node binding set, wherein each entry comprises a potential association entity pair and the corresponding common adjacent nodes;
generating a time constraint for each potential association according to the interaction time sequence between the common adjacent nodes and the potential association entity pair, so that the potential association conforms to the interaction timing logic, and outputting a potential association time constraint set, wherein each entry comprises a potential association and the corresponding time constraint;
retaining, based on the potential association time constraints, the potential association entity pairs that satisfy the time constraints, and outputting a verified potential association entity pair set, wherein each entry contains a plausible potential association entity pair;
converting each verified potential association entity pair into a hidden connection edge, marking the hidden edge with its common adjacent node support and time constraint, adding the hidden edge to the spatio-temporally aligned dynamic sub-networks, and outputting a heterogeneous association network with hidden edges, which comprises explicit edges and hidden edges; and
performing redundancy cleaning on the heterogeneous association network with hidden edges, removing duplicate hidden edges and hidden edges that completely overlap explicit edges, retaining the hidden edges with unique connection significance, and outputting an expanded heterogeneous association network comprising multiple types of entity nodes, explicit edges, and hidden edges.
5. The big data processing method according to claim 1, wherein performing multi-path parallel traversal analysis on the heterogeneous association network and extracting abnormal association path features present in the network comprises:
selecting, from the heterogeneous association network, entity nodes involved in interactions in at least three different time clusters as traversal starting nodes, and outputting a cross-cluster starting node set, wherein each node comprises interaction records and attribute information spanning multiple time clusters;
traversing, from each cross-cluster starting node, along paths whose edge attributes change continuously, tracking the complete path of the edge attributes from an initial state to a changed state to generate attribute evolution traversal paths, and outputting an attribute evolution traversal path set, wherein each path comprises a node sequence and the corresponding edge attribute change trajectory;
extracting, from each attribute evolution traversal path, the total number of entity nodes in the path as path length content, the category identifier of each entity node in the path as node type sequence content, and the continuous change trajectory of the edge attributes in the path as edge attribute change content, and outputting a path basic feature set;
comparing each path feature in the path basic feature set with the path features generated from all starting nodes of the same type, identifying the content that deviates from a plurality of path features, and outputting a path feature deviation comparison result set, wherein each entry comprises a path feature and the corresponding deviation content;
screening, from the path feature deviation comparison result set, the paths whose deviation content covers all three feature types (path length, node type sequence, and edge attribute change) as differential paths, and outputting a differential evolution path set, wherein each path contains deviation content for all three feature types; and
extracting the path length content, node type sequence content, and edge attribute change content from the differential evolution path set, integrating the three feature types into a unified format, and outputting the abnormal association path features present in the network, wherein the abnormal association path features comprise the complete content of all three feature types.
6. The big data processing method according to claim 5, wherein extracting the path length content, node type sequence content, and edge attribute change content from the differential evolution path set, integrating the three feature types into a unified format, and outputting the abnormal association path features present in the network comprises:
extracting, from the differential evolution path set, all adjacent nodes of each entity node in each path, binding the adjacent nodes with the corresponding path, and outputting a path-adjacent node binding set, wherein each entry contains a path identifier and all adjacent nodes of the corresponding nodes;
associating the node type sequence of each path with the adjacent node categories of the corresponding nodes, adding the adjacent node categories of each node to the node type sequence to generate an extended node type sequence, and outputting an extended node type sequence set, wherein each sequence comprises the original node types and the adjacent node categories;
extracting the occurrence frequency of each node category from the extended node type sequence set, binding the category frequencies with the corresponding path, and outputting a path-category frequency binding set, wherein each entry comprises a path identifier and the corresponding node category frequencies;
associating the frequency content in the path-category frequency binding set with the path length content and the edge attribute change content, binding the three feature types together to generate extended abnormal association path features, and outputting an extended abnormal association path feature set, wherein each feature comprises length, type frequency, and attribute change;
integrating the content of the extended abnormal association path feature set by arranging the three feature types in a fixed order so that every feature has a consistent structure, and outputting a structurally uniform extended abnormal path feature set; and
sorting the structurally uniform extended abnormal path features, removing duplicate feature content, retaining the unique abnormal association path features, and outputting optimized abnormal association path features, which comprise the complete path length, node type sequence, and edge attribute change features.
7. The big data processing method according to claim 6, wherein sorting the structurally uniform extended abnormal path features, removing duplicate feature content, retaining the unique abnormal association path features, and outputting the optimized abnormal association path features comprises:
extracting the edge attribute change trajectory of each path from the structurally uniform extended abnormal path feature set, binding it with the corresponding path, and outputting a path-attribute change trajectory binding set, wherein each entry comprises a path identifier and the corresponding edge attribute change trajectory;
associating the edge attribute change trajectory of each path with the path length and the extended node type sequence, matching the change period of the trajectory with the path length and node types to generate time-linked abnormal association path features, and outputting a time-linked abnormal association path feature set, wherein each feature comprises the three types of time-linked content;
extracting the fluctuation range of the change trajectory from the time-linked abnormal association path feature set, binding it with the corresponding path, and outputting a path-fluctuation range binding set, wherein each entry comprises a path identifier and the corresponding attribute change fluctuation range;
associating the fluctuation range content in the path-fluctuation range binding set with the path length and extended node type sequence content to generate fluctuation-associated abnormal path features, and outputting a fluctuation-associated abnormal path feature set, wherein each feature comprises a fluctuation range, a length, and a type sequence;
de-duplicating the fluctuation-associated abnormal path feature set by removing identical fluctuation-associated features, and outputting a de-duplicated fluctuation-associated abnormal path feature set; and
sorting the de-duplicated fluctuation-associated abnormal path features, retaining the complete path length, node type sequence, and edge attribute change content, and outputting the optimized abnormal association path features, which comprise the complete content of all three feature types.
8. The big data processing method according to claim 1, wherein inputting the abnormal association path features into the preset deep feature fusion model, performing cross-dimensional feature recombination and nonlinear mapping, and generating the abnormality evaluation feature vector containing multidimensional abnormality evaluation parameters comprises:
arranging the path length content, node type sequence content, and edge attribute change content of the abnormal association path features in the time order of the path, concatenating the three feature types in that order, and outputting a time-ordered abnormal association path feature set, wherein each entry comprises the three feature types in time order;
interleaving and splicing, in time order, the different path features in the time-ordered abnormal association path feature set, splicing the time-sequence segments of each path with those of other paths to generate interleaved spliced features, and outputting an interleaved spliced feature set, wherein each feature comprises time-sequence segments of different paths;
inputting the interleaved spliced features into the feature processing stage of the preset deep feature fusion model to generate initial fusion features, and outputting an initial fusion feature set, wherein each feature comprises multi-path interleaved fusion content;
performing cross-time-sequence association on the features in the initial fusion feature set, binding the feature content of different time-sequence segments to mine cross-time-sequence feature associations, and outputting a cross-time-sequence association fusion feature set, wherein each feature comprises cross-time-sequence association content;
performing nonlinear mapping on the cross-time-sequence association fusion features, converting them into high-dimensional vector form to generate intermediate abnormality evaluation feature vectors, and outputting an intermediate abnormality evaluation feature vector set, wherein each vector contains multidimensional feature content; and
integrating the intermediate abnormality evaluation feature vector set, uniformly ordering the multidimensional features of the different vectors, and outputting the abnormality evaluation feature vector containing the multidimensional abnormality evaluation parameters.
9. A big data processing apparatus, comprising:
a data acquisition module, configured to acquire associated data entries with timestamp marks from a plurality of different data sources, wherein the associated data entries comprise interaction records and attribute description information among different entity objects;
a network construction module, configured to perform entity association network construction on the associated data entries based on unique identification information of the entity objects and to generate a heterogeneous association network comprising multiple types of entity nodes and attributed association edges, wherein the attributed association edges comprise interaction time features and interaction strength features;
a feature extraction module, configured to perform multi-path parallel traversal analysis on the heterogeneous association network and to extract abnormal association path features present in the network, wherein the abnormal association path features comprise path length features, node type sequence features, and edge attribute change features;
a recombination mapping module, configured to input the abnormal association path features into a preset deep feature fusion model, perform cross-dimensional feature recombination and nonlinear mapping, and generate an abnormality evaluation feature vector containing multidimensional abnormality evaluation parameters; and
a matching analysis module, configured to output a comprehensive evaluation result comprising an abnormality degree quantized value and an abnormality type identifier according to a matching analysis result of the abnormality evaluation feature vector and a preset reference feature template set.
10. A management system, comprising:
a memory for storing computer-executable instructions or computer programs; and
a processor for implementing the big data processing method according to any of claims 1 to 9 when executing the computer-executable instructions or computer programs stored in the memory.
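
As an illustration only, two of the steps recited in the claims can be sketched in Python: the grouping of interaction records into time clusters (claim 2) and the search for entity pairs that are not directly connected but share at least three adjacent nodes (claim 4). The claims do not specify data structures or thresholds, so the record layout `(timestamp, source, target)`, the `max_gap` parameter, and all function names below are assumptions introduced for this sketch. The time-constraint verification that claim 4 applies afterwards is not modelled here.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_time_gap(records, max_gap):
    """Label (timestamp, source, target) records with a time cluster index.

    Assumption: a new cluster starts whenever the gap to the previous
    record exceeds max_gap. The claims only speak of "continuous" time
    intervals, so this threshold rule is a stand-in.
    """
    labelled = []
    label = 0
    previous = None
    for record in sorted(records, key=lambda r: r[0]):
        if previous is not None and record[0] - previous > max_gap:
            label += 1
        labelled.append((label, record))
        previous = record[0]
    return labelled


def build_adjacency(labelled):
    """Build an undirected adjacency map from the labelled records."""
    adjacency = defaultdict(set)
    for _, (_, source, target) in labelled:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def find_potential_pairs(adjacency, min_common=3):
    """Return unconnected entity pairs sharing >= min_common neighbours.

    The result maps each pair to its common adjacent nodes, which claim 4
    treats as the supporting basis of the potential association.
    """
    pairs = {}
    for a, b in combinations(sorted(adjacency), 2):
        if b in adjacency[a]:
            continue  # already directly connected
        common = adjacency[a] & adjacency[b]
        if len(common) >= min_common:
            pairs[(a, b)] = common
    return pairs


if __name__ == "__main__":
    # Hypothetical records: buyer B1 and seller S1 never interact directly
    # but both interact with vehicles V1, V2 and V3.
    records = [
        (1, "B1", "V1"), (2, "B1", "V2"), (3, "B1", "V3"),
        (10, "S1", "V1"), (11, "S1", "V2"), (12, "S1", "V3"),
    ]
    labelled = cluster_by_time_gap(records, max_gap=5)
    print(find_potential_pairs(build_adjacency(labelled)))
```

In this toy data the two bursts of activity fall into separate time clusters, and the pair (B1, S1) is the only candidate for a hidden edge because it is the only unconnected pair with three shared neighbours.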

Description

Big data processing method, device and management system

Technical Field

The present application relates to the field of data processing, and in particular to a big data processing method, device, and management system.

Background

With the advancing digital transformation of the used-car industry, processing the scattered transaction, consultation, and vehicle-condition association data on a used-car platform makes it possible to mine the association logic among different entity objects and to obtain evaluation results that reflect abnormal transaction associations. At present, the platform association data to be analyzed is typically obtained, its surface-level content is sorted out directly, and association analysis results are output immediately after a preliminary classification and aggregation by transaction time or vehicle ownership. However, such analysis attends only to the surface-level associations in the data and cannot structurally organize the interaction logic among buyers, sellers, intermediaries, and vehicle resources on the platform. As a result, the analysis results cannot accurately reflect deep association changes among different entities, abnormal associations hidden within conventional transaction interaction logic are difficult to identify effectively, and no reliable basis is available for platform risk control and compliance supervision. How to perform association analysis of used-car big data more accurately and mine deep association changes among platform entities has therefore become a research focus in the field of digital used-car operations.

Disclosure of Invention

The invention provides a big data processing method, a big data processing device, and a management system.
In a first aspect, an embodiment of the present invention provides a big data processing method, the method comprising: acquiring associated data entries with timestamp marks from a plurality of different data sources, wherein the associated data entries comprise interaction records and attribute description information among different entity objects; performing entity association network construction on the associated data entries based on unique identification information of the entity objects to generate a heterogeneous association network comprising multiple types of entity nodes and attributed association edges, wherein the attributed association edges comprise interaction time features and interaction strength features; performing multi-path parallel traversal analysis on the heterogeneous association network and extracting abnormal association path features present in the network, wherein the abnormal association path features comprise path length features, node type sequence features, and edge attribute change features; inputting the abnormal association path features into a preset deep feature fusion model and performing cross-dimensional feature recombination and nonlinear mapping to generate an abnormality evaluation feature vector comprising multidimensional abnormality evaluation parameters; and outputting a comprehensive evaluation result comprising an abnormality degree quantized value and an abnormality type identifier according to a matching analysis result of the abnormality evaluation feature vector and a preset reference feature template set.
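
The final matching step of the method can be pictured with a small Python sketch. The text does not state how the abnormality evaluation feature vector is matched against the reference feature template set, so cosine similarity is used here purely as an assumed stand-in; the function names and the template labels `type_A` and `type_B` are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def evaluate(vector, templates):
    """Match a feature vector against reference templates.

    Returns (degree, type_id): the similarity to the best-matching
    template, used here as an assumed proxy for the abnormality degree
    quantized value, and that template's identifier, used as the
    abnormality type identifier.
    """
    if not templates:
        raise ValueError("reference template set is empty")
    best_type = None
    best_score = -math.inf
    for type_id, template in templates.items():
        score = cosine_similarity(vector, template)
        if score > best_score:
            best_type, best_score = type_id, score
    return best_score, best_type


if __name__ == "__main__":
    # Hypothetical three-dimensional templates, for illustration only.
    templates = {"type_A": [1.0, 0.0, 0.0], "type_B": [0.0, 1.0, 0.0]}
    print(evaluate([2.0, 0.0, 0.0], templates))
```

Any other similarity or distance measure could be substituted in `evaluate` without changing the surrounding flow; the sketch only shows how a single vector can yield both a numeric degree and a type label.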
In a second aspect, an embodiment of the present invention provides a big data processing device comprising a data acquisition module, a network construction module, a feature extraction module, a recombination mapping module, and a matching analysis module. The data acquisition module is configured to acquire associated data entries with timestamp marks from a plurality of different data sources, wherein the associated data entries comprise interaction records and attribute description information among different entity objects. The network construction module is configured to perform entity association network construction on the associated data entries based on unique identification information of the entity objects to generate a heterogeneous association network comprising multiple types of entity nodes and attributed association edges, wherein the attributed association edges comprise interaction time features and interaction strength features. The feature extraction module is configured to perform multi-path parallel traversal analysis on the heterogeneous association network and to extract abnormal association path features present in the network, wherein the abnormal association path features comprise path length features, node type sequence features, and edge attribute change features. The recombination mapping module is configured to input the abnormal association path features into a preset deep feature fusion