
CN-121326641-B - Distributed access-based business process data analysis system and method


Abstract

The invention discloses a business process data analysis system and method based on distributed access, relating to the technical field of big data analysis. Feature business nodes are determined by analyzing historical abnormal records. A training period is set, a feature index library and a feature association library are constructed, abnormal records are collected, business scores are calculated, a graph is drawn, and an abnormal time prediction model is established. Real-time business processes are then collected, real-time business scores are calculated, and the abnormal time is predicted. The method narrows the range of invalid analysis, improves the efficiency and accuracy of anomaly detection, increases the depth and accuracy of anomaly diagnosis, reduces the manual monitoring workload, and improves operation and maintenance efficiency.

Inventors

  • ZHU CE
  • HE TINGTING
  • Xiao Chuqiao
  • Chen Beiqi
  • WANG SHUN
  • YIN HAOQIANG
  • YANG BING

Assignees

  • Shanghai Municipal Big Data Center (上海市大数据中心)

Dates

Publication Date
2026-05-05
Application Date
2025-12-15

Claims (9)

  1. A business process data analysis method based on distributed access, characterized in that the method comprises the following steps:
     Step S100: based on the historical execution records of the business process platform, analyzing the historical abnormal records, counting the occurrence frequency of abnormal business nodes, and determining a feature business node set for each business process;
     Step S200: setting a training period, screening out the feature indexes of each business node by analyzing the execution records, constructing a feature index library, analyzing the degree of similarity between business nodes based on their feature indexes, determining the association between business nodes, and establishing a feature association library;
     Step S300: acquiring the abnormal records in the training period; based on the feature index library, acquiring the feature indexes corresponding to the abnormal business nodes in each analysis period and calculating the business scores of the abnormal business nodes; based on the feature association library, acquiring the business scores of the associated business nodes corresponding to the abnormal business nodes; performing a weighted calculation on the business score of each abnormal business node and the business scores of its associated business nodes to obtain a feature score; summarizing the feature scores of the analysis periods, drawing a graph, extracting image features of the graph, and establishing an abnormal time prediction model in combination with the time durations;
     Step S400: collecting a real-time business process, determining the feature business nodes corresponding to it, calculating the real-time business score of each feature node, calculating a real-time predicted abnormal time based on the abnormal time prediction model, and determining whether to perform a check according to the real-time predicted abnormal time;
     wherein step S400 comprises the following steps:
     Step S401: collecting real-time business processes, determining the feature business nodes corresponding to each business process, collecting the feature indexes corresponding to each feature node based on the feature index library, and calculating the real-time business score of each feature node according to the numerical distribution characteristics of the feature indexes;
     Step S402: based on the feature association library, obtaining the real-time business scores of the associated business nodes corresponding to each feature node and calculating the real-time feature score of the feature node; drawing a real-time graph from the real-time feature scores of the analysis periods, extracting real-time image features of the real-time graph, inputting them into the abnormal time prediction model, and calculating the real-time predicted abnormal time of each feature node; if the real-time predicted abnormal time of any feature node is less than or equal to the duration of the analysis period, reminding a worker to check that feature node; and if the real-time predicted abnormal times of all feature nodes are greater than the duration of the analysis period, continuing to execute step S401.
  2. The business process data analysis method based on distributed access according to claim 1, wherein step S100 comprises the following steps:
     Step S101: in the business process platform, identifying the accessed business processes and presetting a business process library; acquiring the business nodes corresponding to each business process and assigning a node code to each business node; setting a preset time period as an analysis period, with the start of the analysis period as the initial time point and the end of the analysis period as the final time point; acquiring a business process received by the business process platform and determining the business node set corresponding to it; during execution of the business process, acquiring the operation behavior data of the business node set in each analysis period; extracting the boundary data of the operation behavior data at the initial time point and at the final time point; summarizing the boundary data and the operation behavior data to obtain analysis data; integrating the analysis data of all analysis periods to generate an execution record, and uploading the execution record to the business process platform;
     Step S102: acquiring the historical execution records and the judgment result of each analysis period in each historical execution record; setting each analysis period whose judgment result is abnormal as an abnormal period; marking a historical execution record as a historical abnormal record if it contains an abnormal period; extracting the business process corresponding to each historical abnormal record, and classifying the historical abnormal records by business process;
     Step S103: in the historical abnormal record set of a given business process, collecting the abnormal business nodes corresponding to the abnormal periods in each historical abnormal record; counting the number of occurrences of each abnormal business node in the set and calculating its occurrence frequency; presetting an occurrence frequency threshold, and setting each abnormal business node whose occurrence frequency exceeds the threshold as a feature business node of the business process; and summarizing to generate the feature business node set of each business process.
  3. The business process data analysis method based on distributed access according to claim 2, wherein step S200 comprises the following steps:
     Step S201: selecting a number of consecutive days as the training period, summarizing the execution records of each business process in the training period, and, within the execution record set of a given business process, collecting the analysis data of each execution record in each analysis period;
     Step S202: according to the analysis data of each analysis period, evaluating the operation behavior data of each business node and extracting its features, screening the feature indexes out of the candidate indexes, and establishing a feature index library for each business node;
     Step S203: analyzing the relations among the business nodes according to their feature indexes, and establishing a feature association library for each business node.
  4. The business process data analysis method based on distributed access according to claim 3, wherein establishing the feature index library of each business node in step S202 comprises the following steps:
     Step S202-1: by counting the execution of the business node in each analysis period, extracting the original indexes of the operating characteristics of the business node and summarizing them as a candidate index set, wherein the original indexes comprise time indexes, load indexes, state indexes, path indexes and scheduling indexes;
     Step S202-2: collecting the value of each original index in the candidate index set, and calculating its numerical distribution characteristics in each analysis period through statistical analysis, wherein the numerical distribution characteristics comprise the mean, variance, dispersion, trend of change, fluctuation amplitude, proportion of extreme values and inter-period stability of the original index within the analysis period; normalizing the numerical distribution characteristics, presetting a weight for each numerical distribution characteristic, and performing a weighted summation of the normalized characteristics to obtain the performance score of the original index; presetting a performance score threshold, marking each original index whose performance score is below the threshold as a fluctuation index, and assigning a value to the fluctuation index;
     Step S202-3: acquiring the judgment result of each analysis period and assigning a value to it; taking the candidate index set as input and the judgment result as output, calculating the abnormal correlation value of each candidate index by the Pearson correlation method; taking the candidate index set as input and the fluctuation index as output, training a binary classification model and calculating the performance fluctuation correlation value of each candidate index;
     Step S202-4: normalizing the abnormal correlation values and the performance fluctuation correlation values, presetting weights for the abnormal correlation and the performance fluctuation correlation, and performing a weighted summation of the normalized values to obtain the comprehensive score of each original index; presetting a comprehensive score threshold, taking the original indexes whose comprehensive score exceeds the threshold as feature indexes, summarizing the feature indexes of all business nodes, and establishing the feature index library.
  5. The business process data analysis method based on distributed access according to claim 3, wherein establishing the feature association library of each business node in step S203 comprises the following steps:
     Step S203-1: obtaining the feature index set corresponding to each business node, extracting the numerical distribution characteristics of each feature index in the different analysis periods, and calculating the performance score of each feature index; performing a weighted summation of the performance scores of the feature indexes, weighted by their performance fluctuation correlation values, to obtain the business score of the business node; summarizing the business score and the numerical distribution characteristics of each feature index, and constructing the feature vector of each business node;
     Step S203-2: obtaining the feature vectors of any two business nodes and, using a vector similarity method with the two feature vectors as input, calculating the degree of similarity of the two business nodes over the whole feature space; presetting a similarity threshold, and setting two business nodes whose degree of similarity exceeds the threshold as associated business nodes; summarizing the associated business nodes corresponding to each business node, and establishing the feature association library of each business node.
  6. The business process data analysis method based on distributed access according to claim 1, wherein step S300 comprises the following steps:
     Step S301: acquiring the abnormal records of each business process in the training period; setting the time point corresponding to the abnormal period in an abnormal record as the key time point, and setting the time span between the starting time point of the abnormal record and the key time point as the feature time period; acquiring the abnormal business nodes corresponding to each abnormal record, classifying the abnormal records by abnormal business node, and extracting the analysis periods contained in the feature time period;
     Step S302: based on the feature index library, collecting the feature indexes corresponding to the abnormal business node in each analysis period and calculating the business score for each analysis period; based on the feature association library, obtaining the associated business nodes corresponding to the abnormal business node and extracting their business scores; and calculating the feature score of the abnormal business node according to the following formula: [formula not reproduced in the source text]; wherein A represents the feature score of the abnormal business node, B represents the business score of the abnormal business node, b_a represents the business score of the a-th associated business node, and C_a represents the degree of similarity of the a-th associated business node;
     Step S303: sorting the feature scores of the analysis periods in the feature time period in chronological order and drawing a graph of the feature score over time; extracting image features of the graph, wherein the image features comprise slope, amplitude of change and curvature; normalizing the image features; and, taking the normalized image features as input and the duration of the feature time period as output, training a regression model to establish the abnormal time prediction model.
  7. A business process data analysis system based on distributed access, used for implementing the business process data analysis method based on distributed access according to any one of claims 1 to 6, characterized in that the system comprises a feature business node module, a library establishment module, an abnormal time prediction model module and a real-time analysis module;
     the feature business node module is used for analyzing the historical abnormal records based on the historical execution records of the business process platform, counting the occurrence frequency of abnormal business nodes, and determining a feature business node set for each business process;
     the library establishment module is used for setting a training period, screening out the feature indexes of each business node by analyzing the execution records, constructing a feature index library, analyzing the degree of similarity between business nodes based on their feature indexes, determining the association between business nodes, and establishing a feature association library;
     the abnormal time prediction model module is used for acquiring the abnormal records in the training period; acquiring, based on the feature index library, the feature indexes corresponding to the abnormal business nodes in each analysis period and calculating the business scores of the abnormal business nodes; acquiring, based on the feature association library, the business scores of the associated business nodes corresponding to the abnormal business nodes; performing a weighted calculation on the business score of each abnormal business node and the business scores of its associated business nodes to obtain a feature score; summarizing the feature scores of the analysis periods, drawing a graph, extracting image features of the graph, and establishing an abnormal time prediction model in combination with the time durations;
     the real-time analysis module is used for collecting a real-time business process, determining the feature business nodes corresponding to it, calculating the real-time business score of each feature node, calculating a real-time predicted abnormal time based on the abnormal time prediction model, and determining whether to perform a check according to the real-time predicted abnormal time.
  8. The business process data analysis system based on distributed access according to claim 7, wherein the feature business node module comprises an execution recording unit and a feature business node determining unit;
     the execution recording unit is used for identifying the accessed business processes in the business process platform and presetting a business process library; acquiring the business nodes corresponding to each business process and assigning a node code to each business node; setting a preset time period as an analysis period, with the start of the analysis period as the initial time point and the end of the analysis period as the final time point; acquiring a business process received by the business process platform and determining the business node set corresponding to it; during execution of the business process, acquiring the operation behavior data of the business node set in each analysis period; extracting the boundary data of the operation behavior data at the initial time point and at the final time point; summarizing the boundary data and the operation behavior data to obtain analysis data; integrating the analysis data of all analysis periods to generate an execution record, and uploading the execution record to the business process platform;
     the feature business node determining unit is used for acquiring the historical execution records and the judgment result of each analysis period in each historical execution record; setting each analysis period whose judgment result is abnormal as an abnormal period; marking a historical execution record as a historical abnormal record if it contains an abnormal period; extracting the business process corresponding to each historical abnormal record and classifying the historical abnormal records by business process; in the historical abnormal record set of a given business process, collecting the abnormal business nodes corresponding to the abnormal periods; counting the number of occurrences of each abnormal business node in the set and calculating its occurrence frequency; presetting an occurrence frequency threshold, and setting each abnormal business node whose occurrence frequency exceeds the threshold as a feature business node of the business process; and summarizing to generate the feature business node set of each business process.
  9. The business process data analysis system based on distributed access according to claim 7, wherein the abnormal time prediction model module comprises a feature time period determining unit and an abnormal time prediction model building unit;
     the feature time period determining unit is used for acquiring the abnormal records of each business process in the training period; setting the time point corresponding to the abnormal period in an abnormal record as the key time point, and setting the time span between the starting time point of the abnormal record and the key time point as the feature time period; acquiring the abnormal business nodes corresponding to each abnormal record, classifying the abnormal records by abnormal business node, and extracting the analysis periods contained in the feature time period;
     the abnormal time prediction model building unit is used for collecting, based on the feature index library, the feature indexes corresponding to the abnormal business node in each analysis period and calculating the business score for each analysis period; obtaining, based on the feature association library, the associated business nodes corresponding to the abnormal business node, extracting their business scores, and calculating the feature score of the abnormal business node; sorting the feature scores of the analysis periods in the feature time period in chronological order and drawing a graph of the feature score over time; extracting image features of the graph, wherein the image features comprise slope, amplitude of change and curvature; normalizing the image features; and, taking the normalized image features as input and the duration of the feature time period as output, training a regression model to establish the abnormal time prediction model.
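
As an illustration of step S103 in claim 2, the following minimal Python sketch selects feature business nodes by occurrence frequency. The function name `select_feature_nodes`, the node codes, and the definition of frequency as the share of historical abnormal records in which a node appears are assumptions made for illustration; the claims do not state how the frequency is computed.

```python
from collections import Counter


def select_feature_nodes(abnormal_records, frequency_threshold):
    """Return the nodes whose occurrence frequency exceeds the threshold.

    abnormal_records: one list of abnormal node codes per historical
    abnormal record of a single business process (hypothetical layout).
    Frequency is ASSUMED to be: records containing the node / all records.
    """
    total = len(abnormal_records)
    if total == 0:
        return set()
    # Count each node at most once per record.
    counts = Counter(node for record in abnormal_records for node in set(record))
    return {node for node, n in counts.items() if n / total > frequency_threshold}


records = [["N1", "N2"], ["N1"], ["N3", "N1"], ["N2"]]
feature_nodes = select_feature_nodes(records, frequency_threshold=0.4)
print(sorted(feature_nodes))
```

With these sample records, N1 appears in 3 of 4 records and N2 in 2 of 4, so both pass a threshold of 0.4, while N3 (1 of 4) does not.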
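
Step S203-2 in claim 5 can be sketched in the same hedged way. The claim only requires "a vector similarity method"; cosine similarity is used here as one possible choice, and the function names, the two-dimensional feature vectors and the threshold value are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_association_library(feature_vectors, similarity_threshold):
    """Map each node to its associated nodes and their similarity.

    feature_vectors: dict of node code -> feature vector (hypothetical layout).
    Two nodes are associated when their similarity exceeds the threshold.
    """
    library = {node: {} for node in feature_vectors}
    nodes = sorted(feature_vectors)
    for i, first in enumerate(nodes):
        for second in nodes[i + 1:]:
            sim = cosine_similarity(feature_vectors[first], feature_vectors[second])
            if sim > similarity_threshold:
                library[first][second] = sim
                library[second][first] = sim
    return library


vectors = {"N1": [1.0, 0.0], "N2": [2.0, 0.0], "N3": [0.0, 1.0]}
library = build_association_library(vectors, similarity_threshold=0.8)
print(library)
```

Storing the similarity alongside each associated node keeps the value available for the later weighted calculation of the feature score, where the claims refer to the degree of similarity of each associated node.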
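
The curve features of step S303 (claim 6) and the check decision of step S402 (claim 1) might look roughly like the sketch below. The claims name slope, amplitude of change and curvature but not how they are computed, so the finite-difference definitions here are assumptions, as are the function names and sample values. The regression model itself is not shown; its predicted times are passed in as plain numbers.

```python
def curve_features(scores):
    """Extract slope, amplitude and curvature from time-ordered feature scores.

    ASSUMED definitions: slope = mean first difference, amplitude = max - min,
    curvature = mean second difference.
    """
    if len(scores) < 3:
        raise ValueError("need at least three analysis periods")
    first = [b - a for a, b in zip(scores, scores[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    return {
        "slope": sum(first) / len(first),
        "amplitude": max(scores) - min(scores),
        "curvature": sum(second) / len(second),
    }


def nodes_to_check(predicted_times, period_duration):
    """Nodes whose predicted abnormal time is <= the analysis period duration."""
    return sorted(node for node, t in predicted_times.items() if t <= period_duration)


features = curve_features([1.0, 2.0, 4.0, 7.0])
print(features)

# Hypothetical model outputs, in the same time unit as the analysis period.
alerts = nodes_to_check({"N1": 30.0, "N2": 90.0, "N3": 60.0}, period_duration=60.0)
print(alerts)
```

An empty result from `nodes_to_check` corresponds to the branch in step S402 where all predicted times exceed the analysis period duration and collection simply continues with step S401.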

Description

Distributed access-based business process data analysis system and method

Technical Field

The invention relates to the technical field of big data analysis, and in particular to a business process data analysis system and method based on distributed access.

Background

In existing business process data analysis systems, the data generated by different nodes in a distributed business environment differ markedly in format structure, time precision and semantic expression. Unified parsing and effective alignment of the data are therefore difficult to achieve in the subsequent analysis stage, which affects the accuracy and timeliness of the overall analysis.

In addition, most existing process analysis and optimization methods evaluate on the basis of static models or fixed rules, and lack dynamic modeling of the dependencies between processes as well as cross-node collaborative analysis capability. Traditional methods generally compute the operation data of each node independently, ignore the temporal association and task dependence between nodes, and cannot accurately identify the critical paths and potential bottlenecks that affect overall business performance. When a process runs abnormally or its performance degrades, the system can only report results after the fact, and lacks traceability of the anomaly source and targeted optimization suggestions, which limits intelligent decision-making and proactive optimization.

Therefore, how to construct a business process data analysis system that supports distributed access and real-time data fusion, realizes structural conversion and time synchronization of heterogeneous data from multiple business nodes, and on that basis establishes a multi-node collaborative analysis model based on a process dependency graph, so as to achieve intelligent evaluation of process performance, bottleneck identification and anomaly detection, has become a technical problem to be solved in the field of business process management and intelligent data analysis.

Disclosure of Invention

The invention aims to provide a business process data analysis system and method based on distributed access, so as to solve the problems described in the background above.

To solve the above technical problems, the invention provides a business process data analysis method based on distributed access, which comprises the following steps:

Step S100: based on the historical execution records of the business process platform, analyzing the historical abnormal records, counting the occurrence frequency of abnormal business nodes, and determining a feature business node set for each business process;

Step S200: setting a training period, screening out the feature indexes of each business node by analyzing the execution records, constructing a feature index library, analyzing the degree of similarity between business nodes based on their feature indexes, determining the association between business nodes, and establishing a feature association library;

Step S300: acquiring the abnormal records in the training period; based on the feature index library, acquiring the feature indexes corresponding to the abnormal business nodes in each analysis period and calculating the business scores of the abnormal business nodes; based on the feature association library, acquiring the business scores of the associated business nodes corresponding to the abnormal business nodes and calculating the feature scores; summarizing the feature scores of the analysis periods, drawing a graph, extracting image features of the graph, and establishing an abnormal time prediction model in combination with the time durations;

Step S400: collecting a real-time business process, determining the feature business nodes corresponding to it, calculating the real-time business score of each feature node, calculating a real-time predicted abnormal time based on the abnormal time prediction model, and determining whether to perform a check according to the real-time predicted abnormal time.

Further, step S100 includes:

Step S101: in the business process platform, identifying the accessed business processes and presetting a business process library; acquiring the business nodes corresponding to each business process and assigning a node code to each business node; setting a preset time period as an analysis period, with the start of the analysis period as the initial time point and the end of the analysis period as the final time point; acquiring a business process received by the business process platform and determining the business node set corresponding to it; during execution of the business process, acquiring the operation behavior data of the business node set in each analysis period; extracting the boundary data of the operation behavior data at the initial time point and at the final time point; summarizing the boundary data and the operation behavior data