CN-121980159-A - Data analysis system based on machine learning

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a machine learning based data analysis system comprising a data structure layering module, a transaction behavior pattern recognition module, a behavior prediction recognition module, an offset error correction module, and an information disassembly and storage module. The system layers data according to the characteristics of the financial industry: it identifies the time partitions of transaction time nodes and builds a layered data model with time-series characteristics. It compares behavior frequency with node density to reveal potential attribution relations among the data. It extracts transaction amounts, transaction counts and account changes to identify highly sensitive behavior features, which improves the accuracy of abnormal behavior identification. It reconstructs the data flow trend from feature weights and dynamically models the transaction behavior path. It evaluates prediction deviation through a ratio judgment, which improves the adaptability of the prediction result. Combined with distributed index path management, it also improves the parallel processing efficiency and traceability of data storage.
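As an illustration of the layering step summarized above, the following Python sketch splits a sequence of transaction timestamps into time partitions at abrupt gaps and computes the transaction node density of each partition. It is a minimal sketch under stated assumptions, not the patented method: the names partition_by_gaps, node_density and gap_factor are hypothetical, and the median-gap threshold is a stand-in rule, since this text does not specify how abrupt change positions are detected.

```python
from statistics import median


def partition_by_gaps(timestamps, gap_factor=3.0):
    """Split timestamps into partitions at abrupt gaps.

    Assumption (not from the patent): a gap counts as abrupt when it
    exceeds gap_factor times the median gap between neighbours.
    """
    ts = sorted(timestamps)
    if not ts:
        return []
    if len(ts) == 1:
        return [ts]
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    threshold = gap_factor * median(gaps)
    partitions = []
    current = [ts[0]]
    for prev, cur in zip(ts, ts[1:]):
        if cur - prev > threshold:
            # Abrupt change position: close the current partition.
            partitions.append(current)
            current = [cur]
        else:
            current.append(cur)
    partitions.append(current)
    return partitions


def node_density(partition):
    """Transaction nodes per unit time inside one partition."""
    span = partition[-1] - partition[0]
    # A zero-length partition has no time span; fall back to the node count.
    return len(partition) / span if span > 0 else float(len(partition))


if __name__ == "__main__":
    parts = partition_by_gaps([0, 1, 2, 3, 50, 51, 52, 100, 101])
    for part in parts:
        print(part, node_density(part))
```

The per-partition density produced here is the kind of quantity that the layering module compares against behavior frequency values; how the two are combined is left open because the formula itself is not reproduced in this text.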

Inventors

  • XIAO JING
  • TIAN TIAN
  • REN WENLONG
  • ZHANG YIYANG
  • GU TIANXIANG
  • CAO XUNING

Assignees

  • Nantong University (南通大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-05

Claims (8)

  1. A machine learning based data analysis system, the system comprising: a data structure layering module that, based on financial industry data, identifies time partitions of a time node sequence, divides the data hierarchy, extracts behavior frequency values and the distribution density of the time node sequence, compares the behavior frequency values with the distribution density of the time node sequence, and judges the attribution relation of each interval to obtain a layered data distribution quantity; a transaction behavior pattern recognition module that invokes the layered data distribution quantity, extracts transaction amount data, transaction count data and account change data under each category, sorts the transaction amount data and the account change data by amplitude, and screens the top-ranked features to obtain a feature classification weight value; a behavior prediction recognition module that, according to the feature classification weight value, arranges the data flow direction within nodes by time period over a future time interval, extracts the numerical growth trends of adjacent time periods, judges the offset angle between the growth trends, and maps the offset angle to the time period length to obtain a reconstructed feature prediction trend; and an offset error correction module that, based on the reconstructed feature prediction trend, extracts the difference interval between the predicted value and the real-time value at a time node, analyzes the direction consistency proportion and the difference amplitude proportion of the difference interval, and judges the ratio of the two proportions to obtain offset analysis dynamic prediction data.
  2. The machine learning based data analysis system of claim 1, wherein the layered data distribution quantity includes an interval behavior concentration level, a time node distribution weight and a layer label mapping relation; the feature classification weight value includes a transaction amount fluctuation weight, an account change response weight and a transaction frequency proportion weight; the reconstructed feature prediction trend includes a numerical growth direction sequence, a trend offset angle interval and a time period change gradient; and the offset analysis dynamic prediction data includes a prediction error adjustment parameter, a trend consistency evaluation value and a dynamic correction ratio.
  3. The machine learning based data analysis system of claim 1, wherein the data structure layering module comprises: a time sequence division sub-module that, based on financial industry data including financial payment records, account behavior logs and transaction time node sequences, identifies abrupt change positions and divides the time interval into time partitions to obtain time section division data; a behavior frequency extraction sub-module that screens the account behavior logs in each partition according to the time section division data, extracts label frequencies and identifies average occurrence frequencies to obtain behavior label frequency values; and a distribution density comparison sub-module that calls the behavior label frequency values, analyzes the transaction node density and the behavior frequency of each partition, calculates the behavior-density matching difference value using the formula: [formula not reproduced in this text]; judges the attribution judgment value, identifies the time attribution division amount, and judges the interval attribution relation to obtain the layered data distribution quantity; wherein the terms of the formula represent, respectively: the behavior-density matching difference value; the behavior label frequency value of an indexed partition; the average transaction node concentration of an indexed partition; the number density of transaction nodes per unit time of an indexed partition; the total number of time partitions; the behavior label frequency value of an indexed partition; the maximum concentration value among the partitions immediately before and after the current partition; and the total number of partitions.
  4. The machine learning based data analysis system of claim 3, wherein the transaction behavior pattern recognition module comprises: a transaction data extraction sub-module that calls the layered data distribution quantity, extracts the transaction amount, transaction count and account change data of each category, performs field standardization and data verification, and converts the data into continuous numerical variables to obtain transaction behavior feature quantities; a sorting feature screening sub-module that, according to the transaction behavior feature quantities, compares the amplitudes of the transaction amount and account change data, sorts them by amplitude within each category, and retains the top-ranked feature indicators to obtain high-frequency variation feature values; and a classification weight calculation sub-module that calls the transaction count data of the high-frequency variation feature values, extracts the accumulated values of the top-ranked transaction amount and account change items, identifies the proportion of the accumulated values to the transaction count in each category using the formula: [formula not reproduced in this text]; and summarizes the feature proportion items to obtain the feature classification weight value; wherein the terms of the formula represent, respectively: the feature classification weight value; the accumulated transaction amount of the top-ranked items; the accumulated account change value; the transaction count under the category; the risk early-warning level of an indexed risk item; and the number of risk items.
  5. The machine learning based data analysis system of claim 4, wherein the behavior prediction recognition module comprises: a classification weight recognition sub-module that extracts the data flow direction within nodes according to the feature classification weight value, identifies flow increase and decrease interval values and node category features, and analyzes the degree of matching between nodes and growth trends to obtain a flow trend classification weight value; a trend offset judgment sub-module that calls the flow trend classification weight value, extracts the growth trend vectors of adjacent time periods, identifies the ratio of the vector included angle to the growth rate using the formula: [formula not reproduced in this text]; and combines the result with the duration of the time period to obtain a trend offset angle matching degree; wherein the terms of the formula represent, respectively: the trend offset angle matching degree; the flow growth value of an indexed time period; the flow growth value of another indexed time period; the growth trend vector angle of time period x; and the number of consecutive time periods in the node; and a prediction trend analysis sub-module that, according to the trend offset angle matching degree, extracts the flow trends and direction angles of high-matching-degree intervals and recombines and extends them in sequence to obtain the reconstructed feature prediction trend.
  6. The machine learning based data analysis system of claim 5, wherein the offset error correction module comprises: an error interval extraction sub-module that, based on the reconstructed feature prediction trend, identifies the differences between the real-time data and the predicted data of the time period corresponding to a node, marks the time points at which they differ, divides error intervals, determines their upper and lower limit values, records the start and stop timestamps of each error interval, and extracts the error direction to obtain an error direction interval sequence; a consistency proportion calculation sub-module that, according to the error direction interval sequence, calculates the duration and direction switching density of the direction intervals, calculates the proportion of the absolute error within each error direction duration interval, and judges the fluctuation of the error trend using the formula: [formula not reproduced in this text]; to obtain an error trend offset index; wherein the terms of the formula represent, respectively: the error trend offset index; the error direction consistency proportion; and the error amplitude proportion; and an offset data prediction sub-module that calls the error trend offset index, searches for regions where the error offset index changes, fuses the corrected sections with the original trend data, and updates the flow trend curve to obtain the offset analysis dynamic prediction data.
  7. The machine learning based data analysis system of claim 1, wherein the system further comprises an information disassembly and storage module that, based on the offset analysis dynamic prediction data, extracts the interval's account identity information field, behavior track field and timestamp field, separates the fields by field type, applies strong encryption to the account identity information field and distributes it across a plurality of nodes, and records the encryption index path of the encrypted data for each node to obtain a partition storage path of the data analysis fields; the partition storage path of the data analysis fields comprises an encrypted account index path, a behavior field storage location and a timestamp data partition map.
  8. The machine learning based data analysis system of claim 7, wherein the information disassembly and storage module comprises: a field separation sub-module that, based on the offset analysis dynamic prediction data, identifies account records, parses the field contents, and classifies them according to field identifiers and marking characteristics to generate a field classification splitting result; a distributed encryption sub-module that calls the account identity information fields in the field classification splitting result, extracts the sequence structure and the field bitmap, applies strong encryption according to a mapping relation, distributes the encrypted data across a plurality of nodes, and records the key path and node number of each encrypted field segment to obtain encrypted node distribution data; and a path indexing sub-module that calls the key paths and node indices in the encrypted node distribution data, classifies and maps the node paths, and organizes the index structures of the behavior track field and the timestamp field to obtain the partition storage path of the data analysis fields.
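As an illustration of the offset error correction described in claims 1 and 6, the following Python sketch derives an error direction interval sequence from predicted and real-time values, then takes the ratio of a direction consistency proportion to an error amplitude proportion. It is a minimal sketch under stated assumptions, not the claimed formula, which is not reproduced in this text. The function names are hypothetical, and three choices are assumptions: the "dominant" direction is the sign shared by the most nodes, the amplitude proportion is the share of total absolute error carried by those nodes, and the index is consistency divided by amplitude rather than the reverse.

```python
def prediction_errors(predicted, actual):
    """Signed error (real-time value minus predicted value) at each time node."""
    return [a - p for p, a in zip(predicted, actual)]


def direction_intervals(errors):
    """Group consecutive nodes that share an error sign.

    Returns (sign, start_index, end_index) tuples, where sign is
    +1 (prediction too low), -1 (prediction too high) or 0 (no error).
    """
    intervals = []
    for i, e in enumerate(errors):
        sign = (e > 0) - (e < 0)
        if intervals and intervals[-1][0] == sign:
            # Same direction as the previous node: extend the interval.
            intervals[-1] = (sign, intervals[-1][1], i)
        else:
            intervals.append((sign, i, i))
    return intervals


def error_trend_offset_index(predicted, actual):
    """Ratio of direction consistency to error amplitude share (assumed form)."""
    errors = prediction_errors(predicted, actual)
    total_abs = sum(abs(e) for e in errors)
    if total_abs == 0:
        return 0.0  # No deviation anywhere, so there is nothing to correct.
    positive = [e for e in errors if e > 0]
    negative = [e for e in errors if e < 0]
    # Assumption: ties between directions are resolved in favour of positive.
    dominant = positive if len(positive) >= len(negative) else negative
    direction_consistency = len(dominant) / len(errors)
    amplitude_share = sum(abs(e) for e in dominant) / total_abs
    return direction_consistency / amplitude_share


if __name__ == "__main__":
    predicted = [10, 10, 10, 10]
    actual = [12, 11, 13, 8]
    print(direction_intervals(prediction_errors(predicted, actual)))
    print(error_trend_offset_index(predicted, actual))
```

Under these assumptions, an index near 1 means the dominant error direction carries about as much error magnitude as its share of nodes would suggest, while a value well above 1 means a few nodes in the minority direction account for most of the deviation.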

Description

Data analysis system based on machine learning

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a data analysis system based on machine learning.

Background

The field of artificial intelligence covers systems and methods that use computers to simulate human intelligence. Its core technologies include machine learning, natural language processing, computer vision and intelligent control, and its research aims to give computer systems human-like abilities to learn, reason, understand and make decisions. Machine learning, an important branch of artificial intelligence, lets a computer learn from data and optimize its performance automatically through algorithms; it is widely applied in fields such as image recognition, speech recognition and recommendation systems. Data analysis is an integral part of artificial intelligence: by processing, analyzing and mining large-scale data, it helps extract valuable knowledge from complex information.
A machine learning based data analysis system performs big-data processing and information extraction using artificial intelligence technology, in particular machine learning methods. Its core is to analyze and predict over large amounts of data: data from different sources, formats and structures are processed, useful information is extracted automatically by constructing a suitable machine learning model, and intelligent decisions are made from the analysis results. The workflow of such a system includes data preprocessing, feature extraction, model training and result analysis, and effective use of the data is achieved mainly through training on data sets and model optimization.

In the prior art, the processing of large-scale financial data relies mostly on general-purpose data preprocessing and model training, which cannot fully take into account characteristics of financial industry data such as strong time-series dependence, behavior frequency and account fluctuation amplitude, so the data modeling stage lacks specificity. In data layering, interval division is coarse and feature expression is unclear, which limits the model's ability to capture fine-grained data features. In behavior recognition, classification weights are judged from static indicators without joint modeling of dynamic amplitude and variation trend, so nonlinear changes in the evolution of behavior are easily missed and recognition accuracy drops. In error correction, prediction results are often adjusted with a fixed deviation model, which makes it difficult to respond interval by interval to the direction and degree of prediction deviation, so adaptability is insufficient.
In data storage management, a centralized field processing mode is used, without field security isolation or a node dispersion mechanism; once a node fails or data is tampered with, information becomes difficult to trace and data integrity is put at risk. These problems not only limit the depth and precision of data analysis but also affect the real-time responsiveness and operational security of the system.

Disclosure of Invention

The object of the present invention is to overcome the drawbacks of the prior art by proposing a machine learning based data analysis system.

To achieve the above object, the present invention adopts the following technical solution. A data analysis system based on machine learning includes: a data structure layering module that, based on financial industry data, identifies time partitions of a time node sequence, divides the data hierarchy, extracts behavior frequency values and the distribution density of the time node sequence, compares the behavior frequency values with the distribution density of the time node sequence, and judges the attribution relation of each interval to obtain a layered data distribution quantity; a transaction behavior pattern recognition module that invokes the layered data distribution quantity, extracts transaction amount data, transaction count data and account change data under each category, sorts the transaction amount data and the account change data by amplitude, and screens the top-ranked features to obtain a feature classification weight value; a behavior prediction recognition module that, according to the feature classification weight value, arranges the data flow direction within nodes by time period over a future time interval, extracts the numerical growth trends of adjacent time periods and performs offset angle judgment among the