CN-121561681-B - User behavior analysis method and system based on normal behavior of user

Abstract

The invention discloses a user behavior analysis method and system based on the normal behavior of users. Frequent item sets and rare item sets of normal behavior are mined from massive unlabeled normal behavior data, providing a precise basis for data classification. Using the rare item sets as the criterion, the data are divided into a majority class data set and a minority class data set, so that minority class data are anchored by their scarcity and processing precision is improved. The majority class data set undergoes N conversions based on the rare item sets, and the converted data are classified into the minority class data set, so that the two classes are dynamically balanced and the model's sensitivity to minority class normal behavior patterns is improved. A user behavior graph and a graph neural network learning model are constructed from the balanced training data, giving a structured representation of complex user behavior patterns. By inputting actual user behavior data into the trained model, the graph structure, node, and edge features of the normal behavior patterns are obtained and a precise user behavior analysis result is output.
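
The division and balancing steps summarized above can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. It assumes that each data group is a Python dict of feature values and that each rare item set is a frozenset of (feature, value) pairs. The function names (`contains`, `split_by_rare`, `conversions_needed`, `balance`) are hypothetical, and so is the conversion rule itself (overwriting the features named by a rare item set in an extracted majority group), because this record does not spell out how a single conversion is carried out. The formula for N assumes the target ratio is majority count divided by minority count after balancing.

```python
import random


def contains(record, item_set):
    """True if every (feature, value) pair of item_set appears in record."""
    return all(record.get(feature) == value for feature, value in item_set)


def split_by_rare(records, rare_item_sets):
    """Groups containing any rare item set go to the minority class."""
    minority, majority = [], []
    for rec in records:
        if any(contains(rec, s) for s in rare_item_sets):
            minority.append(rec)
        else:
            majority.append(rec)
    return minority, majority


def conversions_needed(n_major, n_minor, target_ratio=1.0):
    """N such that (n_major - N) / (n_minor + N) is about target_ratio."""
    n = (n_major - target_ratio * n_minor) / (1.0 + target_ratio)
    return max(0, int(n))


def balance(minority, majority, rare_item_sets, target_ratio=1.0, seed=0):
    """Move N groups from the majority to the minority class, one at a time."""
    minority, majority = list(minority), list(majority)
    if not rare_item_sets:
        return minority, majority
    rng = random.Random(seed)
    n = conversions_needed(len(majority), len(minority), target_ratio)
    for _ in range(n):
        # Extract one majority group, so the majority count drops by one.
        rec = dict(majority.pop(rng.randrange(len(majority))))
        # Assumed conversion rule: imprint one rare item set on the group.
        rec.update(dict(rng.choice(rare_item_sets)))
        # The converted group joins the minority class.
        minority.append(rec)
    return minority, majority
```

With 8 majority groups, 2 minority groups, and a target ratio of 1, `conversions_needed` gives N = 3, so both classes end with 5 groups. Each conversion shrinks one class and grows the other at the same time, which is why N is about half the initial difference rather than the full difference.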

Inventors

  • QI JIANHUAI
  • HU JINHUA
  • ZHANG LI
  • SONG JING
  • XU GUOQIAN

Assignees

  • 深圳市永达电子信息股份有限公司

Dates

Publication Date
20260508
Application Date
20260123

Claims (10)

  1. A user behavior analysis method based on normal behavior of a user, the method comprising: S1, collecting all normal behavior data of all users as an original data set; S2, performing feature item set mining on the original data set to obtain frequent item sets and rare item sets; S3, dividing the original data groups that contain a rare item set into a minority class data group set, and dividing the original data groups that do not contain a rare item set into a majority class data group set; S4, determining, based on the difference in size between the majority class data group set and the minority class data group set, the number N of data groups to be converted from the majority class data group set to the minority class data group set, performing N conversions of those data groups based on the rare item sets, classifying the data groups obtained after each conversion into the minority class data group set, and achieving class balance while dynamically adjusting the two data group sets, to obtain a balanced training data set; S5, constructing a user behavior graph based on the balanced training data set; S6, constructing a graph neural network learning model and learning the user behavior graph to obtain a user behavior analysis model; S7, analyzing actual user behavior data with the analysis model and outputting a user behavior analysis result.
  2. The user behavior analysis method based on normal behavior of a user according to claim 1, wherein step S4 further comprises presetting a target class balance ratio, specifically comprising: determining, according to the target class balance ratio, the number N of data groups to be converted from the majority class data group set to the minority class data group set, and performing N conversion operations, each of which includes extracting one majority class data group from the majority class data group set; and, by performing the N conversion operations, synchronously reducing the number of data groups in the majority class data group set and increasing the number of data groups in the minority class data group set, thereby outputting the balanced training data set.
  3. The user behavior analysis method based on normal behavior of a user according to claim 2, wherein converting the majority class data group based on the rare item sets further comprises performing a reverse reduction on the intermediate minority class data group, and adding the reverse-reduced intermediate minority class data group to the minority class data group set.
  4. The user behavior analysis method based on normal behavior of a user according to claim 1, further comprising, after step S1, preprocessing the original data set, wherein the preprocessing includes performing feature value discretization on each data group in the original data set and performing feature association processing on the discretized data groups.
  5. The user behavior analysis method based on normal behavior of a user according to claim 4, wherein the feature item set mining of the original data set is performed based on an improved association rule algorithm, and step S2 specifically comprises: S21, taking each feature in the data groups after feature association processing as a feature item, calculating the support of each feature item, setting a support threshold, and obtaining frequent 1-item sets and rare 1-item sets according to the support and the support threshold; S22, performing S-item set mining on the frequent 1-item sets and the rare 1-item sets respectively to obtain the frequent item sets and the rare item sets.
  6. The user behavior analysis method based on normal behavior of a user according to claim 5, wherein step S22 specifically comprises: S221, constructing an item header table based on the frequent 1-item sets and the rare 1-item sets, and obtaining event data based on the item header table; S222, constructing a feature item tree based on the event data; S223, obtaining a conditional pattern base for each feature item based on the item header table and the feature item tree; S224, filtering out feature items whose count in the conditional pattern bases is smaller than the support threshold count, reconstructing the item header table based on the filtered conditional pattern bases, and obtaining new event data based on the reconstructed item header table; S225, repeating steps S222 to S224 and mining recursively to obtain S-item sets; and S226, calculating the support, confidence, and lift of each mined S-item set, setting a first support threshold, a second support threshold, a confidence threshold, and a lift threshold, and screening the S-item sets based on these measures and thresholds to obtain the frequent item sets and the rare item sets.
  7. The user behavior analysis method based on normal behavior of a user according to claim 6, wherein step S226 is followed by step S227 of performing a deduplication operation on the frequent item sets and the rare item sets respectively, to obtain pure frequent item sets and pure rare item sets.
  8. The user behavior analysis method based on normal behavior of a user according to claim 6, wherein constructing the item header table in step S221 comprises arranging the frequent 1-item sets and the rare 1-item sets in descending order of support to form the item header table; and step S223 specifically comprises traversing upward from the bottom of the item header table and, for each feature item traversed, extracting the corresponding prefix paths from the feature item tree, the count of each prefix path being equal to the count of the corresponding node of that feature item, so as to form the conditional pattern base of each feature item.
  9. The user behavior analysis method based on normal behavior of a user according to claim 1, wherein step S5 specifically comprises: determining node types and edge feature types based on the balanced training data set; and creating a corresponding node for each data group in the training data set according to the node types and the edge feature types, adding corresponding edge features for the nodes, and constructing the user behavior graph.
  10. A user behavior analysis system based on normal behavior of a user, the system comprising: a data acquisition module for collecting all normal behavior data of all users as an original data set; a frequent and rare item set mining module for performing feature item set mining on the original data set to obtain frequent item sets and rare item sets; a data dividing module for dividing the original data groups that contain a rare item set into a minority class data group set and dividing the original data groups that do not contain a rare item set into a majority class data group set; a data balancing module for determining, based on the difference in size between the majority class data group set and the minority class data group set, the number of data groups to be converted from the majority class data group set to the minority class data group set, converting those data groups based on the rare item sets, classifying the converted data groups into the minority class data group set, and achieving class balance while dynamically adjusting the two data group sets, so as to obtain a balanced training data set; a graph construction module for constructing a user behavior graph based on the balanced training data set; a graph neural network learning module for constructing a graph neural network learning model to learn the user behavior graph and obtain a user behavior analysis model; and an analysis module for analyzing actual user behavior data with the analysis model and outputting a user behavior analysis result.
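
As an illustration of the quantities named in claims 5 and 6, the following sketch computes single-item support, splits items into frequent and rare 1-item sets with one threshold, and evaluates support, confidence, and lift for a rule between two item sets. It uses the standard association-rule definitions and is not the improved algorithm of the claims: the tree-based recursive mining of steps S221 to S225 and the two-threshold screening of step S226 are not reproduced. The function names, the representation of a data group as a set of feature items, and the rule that an item is frequent when its support is at least the threshold are all assumptions made for the example.

```python
from collections import Counter


def item_support(transactions):
    """Support of each single item: fraction of transactions containing it."""
    counts = Counter(item for t in transactions for item in set(t))
    total = len(transactions)
    return {item: count / total for item, count in counts.items()}


def split_one_item_sets(transactions, threshold):
    """Assumed rule: support >= threshold is frequent, anything below is rare."""
    support = item_support(transactions)
    frequent = {item for item, s in support.items() if s >= threshold}
    rare = set(support) - frequent
    return frequent, rare


def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    s_ab = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ab / s_a if s_a else 0.0
    lift = confidence / s_b if s_b else 0.0
    return s_ab, confidence, lift
```

For four data groups `{"login", "read"}`, `{"login", "read"}`, `{"login", "write"}`, and `{"login", "export"}` with a threshold of 0.5, `login` and `read` come out as frequent items while `write` and `export` come out as rare. The rule `login -> read` has a lift of 1, meaning `read` is no more likely when `login` is present than overall.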

Description

User behavior analysis method and system based on normal behavior of user

Technical Field

The invention relates to the technical field of data analysis, and in particular to a user behavior analysis method and system based on the normal behavior of users.

Background

With the spread of information systems and network applications, user behavior data has grown explosively. This data carries key value such as user needs, behavior patterns, and potential risks, so accurate anomaly detection on behavior data is of great significance for ensuring system security and identifying potential risks (such as insider threats, account theft, and fraud).

Traditional user behavior analysis and anomaly detection methods fall mainly into two kinds: those based on the user's normal behavior and those based on the user's abnormal behavior. Methods based on normal behavior learn a normal behavior pattern and identify a behavior as abnormal when it is found to be inconsistent with that pattern. Note that a user's normal behavior typically includes both high-frequency and low-frequency normal behavior, and that low-frequency normal behavior may include behaviors that are infrequent but of high value. Because samples of such low-frequency normal behavior are sometimes extremely scarce, the prior art takes one of two directions. Either it ignores the rare low-frequency normal behavior outright and represents all of the user's normal behavior by high-frequency normal behavior alone, or it depends strongly on labels, with training relying heavily on large numbers of labeled samples. User behavior data, however, is usually not labeled by category, and low-frequency behavior can be rare and variable in pattern, so high-quality, comprehensive labels are lacking even when the model has category labels.
Both directions can make user behavior analysis based on normal behavior inaccurate. Therefore, how to analyze user behavior efficiently and accurately on the basis of normal behavior, taking both high-frequency and low-frequency normal behavior into account and without real labels, is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

To solve the above technical problem, the invention provides a user behavior analysis method based on the normal behavior of users that takes both high-frequency and low-frequency normal behavior into account, so that user behavior can be analyzed efficiently and accurately.

The technical solution provided by the invention is as follows. The invention provides a user behavior analysis method based on normal behavior of a user, comprising the following steps:

S1, collecting all normal behavior data of all users as an original data set;
S2, performing feature item set mining on the original data set to obtain frequent item sets and rare item sets;
S3, dividing the original data groups that contain a rare item set into a minority class data group set, and dividing the original data groups that do not contain a rare item set into a majority class data group set;
S4, determining, based on the difference in size between the majority class data group set and the minority class data group set, the number N of data groups to be converted from the majority class data group set to the minority class data group set, performing N conversions of those data groups based on the rare item sets, classifying the data groups obtained after each conversion into the minority class data group set, and achieving class balance while dynamically adjusting the two data group sets, so as to obtain a balanced training data set;
S5, constructing a user behavior graph based on the balanced training data set;
S6, constructing a graph neural network learning model and learning the user behavior graph to obtain a user behavior analysis model;
S7, analyzing actual user behavior data with the analysis model and outputting a user behavior analysis result.

In the invention, after the normal behaviors of users are collected, high-frequency and low-frequency behavior data are accurately extracted, and the two classes of data are then efficiently balanced, which improves the sensitivity of the subsequent learning model to low-frequency normal behavior. A suitable learning model is then constructed to perform deep learning on the two balanced classes of data, and finally actual user behavior is input into the trained model to obtain an accurate user behavior analysis result.

In some embodiments of the present invention, step S4 further includes presetting a target class balance ratio, specifically including: determining the number N of data groups required to be