
CN-122022906-A - Self-adaptive clustering optimization method based on density information entropy

Abstract

The invention relates to the technical field of customer management and data analysis, and in particular to a self-adaptive clustering optimization method based on density information entropy. The method comprises: collecting customers' original transaction data and preprocessing it; constructing a TFA customer segmentation model by calculating three indicators (customer development space, purchase frequency and average purchase amount), standardizing them and weighting them by the analytic hierarchy process (AHP) to form weighted feature vectors, from which a data set is constructed; optimizing the K-means algorithm with density information entropy, in which the optimal cut-off distance is determined adaptively, initial cluster centers are selected according to a density and distance criterion, and iteration completes the clustering; evaluating the results with the silhouette coefficient and the Calinski-Harabasz index to determine the optimal number of cluster centers; and outputting the customer clusters and their value interpretation. The invention characterizes both the static value and the dynamic potential of customers, improves clustering stability and accuracy, yields segmentation results with strong business interpretability, supports precision marketing and optimized resource allocation in enterprises, and is applicable to customer segmentation needs across multiple industries.

Inventors

  • PU XIAOCHUAN
  • YAN CHANGGUO
  • ZHANG YUANQIANG
  • RUAN QINGQIANG

Assignees

  • Zunyi Normal University (遵义师范学院)

Dates

Publication Date
2026-05-12
Application Date
2026-01-21

Claims (6)

  1. A self-adaptive clustering optimization method based on density information entropy, characterized by comprising the following steps:
     S1, collecting original transaction data of customers and preprocessing the original transaction data to obtain customer transaction data;
     S2, calculating, for each customer and based on the customer transaction data, three indicators, namely the customer development space $T$, the purchase frequency $F$ and the average purchase amount $A$; standardizing the three indicators; assigning weights to them by the analytic hierarchy process to form a weighted feature vector; and constructing a customer data set from the weighted feature vectors of all customers, each weighted feature vector corresponding to one sample point;
     S3, optimizing the K-means clustering algorithm based on density information entropy and performing cluster analysis on the customer data set to obtain a clustering result, wherein S3 comprises the following substeps:
     S31, calculating the local density $\rho_i$ of each sample point $x_i$:
     $$\rho_i = \sum_{j=1,\, j \neq i}^{N} \chi(d_{ij} - d_c)$$
     where $N$ is the total number of sample points in the customer data set, $d_{ij}$ is the Euclidean distance between sample points $x_i$ and $x_j$, $d_c$ is the cut-off distance, and the indicator function satisfies $\chi(u) = 1$ when $u < 0$ and $\chi(u) = 0$ otherwise;
     S32, calculating the local density probability $p_i$:
     $$p_i = \frac{\rho_i}{\sum_{j=1}^{N} \rho_j}$$
     and the information entropy $H$:
     $$H = -\sum_{i=1}^{N} p_i \ln p_i$$
     and obtaining, by gradient descent, the cut-off distance $d_c$ that minimizes $H$ as the optimal cut-off distance;
     S33, setting $d_c$ to the optimal cut-off distance and calculating the local density of all samples;
     S34, sorting the local densities $\rho_i$ in descending order and selecting the sample point with the largest density as the first cluster center; calculating, for every sample point, its minimum distance to the cluster centers selected so far; selecting the sample point with the largest such minimum distance as the next cluster center; and repeating until the K preset cluster centers are selected;
     S35, performing standard K-means iterations until the cluster centers no longer change or the maximum number of iterations is reached, completing the clustering of all sample points to obtain a clustering result;
     S4, evaluating the clustering results for different preset values of K with internal evaluation indices to determine the optimal number of cluster centers;
     S5, grouping the customers according to the clustering result corresponding to the optimal number of cluster centers, and outputting the customer groups and their value interpretation.
  2. The self-adaptive clustering optimization method based on density information entropy according to claim 1, wherein the preprocessing in S1 comprises deleting records that contain missing values and filtering outliers by the quartile (interquartile-range) method.
  3. The self-adaptive clustering optimization method based on density information entropy according to claim 2, wherein in S2 the customer development space $T$ is calculated by a formula [not reproduced in the source text] from the time of the customer's most recent purchase, a preset observation time point, and the mean of the customer's historical inter-purchase time intervals; the purchase frequency $F$ is the total number of purchases made by the customer within a preset first time period; and the average purchase amount $A$ is the ratio of the customer's total purchase amount to the customer's total number of purchases within a preset second time period.
  4. The self-adaptive clustering optimization method based on density information entropy according to claim 3, wherein the standardization in S2 uses Z-score normalization, the normalized value $v'$ being
     $$v' = \frac{v - \mu}{\sigma}$$
     where $v$ is the original indicator value, $\mu$ is the mean of the corresponding indicator and $\sigma$ is its standard deviation; the original indicator values comprise the customer development space $T$, the purchase frequency $F$ and the average purchase amount $A$, and Z-score normalization yields the normalized customer development space $T'$, purchase frequency $F'$ and average purchase amount $A'$.
  5. The self-adaptive clustering optimization method based on density information entropy according to claim 4, wherein assigning weights by the analytic hierarchy process in S2 comprises constructing a judgment matrix, normalizing the weight vector and checking consistency, finally obtaining the weights $w_T$, $w_F$ and $w_A$ of the three indicators, which satisfy $w_T + w_F + w_A = 1$; the weighted feature vector is $X = [\,w_T T',\ w_F F',\ w_A A'\,]$.
  6. The self-adaptive clustering optimization method based on density information entropy according to claim 5, wherein the internal evaluation indices in S4 comprise the silhouette coefficient and the Calinski-Harabasz index; the silhouette coefficient $s(i)$ is calculated as
     $$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
     where $a(i)$ is the average distance from sample point $i$ to the other sample points in the same cluster and $b(i)$ is the minimum, over all other clusters, of the average distance from sample point $i$ to the sample points of that cluster; and the Calinski-Harabasz index is calculated as
     $$CH = \frac{\operatorname{tr}(B_K)/(K-1)}{\operatorname{tr}(W_K)/(N-K)}$$
     where $N$ is the number of samples, $K$ is the preset number of cluster centers, and $\operatorname{tr}(B_K)$ and $\operatorname{tr}(W_K)$ are the traces of the between-cluster and within-cluster dispersion matrices, respectively.
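The preprocessing and weighting in claims 2, 4 and 5 can be illustrated with a minimal Python/NumPy sketch. The sketch makes several assumptions that the claims do not fix: a Tukey multiplier of 1.5 for the quartile filter, the population standard deviation for the Z-score, and hypothetical weights $w_T = 0.5$, $w_F = 0.3$, $w_A = 0.2$. The formula for $T$ is not reproduced here, so the sketch takes precomputed $(T, F, A)$ rows as input. For brevity it also applies the quartile filter to those rows, whereas the claims apply it to raw transaction records. The names `iqr_clean` and `weighted_features` are hypothetical.

```python
import numpy as np

def iqr_clean(records: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Drop rows with missing values, then rows outside the quartile (Tukey) fences."""
    records = records[~np.isnan(records).any(axis=1)]        # delete records with missing values
    q1, q3 = np.percentile(records, [25, 75], axis=0)         # per-column quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr                 # k = 1.5 is an assumed multiplier
    inside = ((records >= lower) & (records <= upper)).all(axis=1)
    return records[inside]

def weighted_features(indicators: np.ndarray, weights) -> np.ndarray:
    """Z-score each indicator column (T, F, A) and scale it by its AHP weight."""
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("AHP weights must sum to 1")
    mu = indicators.mean(axis=0)
    sigma = indicators.std(axis=0)                            # population std (ddof=0), an assumption
    if np.any(sigma == 0):
        raise ValueError("an indicator is constant; its Z-score is undefined")
    z = (indicators - mu) / sigma                             # v' = (v - mu) / sigma
    return z * w                                              # row i = [w_T T'_i, w_F F'_i, w_A A'_i]

# Hypothetical toy rows of (T, F, A); the NaN row and the 5000 amount are filtered out.
raw = np.array([[10.0, 3.0, 120.0],
                [12.0, 4.0, 110.0],
                [np.nan, 2.0, 90.0],
                [11.0, 3.0, 5000.0],
                [9.0, 5.0, 100.0],
                [13.0, 4.0, 130.0]])
clean = iqr_clean(raw)
X = weighted_features(clean, [0.5, 0.3, 0.2])                 # hypothetical AHP weights
```

After weighting, each column has mean 0 and a standard deviation equal to its weight. The AHP weights therefore directly scale each indicator's influence on the Euclidean distances used in S3.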
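The analytic hierarchy process in claim 5 has three parts: a judgment matrix, a normalized weight vector and a consistency check. It can be sketched with the standard principal-eigenvector method and Saaty's consistency ratio. The judgment values below are hypothetical. The claims do not say whether the eigenvector method or an approximation (for example, column normalization) is used, and `ahp_weights` is an illustrative name.

```python
import numpy as np

# Saaty's random consistency index RI for matrix orders 1..5
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(judgment) -> tuple:
    """Return (weights, consistency ratio) for a reciprocal pairwise judgment matrix."""
    a = np.asarray(judgment, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))                 # principal (largest) eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                                  # normalized weight vector
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0   # consistency index CI
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0                  # consistency ratio CR = CI / RI
    return w, cr

# Hypothetical judgments over (T, F, A): T vs F = 2, T vs A = 3, F vs A = 2.
judgment = [[1.0, 2.0, 3.0],
            [1 / 2, 1.0, 2.0],
            [1 / 3, 1 / 2, 1.0]]
w, cr = ahp_weights(judgment)
```

The usual acceptance rule is $CR < 0.1$. A perfectly consistent matrix, with $a_{ij} = w_i / w_j$, gives $CR = 0$ and returns exactly the weights it was built from.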
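Steps S31 to S33 of claim 1 can be sketched as follows. The local density uses the hard cut-off kernel $\chi$ exactly as defined in S31. This makes $\rho_i$, and hence $H$, piecewise constant in $d_c$, so the gradient is zero almost everywhere. The sketch therefore swaps the gradient-descent step for a grid search over candidate values between two quantiles of the pairwise distances. The quantile bounds (1% and 20%), the grid size and all function names are assumptions, and the bounds influence which $d_c$ is returned.

```python
import numpy as np

def pairwise_distances(x: np.ndarray) -> np.ndarray:
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))         # Euclidean d_ij

def local_density(dist: np.ndarray, dc: float) -> np.ndarray:
    within = dist < dc                               # chi(d_ij - dc) = 1 iff d_ij < dc
    np.fill_diagonal(within, False)                  # exclude j == i
    return within.sum(axis=1).astype(float)          # rho_i

def density_entropy(dist: np.ndarray, dc: float) -> float:
    rho = local_density(dist, dc)
    total = rho.sum()
    if total == 0:
        return float("inf")                          # no pair closer than dc: H undefined
    p = rho[rho > 0] / total                         # p_i; terms with p_i = 0 contribute 0
    return float(-(p * np.log(p)).sum())             # H = -sum_i p_i ln p_i

def select_cutoff(x: np.ndarray, lo_q: float = 0.01, hi_q: float = 0.20, steps: int = 50):
    """Grid search (a stand-in for gradient descent) for the dc that minimizes H."""
    dist = pairwise_distances(x)
    pair_d = dist[np.triu_indices(len(x), k=1)]      # each unordered pair once
    candidates = np.linspace(np.quantile(pair_d, lo_q), np.quantile(pair_d, hi_q), steps)
    entropies = [density_entropy(dist, dc) for dc in candidates]
    dc = float(candidates[int(np.argmin(entropies))])
    return dc, local_density(dist, dc)               # S33: densities at the selected dc

rng = np.random.default_rng(0)
pts = rng.normal(size=(40, 2))                       # hypothetical weighted feature vectors
dc, rho = select_cutoff(pts)
```

When $d_c$ exceeds every pairwise distance, all $\rho_i = N - 1$ and $H$ reaches its maximum, $\ln N$. Smaller $d_c$ concentrates density on fewer points and lowers $H$.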
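Steps S34 and S35 cover density-first, max-min-distance seeding followed by standard K-means. They are sketched below on a hypothetical two-group toy set. To keep the block self-contained, $d_c$ is fixed by hand rather than selected by entropy. The names `density_maxmin_init` and `kmeans` are illustrative, and keeping an empty cluster's previous center is an assumption the claims do not address.

```python
import numpy as np

def local_density(x: np.ndarray, dc: float) -> np.ndarray:
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    within = dist < dc                                   # chi(d_ij - dc)
    np.fill_diagonal(within, False)                      # exclude j == i
    return within.sum(axis=1).astype(float)

def density_maxmin_init(x: np.ndarray, rho: np.ndarray, k: int) -> np.ndarray:
    """S34: densest point first, then repeatedly the point farthest from all chosen centers."""
    chosen = [int(np.argmax(rho))]
    min_dist = np.linalg.norm(x - x[chosen[0]], axis=1)  # distance to the nearest chosen center
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))                   # largest minimum distance
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return x[chosen].astype(float)

def assign(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    return np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)

def kmeans(x: np.ndarray, centers: np.ndarray, max_iter: int = 100):
    """S35: standard Lloyd iterations until centers stop changing or max_iter is reached."""
    for _ in range(max_iter):
        labels = assign(x, centers)
        new_centers = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(len(centers))])  # keep an empty cluster's old center
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(x, centers), centers

pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
rho = local_density(pts, dc=1.5)                         # hypothetical hand-picked dc
init = density_maxmin_init(pts, rho, k=2)
labels, centers = kmeans(pts, init)
```

For a given $d_c$ and K, this seeding is deterministic, unlike the random initialization criticized in the Background.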
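The two internal indices of claim 6 can be computed directly from their definitions. The sketch follows scikit-learn's convention of $s(i) = 0$ for singleton clusters and requires at least two clusters. The claims do not state how S4 combines the two indices when choosing K, so the sketch only computes them. Results can be cross-checked against `sklearn.metrics.silhouette_score` and `sklearn.metrics.calinski_harabasz_score`. The function names below are illustrative.

```python
import numpy as np

def silhouette(x: np.ndarray, labels: np.ndarray) -> float:
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)); singleton clusters get s(i) = 0."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                                          # singleton convention
        a = dist[i, same].sum() / (same.sum() - 1)            # average over the other members (d_ii = 0)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

def calinski_harabasz(x: np.ndarray, labels: np.ndarray) -> float:
    """CH = [tr(B_K) / (K - 1)] / [tr(W_K) / (N - K)]."""
    n, clusters = len(x), np.unique(labels)
    k = len(clusters)
    overall = x.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in clusters:
        members = x[labels == c]
        centroid = members.mean(axis=0)
        tr_b += len(members) * np.sum((centroid - overall) ** 2)  # between-cluster dispersion
        tr_w += np.sum((members - centroid) ** 2)                  # within-cluster dispersion
    return float((tr_b / (k - 1)) / (tr_w / (n - k)))

pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
good = np.array([0, 0, 0, 1, 1, 1])
bad = np.array([0, 1, 0, 1, 0, 1])
s_good, ch_good = silhouette(pts, good), calinski_harabasz(pts, good)
```

For these six toy points, the two-group labeling gives $\operatorname{tr}(B_K) = 300$ and $\operatorname{tr}(W_K) = 8/3$. With $N = 6$ and $K = 2$, this yields $CH = 300 / \big((8/3)/4\big) = 450$.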

Description

Self-adaptive clustering optimization method based on density information entropy

Technical Field

The invention relates to the technical field of customer relationship management and data analysis, and in particular to a self-adaptive clustering optimization method based on density information entropy.

Background

Customer segmentation is a core part of customer relationship management. Its goal is to divide customers into sub-groups with different value, behavior or needs according to their multidimensional characteristics, so as to give precise support to enterprise decision-making. At present, the RFM model, built on Recency (time since the most recent purchase), Frequency (number of purchases) and Monetary (purchase amount), together with its derivative models, is the classical framework for customer segmentation. The traditional RFM model, however, has clear limitations. First, it describes customers only statically from historical transactions and cannot quantitatively assess their future lifetime value and development potential, so it struggles to distinguish customer types such as "historically high-value but not recently active" and "recently active but low-value". Second, it usually applies no weighting to its indicators and so ignores their differing contributions across industries and scenarios, which leaves the segmentation results with insufficient business guidance.
In addition, the K-means algorithm is widely used for customer segmentation because of its simple principle and high computational efficiency, but it has the following defects. The clustering result depends heavily on randomly selected initial cluster centers, which makes it unstable, and the algorithm converges slowly and easily falls into local optima. Among existing improvements, running multiple random initializations and keeping the best solution significantly increases computation. Selecting initial centers based on data distribution density requires presetting sensitive parameters such as the cut-off distance, where small parameter changes strongly affect the result, and no adaptive mechanism for determining these parameters is available. In summary, at the model-construction level, existing customer segmentation techniques lack an effective description of customers' development potential and a principled weighting of the indicators. At the algorithm level, clustering stability is poor and parameter adaptivity is insufficient. As a result, the accuracy, stability and business interpretability of segmentation results cannot meet enterprises' requirements for fine-grained operations, and a systematic solution is needed.

Disclosure of Invention

In view of these shortcomings of the prior art, the invention provides a self-adaptive clustering optimization method based on density information entropy. The method constructs a TFA model from customer development space (Trend), purchase frequency (Frequency) and average purchase amount (Average) to remedy the shortcomings of the traditional RFM model. It addresses clustering stability by optimizing the K-means algorithm with density information entropy, thereby achieving accurate and stable customer segmentation.
The invention provides a self-adaptive clustering optimization method based on density information entropy, comprising the following steps:
S1, collecting original transaction data of customers and preprocessing the original transaction data to obtain customer transaction data;
S2, calculating, for each customer and based on the customer transaction data, the customer development space $T$, the purchase frequency $F$ and the average purchase amount $A$; standardizing the three indicators; assigning weights to them by the analytic hierarchy process to form a weighted feature vector; and constructing a customer data set from the weighted feature vectors of all customers, each weighted feature vector corresponding to one sample point;
S3, optimizing the K-means clustering algorithm based on density information entropy and performing cluster analysis on the customer data set to obtain a clustering result, wherein S3 comprises the following substeps:
S31, calculating the local density $\rho_i$ of each sample point $x_i$:
$$\rho_i = \sum_{j=1,\, j \neq i}^{N} \chi(d_{ij} - d_c)$$
where $N$ is the total number of sample points in the customer data set, $d_{ij}$ is the Euclidean distance between sample points $x_i$ and $x_j$, $d_c$ is the cut-off distance, and the indicator function satisfies $\chi(u) = 1$ when $u < 0$ and $\chi(u) = 0$ otherwise;
S32, calculating the local density probability $p_i$:
$$p_i = \frac{\rho_i}{\sum_{j=1}^{N} \rho_j}$$
calculating the information entropy $H$:
$$H = -\sum_{i=1}^{N} p_i \ln p_i$$
Obtaining information entropy by gra