CN-113743493-B - Group classification method and electronic equipment
Abstract
The invention discloses a group classification method and electronic equipment. The group classification method comprises: clustering a plurality of first credentials separately on the value of each of at least two set factors corresponding to the first credentials, to obtain a first clustering result for each set factor; performing a weighted summation of the first risk values corresponding to each first credential across the first clustering results, to obtain a second risk value for each first credential, where a first risk value is the random risk value corresponding to the cluster center of the cluster in which the first credential is located in the corresponding first clustering result; clustering the plurality of first credentials based on their second risk values to obtain a second clustering result; and outputting a group classification result based on the second clustering result and the users corresponding to the first credentials.
Inventors
- FU XIULIANG
- HU CHUANGDA
- QIAN JIN
Assignees
- WeBank Co., Ltd. (深圳前海微众银行股份有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20210830
Claims (9)
- 1. A method of classifying a population, comprising: clustering a plurality of first credentials separately on the value of each of at least two set factors corresponding to the first credentials, to obtain a first clustering result corresponding to each set factor, wherein the first credentials represent credentials of a set stage service, and the at least two set factors comprise at least two of: overdue days, a first proportion representing the quotient of overdue numbers and total overdue numbers, and a second proportion representing the quotient of overdue resource shares and total resource shares; performing a weighted summation of the first risk values corresponding to each first credential in the first clustering results to obtain a second risk value corresponding to each first credential, wherein a first risk value represents the random risk value corresponding to the cluster center of the cluster in which the first credential is located in the corresponding first clustering result; clustering the plurality of first credentials based on the second risk values corresponding to the first credentials to obtain a second clustering result, which comprises: calculating a sorting sequence number corresponding to each cluster center based on the number of clusters and the total number of first credentials, and determining all cluster centers among the first credentials sorted by first value based on the sorting sequence number corresponding to each cluster center, wherein the first value comprises the second risk value; and outputting a group classification result based on the second clustering result and the users corresponding to the first credentials.
- 2. The method according to claim 1, wherein clustering the plurality of first credentials separately on the value of each of the at least two set factors corresponding to the first credentials comprises: calculating the sorting sequence number corresponding to each cluster center based on the number of clusters and the total number of first credentials; determining all cluster centers among the first credentials sorted by first value based on the sorting sequence number corresponding to each cluster center, wherein the first value comprises the value of a set factor; calculating a first difference between the first value corresponding to a first credential and the first value corresponding to each cluster center; and adding the first credential to the cluster whose cluster center corresponds to the smallest first difference, to obtain the first clustering result.
- 3. The method according to claim 1 or 2, wherein calculating the first difference between the first value corresponding to the first credential and the first value corresponding to each cluster center comprises: determining, as the first difference, the square of the difference between the square of the first value corresponding to the first credential and the square of the first value corresponding to the cluster center.
- 4. The method of claim 1 or 2, wherein after all the first credentials have been added to their corresponding clusters, the method further comprises: calculating a convergence threshold and an absolute difference corresponding to each cluster, wherein the absolute difference represents the absolute value of the difference between the first value corresponding to the cluster center of the corresponding cluster and the corresponding first average, the first average represents the average of the first values corresponding to all the first credentials in the corresponding cluster, the convergence threshold represents the quotient of a second difference and the total number of random risk values, and the second difference represents the difference between the maximum first value and the minimum first value; and, if a calculated absolute difference is greater than the convergence threshold, determining a new cluster center for each cluster based on the first average corresponding to that cluster, and repeating the step of calculating the first difference between the first value corresponding to the first credential and the first value corresponding to each cluster center together with the subsequent steps; or, if all the calculated absolute differences are less than or equal to the convergence threshold, not re-clustering the first credentials.
- 5. The method according to claim 4, further comprising: when the number of clusters is smaller than the total number of random risk values corresponding to all the first credentials and a second average is greater than or equal to the convergence threshold, determining the smaller of a second value and the total number of random risk values as a new number of clusters, and re-clustering the plurality of first credentials based on the new number of clusters and the first value corresponding to each first credential, wherein the second average represents the absolute value of the mean of the differences between every two adjacent first averages in the first-average array, and the second value is determined by the number of clusters and the difference between the second average and the convergence threshold.
- 6. The method of claim 5, further comprising: when the number of clusters is greater than or equal to the total number of random risk values, or the second average is smaller than the convergence threshold, ending the clustering, and sorting the first values corresponding to all cluster centers and the random risk values separately according to a first sorting mode; assigning each of the sorted random risk values to the cluster center whose first value occupies the corresponding sorting sequence number; and determining the random risk value assigned to each cluster center as the risk value corresponding to each first credential in the corresponding cluster.
- 7. The method of claim 1, wherein before clustering the plurality of first credentials separately on the value of each of the at least two set factors corresponding to the first credentials, the method further comprises: generating a random risk value corresponding to each first credential through a set random number function; and determining, through a plurality of threads, the values of the set factors corresponding to the first credentials having the same random risk value.
- 8. An electronic device, comprising: a first clustering unit, configured to cluster a plurality of first credentials separately on the value of each of at least two set factors corresponding to the first credentials, to obtain a first clustering result corresponding to each set factor, wherein the first credentials represent credentials of a set stage service, and the at least two set factors comprise at least two of: overdue days, a first proportion representing the quotient of overdue resource share and total resource share, and a second proportion representing the quotient of overdue resource share; a computing unit, configured to perform a weighted summation of the first risk values corresponding to each first credential in the first clustering results to obtain a second risk value corresponding to each first credential, wherein a first risk value represents the random risk value corresponding to the cluster center of the cluster in which the first credential is located in the corresponding first clustering result; a second clustering unit, configured to cluster the plurality of first credentials based on the second risk values corresponding to the first credentials to obtain a second clustering result, which comprises: calculating a sorting sequence number corresponding to each cluster center based on the number of clusters and the total number of first credentials, and determining all cluster centers among the first credentials sorted by first value based on the sorting sequence number corresponding to each cluster center, wherein the first value comprises the second risk value; and a classification unit, configured to output a group classification result based on the second clustering result and the users corresponding to the first credentials.
- 9. An electronic device comprising a processor and a memory for storing a computer program capable of running on the processor, wherein the processor is configured to perform the steps of the method of any one of claims 1 to 7 when running the computer program.
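The one-dimensional clustering pass recited in claims 2 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all function names are hypothetical, and the rule that maps the number of clusters and the total number of credentials to sorting sequence numbers (evenly spaced midpoints) is an assumption, because the claims do not give that formula. The first difference follows the wording of claim 3 literally.

```python
from statistics import mean


def seed_centers(values, k):
    """Pick k initial cluster centers from the sorted values.

    Assumption: the claims only say the ranks depend on k and the total
    count; taking the midpoint rank of each of k equal slices is a
    hypothetical choice made for this sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]


def first_difference(value, center):
    # Literal reading of claim 3: square of (value^2 - center^2).
    return (value ** 2 - center ** 2) ** 2


def assign(values, centers):
    """Add each value to the cluster whose center gives the smallest first difference."""
    clusters = [[] for _ in centers]
    for v in values:
        best = min(range(len(centers)),
                   key=lambda j: first_difference(v, centers[j]))
        clusters[best].append(v)
    return clusters


def needs_reclustering(values, centers, clusters, n_risk_values):
    """Claim 4 check: is any |center - cluster mean| above the threshold?

    The convergence threshold is (max value - min value) / n_risk_values,
    where n_risk_values is the total number of random risk values.
    """
    threshold = (max(values) - min(values)) / n_risk_values
    return any(
        abs(c - mean(members)) > threshold
        for c, members in zip(centers, clusters)
        if members  # skip empty clusters, which have no mean
    )
```

When `needs_reclustering` returns true, the cluster means would become the new centers and `assign` would run again; otherwise the clusters are kept as they are.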
Description
Group classification method and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular to a group classification method and an electronic device.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. However, the security and real-time requirements of the financial industry also place higher demands on these technologies.
In financial technology, when user groups are classified, the set risk category of a user's credential is determined from the correspondence between set risk categories and set threshold ranges of the credential's key index value, and the group classification result of the user is output based on the set risk category of the credential and the correspondence between the credential and the user. However, the set threshold range corresponding to each set risk category is chosen from empirical values, which makes the group classification result inconsistent with the actual situation and inaccurate.
Disclosure of Invention
In view of the above, the embodiments of the invention provide a group classification method and electronic equipment, so as to solve the technical problem of inaccurate group classification results in the related art.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows.
The invention provides a group classification method, which comprises: clustering a plurality of first credentials separately on the value of each of at least two set factors corresponding to the first credentials, to obtain a first clustering result corresponding to each set factor; performing a weighted summation of the first risk values corresponding to each first credential in the first clustering results to obtain a second risk value corresponding to each first credential, wherein a first risk value represents the random risk value corresponding to the cluster center of the cluster in which the first credential is located in the corresponding first clustering result; clustering the plurality of first credentials based on the second risk values corresponding to the first credentials to obtain a second clustering result; and outputting a group classification result based on the second clustering result and the users corresponding to the first credentials.
In the above solution, clustering the plurality of first credentials comprises: calculating the sorting sequence number corresponding to each cluster center based on the number of clusters and the total number of first credentials; determining all cluster centers among the first credentials sorted by first value based on the sorting sequence number corresponding to each cluster center, wherein the first value comprises the value of a set factor or a second risk value; calculating a first difference between the first value corresponding to a first credential and the first value corresponding to each cluster center; and adding the first credential to the cluster whose cluster center corresponds to the smallest first difference.
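The weighted summation that turns per-factor first risk values into a second risk value can be sketched as follows. The function name, the factor names, the weights, and the dictionary layout are all assumptions made for illustration; the text does not specify how the weights are chosen.

```python
def second_risk_values(first_risk_by_factor, weights):
    """Weighted sum of the per-factor first risk values for each credential.

    first_risk_by_factor: {factor_name: {credential_id: first_risk_value}},
        where each first risk value is the random risk value of the cluster
        center of the credential's cluster in that factor's clustering result.
    weights: {factor_name: weight}; hypothetical values in this sketch.
    """
    # Every factor is assumed to cover the same set of credentials.
    credential_ids = next(iter(first_risk_by_factor.values())).keys()
    return {
        cid: sum(weights[factor] * risks[cid]
                 for factor, risks in first_risk_by_factor.items())
        for cid in credential_ids
    }


# Hypothetical input: two set factors, two credentials.
first_risk = {
    "overdue_days": {"c1": 2.0, "c2": 8.0},
    "overdue_share_ratio": {"c1": 4.0, "c2": 6.0},
}
second = second_risk_values(
    first_risk, {"overdue_days": 0.5, "overdue_share_ratio": 0.5}
)
```

The resulting second risk values would then serve as the first values for the second clustering pass.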
In the above solution, calculating the first difference between the first value corresponding to the first credential and the first value corresponding to each cluster center comprises: determining, as the first difference, the square of the difference between the square of the first value corresponding to the first credential and the square of the first value corresponding to the cluster center.
In the above solution, after all the first credentials have been added to their corresponding clusters, the method further comprises: calculating a convergence threshold and an absolute difference corresponding to each cluster, wherein the absolute difference represents the absolute value of the difference between the first value corresponding to the cluster center of the corresponding cluster and the corresponding first average, the first average represents the average of the first values corresponding to all the first credentials in the corresponding cluster, the convergence threshold represents the quotient of a second difference and the total number of random risk values, and the second difference represents the difference between the maximum first value and the minimum first value; if a calculated absolute difference is greater than the convergence threshold, determining a new cluster center for each cluster based on the first average corresponding to that cluster, and e