CN-114925290-B - Self-service user group expansion method, device, computer equipment and storage medium

Abstract

The application relates to a self-service user group expansion method, a self-service user group expansion device, computer equipment, a storage medium and a computer program product. The method comprises: obtaining client tags of a seed population; ranking the client tags according to importance parameters to obtain ranked features; screening the ranked features according to input business information; training a classification model corresponding to the business information based on the screened ranked features to obtain an expansion model of the seed population; and expanding the user group in the whole population through the expansion model to obtain a target population matched with the seed population. With this method, the feature variables of the seed population are ranked by importance, and business personnel then select feature variables and choose among different algorithm models according to that importance and the characteristics of the seed population. This enables efficient modeling, so that people highly similar to the seed population are automatically found in the population to be expanded, for business personnel to analyze and expand.

Inventors

  • ZENG MEIHUA
  • HUANG SHUMAN
  • ZHENG YIJIE
  • LIU YILING
  • LV ZILING

Assignees

  • 厦门航空有限公司 (Xiamen Airlines Co., Ltd.)

Dates

Publication Date: 2026-05-12
Application Date: 2022-06-06

Claims (10)

  1. A self-service user group expansion method, the method comprising: acquiring a seed population according to at least one element of a customer population, a marketing campaign and user characteristics; dividing the seed population according to participation information of the marketing campaign to obtain an initial positive sample population and an initial negative sample population; determining a difference value based on the proportion of the initial positive sample population in the whole population, and balancing the initial positive sample population and the initial negative sample population to obtain a positive sample population and a negative sample population for model training; acquiring client tags based on the positive sample population and the negative sample population; ranking the client tags according to importance parameters to obtain ranked features; displaying the ranked features, and acquiring business information based on the displayed ranked features; screening the ranked features according to the business information; training a classification model corresponding to the business information based on the screened ranked features to obtain an expansion model of the seed population, wherein a classification threshold of the classification model is determined based on the ratio of the number of the positive sample population to the number of the negative sample population; and expanding the user group in the whole population through the expansion model to obtain a target population matched with the seed population.
  2. The method of claim 1, wherein the seed population is a partial population of the whole population, and wherein the determining a difference value based on the proportion of the initial positive sample population in the whole population, and the balancing the initial positive sample population and the initial negative sample population to obtain a positive sample population and a negative sample population for model training comprises: if the difference value of the sample number between the initial positive sample population and the whole population is greater than or equal to n% and less than or equal to m%, taking the initial positive sample population and the initial negative sample population as the balanced positive sample population and negative sample population; if the difference value is smaller than n% or larger than m%, downsampling, based on the difference value, whichever of the initial positive sample population and the initial negative sample population has more samples, until the ratio between the sample numbers of the initial positive sample population and the initial negative sample population is reduced to within N times, to obtain the balanced positive sample population and negative sample population; and when the initial negative sample population is missing, randomly sampling from the whole population and filling the missing initial negative sample population with the sampled population, to obtain the balanced positive sample population and negative sample population.
  3. The method of claim 1, wherein the client tags include client tags in a wide table file and client tags in a mirror image file, and wherein the obtaining client tags of the seed population comprises: screening the client tags in the mirror image file based on the time point and elements corresponding to a population selection instruction; and selecting the client tags in the wide table file according to the seed population corresponding to the client tags screened from the mirror image file; wherein the client tags selected from the wide table file are used for generating the displayed ranked features.
  4. The method of claim 1, wherein the importance parameters include information value (IV) and weight of evidence (WOE), and wherein the ranking the client tags according to the importance parameters to obtain ranked features comprises: obtaining the data produced by binning the client tags, and encoding the data according to the weight of evidence to obtain converted feature variables, wherein, when a weight of evidence value is calculated, if the number of positive samples or the number of negative samples in a bin is zero, the zero sample number is adjusted to a preset non-zero value, and wherein the variance of the weight of evidence values is positively related to the coefficient fitted by a model containing the independent variable, and the coefficient is positively related to the contribution rate of the independent variable; calculating the information value of each feature variable based on the weight of evidence; and ranking the feature variables according to the information values to obtain the ranked features.
  5. The method of claim 1, wherein the training a classification model corresponding to the business information based on the screened ranked features to obtain an expansion model of the seed population comprises: obtaining, according to the business information, at least one model selection parameter among modeling speed, prediction precision and model applicability characteristics; selecting at least one model from a logistic regression model, a gradient boosting decision tree model and a multi-layer perceptron model according to the model selection parameters; and performing classification training on the selected model based on the ranked features to obtain the expansion model of the seed population.
  6. The method of claim 1, wherein the expanding the user group in the whole population through the expansion model to obtain the target population matched with the seed population comprises: calculating the similarity between the whole population and the seed population through the expansion model; sorting the whole population based on the similarity to obtain the sorted whole population; when a population to be expanded is received, either screened or uploaded as a file, matching the population to be expanded against the whole population to obtain the similarity scores of the population to be expanded; and selecting from the population to be expanded according to the similarity scores to obtain the target population matched with the seed population.
  7. The method of claim 1, wherein the classification threshold is expressed as: classification threshold = number of positive samples / (number of negative samples + number of positive samples).
  8. A self-service user group expansion apparatus, the apparatus comprising: a tag acquisition module, used for acquiring a seed population according to at least one element of a customer population, a marketing campaign and user characteristics, dividing the seed population according to participation information of the marketing campaign to obtain an initial positive sample population and an initial negative sample population, determining a difference value based on the proportion of the initial positive sample population in the whole population, and balancing the initial positive sample population and the initial negative sample population to obtain a positive sample population and a negative sample population for model training; a feature generation module, used for ranking the client tags according to importance parameters to obtain ranked features; a self-service screening module, used for displaying the ranked features and acquiring business information based on the displayed ranked features; a model training module, used for training a classification model corresponding to the business information based on the screened ranked features to obtain an expansion model of the seed population, wherein the classification threshold of the classification model is determined based on the ratio of the number of the positive sample population to the number of the negative sample population; and a population expansion module, used for expanding the user group in the whole population through the expansion model to obtain a target population matched with the seed population.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when executing the computer program.
  10. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any of claims 1 to 7.
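
The feature ranking in claim 4 can be illustrated with a minimal sketch. It assumes the common definitions WOE = ln(share of positives in the bin / share of negatives in the bin) and IV = sum over bins of (positive share - negative share) x WOE; the patent text does not spell out these formulas. The function names `woe_iv` and `rank_features` and the fill value 0.5 for empty bins are hypothetical choices for illustration, not values taken from the claims.

```python
import math


def woe_iv(bins, zero_fill=0.5):
    """Return (woe_per_bin, information_value) for one binned feature.

    bins: list of (positive_count, negative_count) tuples, one per bin.
    zero_fill: assumed "preset non-zero value" substituted for a zero count.
    """
    # Replace zero counts so that the logarithm below is always defined.
    adjusted = [(p if p > 0 else zero_fill, n if n > 0 else zero_fill)
                for p, n in bins]
    pos_total = sum(p for p, _ in adjusted)
    neg_total = sum(n for _, n in adjusted)

    woes, iv = [], 0.0
    for p, n in adjusted:
        pos_share = p / pos_total
        neg_share = n / neg_total
        woe = math.log(pos_share / neg_share)
        woes.append(woe)
        iv += (pos_share - neg_share) * woe
    return woes, iv


def rank_features(binned_features):
    """Sort features by information value, highest first.

    binned_features: dict mapping a feature name to its list of bins.
    """
    scored = {name: woe_iv(bins)[1] for name, bins in binned_features.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical tags: one separates the two populations, one does not.
    example = {
        "flights_last_year": [(30, 10), (10, 30)],
        "app_language": [(10, 10), (20, 20)],
    }
    for name, iv in rank_features(example):
        print(f"{name}: IV={iv:.4f}")
```

A feature whose bins have the same positive and negative shares gets an IV of zero and falls to the bottom of the ranking, which is the ordering a business user would then screen.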

Description

Self-service user group expansion method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of data modeling technologies, and in particular to a self-service user group expansion method, a self-service user group expansion device, a computer device, a storage medium, and a computer program product.

Background

A user group expansion model starts from the customers who responded to an airline marketing campaign. By analyzing the characteristics of the historical responders and non-responders to a specific advertisement or marketing campaign (these responders and non-responders are defined as the seed population), it quickly finds potential customers who resemble the responders in the seed population, which helps the airline marketing campaign mine potential customers and expand the business.

In the traditional process of building a user group expansion model, an algorithm engineer performs manual feature engineering and implements customized machine learning algorithms for each marketing scenario. For any lookalike expansion scenario, algorithm engineers and business personnel must communicate repeatedly about the model results through seed population characteristic analysis, feature extraction, data preprocessing, and machine learning algorithm tuning. The modeling cycle is long, the business scenarios are varied, many models must be customized, and the modeling efficiency for user group expansion is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a self-service user group expansion method, apparatus, computer device, computer readable storage medium, and computer program product that enable efficient user group expansion.

In a first aspect, the present application provides a self-service user group expansion method.
The method comprises the following steps: obtaining client tags of a seed population; ranking the client tags according to importance parameters to obtain ranked features; screening the ranked features according to input business information; training a classification model corresponding to the business information based on the screened ranked features to obtain an expansion model of the seed population; and expanding the user group in the whole population through the expansion model to obtain a target population matched with the seed population.

In one embodiment, the obtaining client tags of the seed population comprises: acquiring a seed population according to at least one element of a customer population, a marketing campaign and user characteristics; dividing the seed population according to participation information of the marketing campaign to obtain an initial positive sample population and an initial negative sample population; judging whether the sample numbers of the initial positive sample population and the initial negative sample population are balanced; if they are balanced, taking the initial positive sample population and the initial negative sample population as the balanced positive sample population and negative sample population; if they are unbalanced, selecting at least one of the initial positive sample population and the initial negative sample population for sampling based on the sample numbers, to obtain the balanced positive sample population and negative sample population; and acquiring client tags based on the positive sample population and the negative sample population.
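
The balancing step, together with the classification threshold of claim 7, might look like the following sketch. It is one reading of claim 2, not the patented implementation: the source gives no values for n, m and N, so the defaults `n_pct=5.0`, `m_pct=50.0` and `max_ratio=3.0` are hypothetical, as are the function names. Excluding known positives when filling missing negatives, and capping the fill at the number of positives, are further assumptions.

```python
import random


def balance_samples(positives, negatives, population,
                    n_pct=5.0, m_pct=50.0, max_ratio=3.0, seed=0):
    """Return (positives, negatives) balanced for model training.

    n_pct, m_pct and max_ratio stand in for the n%, m% and N of claim 2;
    their defaults are illustrative only. Assumes positives is non-empty.
    """
    rng = random.Random(seed)
    positives, negatives = list(positives), list(negatives)

    if not negatives:
        # Missing negatives: fill by random sampling from the whole
        # population (assumption: known positives are excluded).
        known = set(positives)
        candidates = [user for user in population if user not in known]
        size = min(len(candidates), len(positives))
        return positives, rng.sample(candidates, size)

    # Share of the initial positive samples in the whole population.
    share = 100.0 * len(positives) / len(population)
    if n_pct <= share <= m_pct:
        return positives, negatives  # already treated as balanced

    # Otherwise downsample the larger side to at most max_ratio times
    # the size of the smaller side.
    if len(positives) > len(negatives):
        keep = min(len(positives), int(max_ratio * len(negatives)))
        positives = rng.sample(positives, keep)
    else:
        keep = min(len(negatives), int(max_ratio * len(positives)))
        negatives = rng.sample(negatives, keep)
    return positives, negatives


def classification_threshold(n_pos, n_neg):
    """Threshold from claim 7: positives / (negatives + positives)."""
    return n_pos / (n_pos + n_neg)


if __name__ == "__main__":
    everyone = list(range(1000))
    pos, neg = balance_samples(everyone[:10], everyone[10:], everyone)
    print(len(pos), len(neg), classification_threshold(len(pos), len(neg)))
```

Tying the threshold to the positive share of the training set means that a model trained on a downsampled, less skewed sample is cut at a correspondingly higher score than one trained on the raw population.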
In one embodiment, the seed population is a partial population of the whole population, and the selecting at least one of the initial positive sample population and the initial negative sample population for sampling based on the sample numbers, to obtain the balanced positive sample population and negative sample population, comprises: when the difference value of the sample number between the initial positive sample population and the whole population is in a downsampling interval, downsampling the initial positive sample population based on the difference value to obtain the balanced positive sample population and negative sample population; and when the initial negative sample population is missing, randomly sampling from the whole population and filling the missing initial negative sample population with the sampled population, to obtain the balanced positive sample population and negative sample population.

In one embodiment, the client tags include client tags in a wide table file and client tags in a mirror image file, and the obtaining client tags of the seed population comprises: screening the client tags in the mirror image file based on the time point and elements corresponding to a population selection instruction; selecting the client labels in the wide table file according to seed groups