CN-122020105-A - User characteristic processing method, device and medium


Abstract

The application relates to the technical field of electric digital data processing, and in particular to a user feature processing method, device, and medium. The method comprises: obtaining a target user feature set; for the feature set of any user, determining the feature subsets whose time attribute values are equal to, smaller than, and larger than an intermediate value as that user's intermediate feature subset, first feature subset, and second feature subset, respectively; if the state label of the user corresponding to the user's intermediate feature subset is a first preset label, screening the user's target feature subset from the second feature subset, and otherwise screening it from the first feature subset; and, if the screening succeeds, determining the user's target feature subset as a positive sample for training a target model. The application can improve the efficiency of training sample extraction.

Inventors

YU FENGFENG
FANG YI
WANG ZHIHAO

Assignees

每日互动股份有限公司

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (10)

1. A method of processing user features, the method comprising the steps of: acquiring a target user feature set, wherein the target user feature set comprises the feature sets of a plurality of users, each feature set comprises a plurality of feature subsets, and each feature subset comprises a plurality of feature data of preset types sharing the same time attribute; for the feature set of any user, determining the feature subsets whose time attribute values are equal to, smaller than, and larger than an intermediate value as the user's intermediate feature subset, first feature subset, and second feature subset, respectively; if the state label of the user corresponding to the user's intermediate feature subset is a first preset label, screening a target feature subset of the user from the user's second feature subset, and otherwise screening the target feature subset of the user from the user's first feature subset, wherein the target feature subset satisfies the condition that the state label of the user corresponding to any feature subset whose time attribute value is smaller than that of the target feature subset is the first preset label, and the state label of the user corresponding to any feature subset whose time attribute value is larger than that of the target feature subset is a second preset label; and, if the screening succeeds, determining the target feature subset of the user as a positive sample for training a target model, wherein the target model is used to obtain the state label of a user.
2. The method of claim 1, wherein screening the target feature subset of the user from the second feature subset of the user comprises: sorting the second feature subsets in ascending order of time attribute value to obtain a first feature subset sequence; taking a first preset value n as the step length, and starting the judgment from the feature subset with the n-th smallest time attribute value in the first feature subset sequence; if the state label of the user corresponding to the n-th smallest feature subset is the second preset label, continuing by judging the state label of the user corresponding to the feature subset with the (n-1)-th smallest time attribute value in the first feature subset sequence; and, if the state label of the user corresponding to the (n-1)-th smallest feature subset is also the second preset label, continuing the judgment in the direction of decreasing time attribute value until the state label of the user corresponding to some feature subset is judged to be the first preset label, and determining the next feature subset after that feature subset as the target feature subset.
3. The method of processing user features according to claim 2, wherein obtaining the first preset value comprises: obtaining the difference between the maximum and minimum time attribute values of the feature subsets included in the second feature subset; obtaining, from the difference and the number of feature subsets included in the second feature subset, the average time interval between the time attribute values of adjacent feature subsets included in the second feature subset; obtaining the feature subset density per unit time from the average time interval; and obtaining the first preset value from the feature subset density, wherein the first preset value is positively correlated with the feature subset density.
4. The method of processing user features according to claim 2, wherein screening the target feature subset of the user from the second feature subset of the user further comprises: if the state label of the user corresponding to the n-th smallest feature subset is the first preset label, continuing by judging the state label of the user corresponding to the feature subset with the 2nd smallest time attribute value in the feature subset sequence; and, if the state label of the user corresponding to the 2nd smallest feature subset is also the first preset label, continuing the judgment in the direction of increasing time attribute value.
5. The method of processing user features according to claim 1, wherein acquiring the target user feature set comprises: screening an initial user feature set according to preset screening conditions to obtain the target user feature set, wherein the initial user feature set comprises the feature sets of a plurality of users, and the preset screening conditions comprise that the number of feature subsets included in a user's feature set is greater than or equal to a target preset number, and that the difference between the maximum and minimum time attribute values of the feature subsets included in the user's feature set is greater than or equal to a target preset time interval.
6. The method of processing user features according to claim 1, wherein acquiring the target user feature set comprises: for any feature subset of any user, if the feature data of a certain preset type in that feature subset is empty, obtaining the feature data of that preset type from the feature data of the same preset type in a similar feature subset of the user, wherein the similar feature subset has a time attribute value smaller than that of the feature subset, and the difference between the two time attribute values is less than or equal to a preset time threshold.
7. The method of claim 1, further comprising: determining any feature subset of a user whose time attribute value is smaller than that of the user's target feature subset as a negative sample for training the target neural network model.
8. The method of claim 1, wherein the preset types of feature data comprise at least one of: age, gender, exercise frequency, blood sugar, blood pressure, blood lipids, drinking frequency, smoking frequency, medication feature data, imaging examination results, family history feature data, and past medical history feature data.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method of processing user features according to any one of claims 1 to 8.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of processing user features according to any one of claims 1 to 8.
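
The stepped search of claims 2 to 4, together with the step-length derivation of claim 3, might be sketched in Python as follows. This is an illustrative reading rather than the applicant's implementation. The names `FIRST`, `SECOND`, `step_from_density`, `find_target_index`, and the `scale` factor are hypothetical. The sketch also assumes that the average interval is the time span divided by the number of gaps, that the forward probe advances by the step length n, and that a user's labels switch from the first to the second preset label at most once.

```python
from typing import Optional, Sequence

FIRST = "first"    # hypothetical stand-in for the first preset label
SECOND = "second"  # hypothetical stand-in for the second preset label


def step_from_density(times: Sequence[float], scale: float = 1.0) -> int:
    """Derive a step length n that grows with feature-subset density."""
    if len(times) < 2:
        return 1
    span = max(times) - min(times)
    if span <= 0:
        return 1
    avg_interval = span / (len(times) - 1)  # assumed: span / number of gaps
    density = 1.0 / avg_interval            # feature subsets per unit time
    return max(1, int(scale * density))     # non-decreasing in density


def find_target_index(labels: Sequence[str], n: int) -> Optional[int]:
    """Return the index of the target subset, or None if screening fails.

    `labels` holds the state labels of the second feature subset,
    ordered by ascending time attribute value.
    """
    if not labels:
        return None
    n = max(1, n)
    last = len(labels) - 1
    i = min(n, len(labels)) - 1  # the n-th smallest subset, 0-based
    # Probe forward while the label is still the first preset label.
    while labels[i] == FIRST:
        if i == last:
            return None          # the second preset label never appears
        i = min(i + n, last)
    # Walk back one subset at a time toward smaller time values.
    while i > 0 and labels[i - 1] == SECOND:
        i -= 1
    return i


labels = [FIRST, FIRST, FIRST, SECOND, SECOND, SECOND]
print(find_target_index(labels, 2))
```

With a step of n, a run of first-label subsets is skipped n at a time, so fewer labels are inspected than in a subset-by-subset scan; the backward walk then inspects at most about n further labels.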

Description

User characteristic processing method, device and medium

Technical Field

The present invention relates to the field of electronic digital data processing technologies, and in particular to a method, a device, and a medium for processing user features.

Background

In the field of user state prediction, for example in medical health, positive and negative training samples are usually extracted from users' historical feature data to train a prediction model, and the quality and efficiency of that extraction directly determine how well the model performs in deployment and how quickly it can be iterated. In traditional technical schemes, the extraction has long relied on manual operation. Manually screening positive and negative training samples from the historical feature data of a large number of users is time-consuming and inefficient, and operator mistakes can attach wrong labels to samples, which lowers the quality of the training samples and harms the accuracy of the prediction model. How to improve the efficiency of training sample extraction is therefore a problem to be solved.

Disclosure of Invention

The invention aims to provide a method, device, and medium for processing user features so as to improve the efficiency of training sample extraction.

According to a first aspect of the present invention, there is provided a method of processing user features, the method comprising the following steps:

Acquiring a target user feature set, wherein the target user feature set comprises the feature sets of a plurality of users, each feature set comprises a plurality of feature subsets, and each feature subset comprises a plurality of feature data of preset types sharing the same time attribute.
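
The nesting described above (target user feature set, per-user feature set, per-time feature subset) could be represented roughly as follows. This is a minimal sketch under assumptions: the class names, the `Record` tuple layout, the integer date encoding, and the feature type names in the example are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class FeatureSubset:
    """Feature data of preset types that share one time attribute value."""
    time: int
    features: Dict[str, float] = field(default_factory=dict)


@dataclass
class UserFeatureSet:
    """All feature subsets of one user, ordered by time attribute value."""
    user_id: str
    subsets: List[FeatureSubset] = field(default_factory=list)


# Hypothetical flat input row: (user id, time attribute, feature type, value).
Record = Tuple[str, int, str, float]


def build_target_user_feature_set(
    records: Iterable[Record],
) -> Dict[str, UserFeatureSet]:
    """Group flat records by user, then by time attribute value."""
    grouped: Dict[str, Dict[int, Dict[str, float]]] = {}
    for user_id, time, feature_type, value in records:
        grouped.setdefault(user_id, {}).setdefault(time, {})[feature_type] = value
    return {
        user_id: UserFeatureSet(
            user_id,
            [FeatureSubset(t, feats) for t, feats in sorted(by_time.items())],
        )
        for user_id, by_time in grouped.items()
    }


records = [
    ("u1", 20240301, "blood_pressure", 135.0),
    ("u1", 20240101, "blood_pressure", 120.0),
    ("u1", 20240101, "blood_sugar", 5.4),
    ("u2", 20240201, "age", 61.0),
]
target_set = build_target_user_feature_set(records)
print([s.time for s in target_set["u1"].subsets])
```

Keeping each user's subsets sorted by time attribute value makes the later comparison against an intermediate value a matter of simple slicing.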
For the feature set of any user, the feature subsets whose time attribute values are equal to, smaller than, and larger than an intermediate value are determined as the user's intermediate feature subset, first feature subset, and second feature subset, respectively.

If the state label of the user corresponding to the user's intermediate feature subset is a first preset label, the target feature subset of the user is screened from the user's second feature subset; otherwise, the target feature subset of the user is screened from the user's first feature subset. The target feature subset satisfies the condition that the state label of the user corresponding to any feature subset whose time attribute value is smaller than that of the target feature subset is the first preset label, and the state label of the user corresponding to any feature subset whose time attribute value is larger than that of the target feature subset is a second preset label.

If the screening succeeds, the target feature subset of the user is determined as a positive sample for training a target model, wherein the target model is used to obtain the state label of a user.

Further, screening the target feature subset of the user from the second feature subset of the user includes:

Sorting the second feature subsets in ascending order of time attribute value to obtain a first feature subset sequence.

Taking a first preset value n as the step length, and starting the judgment from the feature subset with the n-th smallest time attribute value in the first feature subset sequence.
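
The split around the intermediate value, the choice of which side to screen, and the positive-sample condition might be sketched as below. The sketch rests on several assumptions: the intermediate value is taken to be the time value of the middle element in sorted order, each feature subset is reduced to a hypothetical `(time, label)` pair, `FIRST` and `SECOND` are placeholder label values, and a plain linear scan stands in for the stepped search.

```python
from typing import List, Optional, Tuple

FIRST, SECOND = "first", "second"  # hypothetical stand-ins for the preset labels
Subset = Tuple[int, str]           # simplified: (time attribute value, state label)


def split_by_intermediate(
    subsets: List[Subset],
) -> Tuple[List[Subset], List[Subset], List[Subset]]:
    """Split one user's non-empty subset list around the intermediate time."""
    ordered = sorted(subsets)
    mid_time = ordered[len(ordered) // 2][0]  # assumption: middle time value
    first = [s for s in ordered if s[0] < mid_time]
    middle = [s for s in ordered if s[0] == mid_time]
    second = [s for s in ordered if s[0] > mid_time]
    return first, middle, second


def select_positive_sample(subsets: List[Subset]) -> Optional[Subset]:
    """Return the target feature subset, or None if screening fails."""
    if not subsets:
        return None
    first, middle, second = split_by_intermediate(subsets)
    # First preset label at the midpoint: any change must come later.
    side = second if middle[0][1] == FIRST else first
    # Linear scan for the earliest SECOND label on the chosen side.
    target = next((s for s in side if s[1] == SECOND), None)
    if target is None:
        return None
    # Check the stated condition over the whole feature set.
    before_ok = all(lbl == FIRST for t, lbl in subsets if t < target[0])
    after_ok = all(lbl == SECOND for t, lbl in subsets if t > target[0])
    return target if before_ok and after_ok else None


history = [(1, FIRST), (2, FIRST), (3, FIRST), (4, SECOND), (5, SECOND)]
print(select_positive_sample(history))
```

Checking the midpoint label first halves the number of subsets that need to be inspected, in the same way a single step of binary search does.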
If the state label of the user corresponding to the n-th smallest feature subset is the second preset label, the judgment continues with the state label of the user corresponding to the feature subset with the (n-1)-th smallest time attribute value in the feature subset sequence.

If the state label of the user corresponding to the (n-1)-th smallest feature subset is also the second preset label, the judgment continues in the direction of decreasing time attribute value until the state label of the user corresponding to some feature subset is judged to be the first preset label, and the next feature subset after that feature subset is determined as the target feature subset.

Further, obtaining the first preset value includes:

Obtaining the difference between the maximum and minimum time attribute values of the feature subsets included in the second feature subset.

Obtaining, from the difference and the number of feature subsets included in the second feature subset, the average time interval between the time attribute values of adjacent feature subsets included in the second feature subset.

And acquiring the