CN-122021906-A - Data management screening method

Abstract

The invention relates to a data management screening method comprising the following steps: preprocessing data acquired in an application system to obtain candidate data; extracting three types of data features from the user's current consultation information and the candidate data, namely semantic similarity features, behavior association features and scene matching degree features; constructing a matching degree calculation model from the data features; and calculating the matching degree between the user's current consultation information and the candidate data. By fusing the three types of features, the method combines semantic matching with the user's historical retrieval behavior and the consultation scene when selecting relevant replies, which improves reply accuracy.

Inventors

FENG JINGYI

Assignees

河北雄安兴诺科技服务有限公司

Dates

Publication Date: 20260512
Application Date: 20260130

Claims (9)

1. A data management screening method, comprising the following steps: Step 1, preprocessing multi-user retrieval data, browsing data, consultation data, the corresponding manual consultation reply data and reply templates acquired in an application system to obtain candidate data; Step 2, extracting three types of data features from the user's current consultation information and the candidate data, the data features comprising semantic similarity features, behavior association features and scene matching degree features; Step 3, constructing a matching degree calculation model from the data features, and calculating the matching degree between the user's current consultation information and the candidate data; Step 4, sorting the candidate data by matching degree, and using the several candidate data with the highest matching degree to output an automatic reply; Step 5, constructing an update model based on user satisfaction labels of the automatic reply content, and updating the weighted summation model.
2. The data management screening method according to claim 1, wherein the semantic similarity feature is extracted as follows: Step 21, constructing a target domain lexicon containing the domain keywords; Step 22, converting the user's current consultation information $Q$ and the candidate data $D$ into TF-IDF vectors $V_Q$ and $V_D$, respectively, where each element of each vector represents the weight of the corresponding keyword in the text; Step 23, calculating the semantic similarity feature $f_1$ using the cosine similarity formula: $f_1 = \frac{V_Q \cdot V_D}{\lVert V_Q \rVert \, \lVert V_D \rVert}$, where $V_Q \cdot V_D$ is the dot product of the vectors $V_Q$ and $V_D$, and $\lVert V_Q \rVert$ and $\lVert V_D \rVert$ are the norms of the vectors $V_Q$ and $V_D$, respectively.
3. The data management screening method according to claim 2, wherein the behavior association feature is extracted as follows: Step 24, defining a user history behavior set $H$, each element of which is a keyword from the user's historical searches or from browsed content; Step 25, counting the number $n$ of keywords from the user history behavior set $H$ that the candidate data $D$ contains; Step 26, calculating the behavior association feature $f_2$ according to the following formula: [formula missing in the source text].
4. The data management screening method according to claim 3, wherein the scene matching degree feature is extracted as follows: Step 27, constructing scene dimensions from the consultation field, the consultation time and the user type, respectively; Step 28, setting a matching identifier $m_k$ for each single scene dimension, which takes one value if the user's current consultation information $Q$ and the candidate data $D$ match in the corresponding scene dimension and another value if they do not [both values missing in the source text]; Step 29, calculating the scene matching degree feature $f_3$ through the following formula: [formula missing in the source text].
5. The data management screening method according to any one of claims 1 to 4, wherein the matching degree calculation model is calculated as follows: $M(Q, D) = \sum_{i=1}^{3} w_i f_i + b$, where $M(Q, D)$ is the matching degree between the user's current consultation information $Q$ and the candidate data $D$, $w_i$ is the weight of the data feature numbered $i$, $f_i$ is the data feature numbered $i$, and $b$ is a bias term.
6. The data management screening method according to claim 5, wherein the update model is calculated as follows: $w^{(t+1)} = w^{(t)} - \eta \frac{\partial L}{\partial w}$ and $b^{(t+1)} = b^{(t)} - \eta \frac{\partial L}{\partial b}$, where $t$ is the iteration number, $\eta$ is the learning rate, $\frac{\partial L}{\partial w}$ is the partial derivative of the loss function $L$ with respect to the weight parameter $w$, $\frac{\partial L}{\partial b}$ is the partial derivative of the loss function $L$ with respect to the bias term $b$, $w^{(t)}$ and $w^{(t+1)}$ are the weight parameters before and after the update, and $b^{(t)}$ and $b^{(t+1)}$ are the bias terms before and after the update, respectively.
7. The data management screening method according to claim 6, wherein the loss function $L$ is calculated as follows: [formula missing in the source text], where $L$ is the loss function measuring the deviation between the predicted matching degree and the actual matching degree, $N$ is the number of user satisfaction labels, $\hat{y}$ is the predicted matching degree, and $y$ is the actual matching degree.
8. The data management screening method according to claim 1, wherein the data preprocessing comprises removing invalid data and noise data and normalizing the remaining data.
9. The data management screening method according to claim 8, wherein the normalization removes garbled characters and meaningless special symbols from the data text on the basis of a unified text encoding format, and unifies the wording of near-synonymous keywords in each domain.
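
Claims 5 to 7, together with Step 4 of claim 1, describe a weighted-sum matching model whose weights and bias are adjusted by gradient descent and whose scores are used to rank candidates. The following Python sketch illustrates that idea under stated assumptions: the names `MatchModel` and `rank_candidates` are hypothetical, the three feature values are taken as given inputs, and the loss is assumed to be a squared error between predicted and actual matching degree, because the loss formula is not legible in this text.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MatchModel:
    """Weighted-sum matching model: M = sum_i(w_i * f_i) + b."""

    weights: List[float]
    bias: float = 0.0

    def score(self, features: Sequence[float]) -> float:
        """Return the matching degree for one candidate's feature values."""
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias

    def update(self, features: Sequence[float], actual: float, lr: float = 0.1) -> None:
        """One gradient-descent step on the ASSUMED loss L = (predicted - actual)^2."""
        error = self.score(features) - actual
        # dL/dw_i = 2 * error * f_i and dL/db = 2 * error for the assumed loss.
        self.weights = [w - lr * 2.0 * error * f for w, f in zip(self.weights, features)]
        self.bias -= lr * 2.0 * error


def rank_candidates(
    model: MatchModel,
    candidates: Sequence[Tuple[str, Sequence[float]]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_k with the highest matching degree."""
    scored = [(name, model.score(features)) for name, features in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Illustrative values only: (semantic, behavior, scene) features per candidate.
model = MatchModel(weights=[0.5, 0.3, 0.2])
candidates = [("reply_a", [0.9, 0.5, 1.0]), ("reply_b", [0.2, 0.1, 0.0])]
best = rank_candidates(model, candidates, top_k=1)

# A satisfaction label is treated here as the actual matching degree (assumption).
model.update([0.9, 0.5, 1.0], actual=1.0, lr=0.1)
```

Keeping the feature extraction outside the model means the same scoring and update code works regardless of how the behavior association and scene matching features are computed.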

Description

Data management screening method

Technical Field

The invention relates to the technical field of data management, in particular to a data management screening method.

Background

With the popularization of internet applications, users increasingly seek consultation through application systems. In the field of technological achievement transformation, for example, users often ask about patent transformation paths, the distribution of rights and interests, and industry-university-research cooperation. To improve reply efficiency, the prior art mostly adopts automatic reply systems, which reply quickly by matching a query against historical consultation replies or standard templates. However, such systems still have the following defects: most methods match only on semantic similarity and ignore the relevance of the user's behavior data, such as historical retrieval and browsing, and of the consultation scene; and the templates and reply information are fixed and manually configured, so they cannot be dynamically optimized from newly added consultation data and user feedback, which results in poor reply accuracy.

Disclosure of Invention

(I) Technical problem to be solved

The invention provides a data management screening method that solves the problems described in the background above.
(II) Technical solution

To achieve the above purpose, the invention provides a data management screening method comprising the following steps:

Step 1, preprocessing multi-user retrieval data, browsing data, consultation data, the corresponding manual consultation reply data and reply templates acquired in an application system to obtain candidate data;

Step 2, extracting three types of data features from the user's current consultation information and the candidate data, the data features comprising semantic similarity features, behavior association features and scene matching degree features;

Step 3, constructing a matching degree calculation model from the data features, and calculating the matching degree between the user's current consultation information and the candidate data;

Step 4, sorting the candidate data by matching degree, and using the several candidate data with the highest matching degree to output an automatic reply;

Step 5, constructing an update model based on user satisfaction labels of the automatic reply content, and updating the weighted summation model.

Preferably, the semantic similarity feature is extracted as follows: Step 21, constructing a target domain lexicon containing the domain keywords; Step 22, converting the user's current consultation information $Q$ and the candidate data $D$ into TF-IDF vectors $V_Q$ and $V_D$, respectively, where each element of each vector represents the weight of the corresponding keyword in the text; Step 23, calculating the semantic similarity feature $f_1$ using the cosine similarity formula: $f_1 = \frac{V_Q \cdot V_D}{\lVert V_Q \rVert \, \lVert V_D \rVert}$, where $V_Q \cdot V_D$ is the dot product of the vectors $V_Q$ and $V_D$, and $\lVert V_Q \rVert$ and $\lVert V_D \rVert$ are the norms of the vectors $V_Q$ and $V_D$, respectively.
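
Steps 21 to 23 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the helper names `document_frequencies`, `tfidf_vector` and `cosine_similarity` are hypothetical, texts are assumed to be already tokenized into keyword lists, the Euclidean norm is used, and a smoothed IDF variant is chosen because the text does not specify which TF-IDF weighting is used.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def document_frequencies(corpus: Sequence[Sequence[str]], lexicon: Sequence[str]) -> Dict[str, int]:
    """Count, for each lexicon keyword, how many documents contain it."""
    return {term: sum(1 for doc in corpus if term in doc) for term in lexicon}


def tfidf_vector(
    tokens: Sequence[str],
    lexicon: Sequence[str],
    doc_freq: Dict[str, int],
    n_docs: int,
) -> List[float]:
    """Build a TF-IDF vector with one element per keyword of the domain lexicon."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    vector = []
    for term in lexicon:
        tf = counts[term] / total
        # Smoothed IDF (an assumption; other weightings are possible).
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        vector.append(tf * idf)
    return vector


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no lexicon keyword present in one of the texts
    return dot / (norm_u * norm_v)


# Illustrative data only: a tiny lexicon and two candidate texts.
lexicon = ["patent", "transfer", "royalty"]
corpus = [["patent", "transfer", "path"], ["royalty", "share"]]
doc_freq = document_frequencies(corpus, lexicon)

query_vec = tfidf_vector(["patent", "transfer"], lexicon, doc_freq, len(corpus))
semantic_scores = [
    cosine_similarity(query_vec, tfidf_vector(doc, lexicon, doc_freq, len(corpus)))
    for doc in corpus
]
```

Restricting the vector to lexicon keywords keeps the dimension fixed, so every query and candidate is compared in the same space.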
In a further preferred embodiment, the behavior association feature is extracted as follows: Step 24, defining a user history behavior set $H$, each element of which is a keyword from the user's historical searches or from browsed content; Step 25, counting the number $n$ of keywords from the user history behavior set $H$ that the candidate data $D$ contains; Step 26, calculating the behavior association feature $f_2$ according to the following formula: [formula missing in the source text].

In a further preferred embodiment, the scene matching degree feature is extracted as follows: Step 27, constructing scene dimensions from the consultation field, the consultation time and the user type, respectively; Step 28, setting a matching identifier $m_k$ for each single scene dimension, which takes one value if the user's current consultation information $Q$ and the candidate data $D$ match in the corresponding scene dimension and another value if they do not [both values missing in the source text]; Step 29, calculating the scene matching degree feature $f_3$ through the following formula: [formula missing in the source text].

In a further preferred embodiment, the matching degree calculation model is calculated as follows: $M(Q, D) = \sum_{i=1}^{3} w_i f_i + b$, where $M(Q, D)$ is the matching degree between the user's current consultation information $Q$ and the candidate data $D$, $w_i$ is the weight of the data feature numbered $i$, $f_i$ is the data feature numbered $i$, and $b$ is a bias term.

In a further preferred embodiment, the update model is calculated as follows: $w^{(t+1)} = w^{(t)} - \eta \frac{\partial L}{\partial w}$ and $b^{(t+1)} = b^{(t)} - \eta \frac{\partial L}{\partial b}$, where $t$ is the iteration number, $\eta$ is the learning rate, $\frac{\partial L}{\partial w}$ is the partial derivative of the loss function $L$ with respect to the weight parameter $w$, and, for the loss function $L$ and the bias term $b$, $\frac{\partial L}{\partial b}$ is the part