CN-121996792-A - Intelligent data asset management and optimization distribution method and system

Abstract

The invention provides an intelligent data asset management and optimization distribution method and system. The method comprises: encoding mobile user data; performing data cleaning on the encoded mobile user data to obtain data-cleaned mobile user data; calculating a weight value for each word in the data-cleaned mobile user data; converting the data-cleaned mobile user data into word vectors and assigning each word vector its corresponding weight value to form training samples; inputting the training samples into a neural network for training to obtain a data asset allocation model; and allocating target mobile user data using the data asset allocation model. When the training samples are constructed, the weight of each word is defined by combining the position information of the word with its TF-IDF information. This improves the feature extraction capability of the neural network, so that the model learns the features of words with high weight values and classifies different texts more accurately.

Inventors

  • ZHU HAIPING
  • XU JING
  • LIU QINGQUAN
  • LI XIN
  • DU YANJUN

Assignees

  • Jiaxing University (嘉兴大学)

Dates

Publication Date
2026-05-08
Application Date
2023-12-26

Claims (10)

  1. An intelligent data asset management and optimization distribution method, comprising: step 1, acquiring mobile user data; step 2, encoding the mobile user data to obtain encoded mobile user data; step 3, performing data cleaning on the encoded mobile user data to obtain data-cleaned mobile user data; step 4, calculating a weight value for each word in the data-cleaned mobile user data; step 5, converting the data-cleaned mobile user data into word vectors and assigning each word vector its corresponding weight value to form training samples; step 6, inputting the training samples into a neural network for training to obtain a data asset allocation model; and step 7, completing the allocation of target mobile user data using the data asset allocation model.
  2. The intelligent data asset management and optimization distribution method according to claim 1, wherein step 2 of encoding the mobile user data to obtain encoded mobile user data comprises: encoding the mobile user data using a Unicode encoding method to obtain the encoded mobile user data, wherein each piece of encoded mobile user data has a set formed by m character elements.
  3. The intelligent data asset management and optimization distribution method according to claim 2, wherein step 3 of performing data cleaning on the encoded mobile user data to obtain data-cleaned mobile user data comprises: step 3.1, calculating a characteristic value of each piece of data in the encoded mobile user data; step 3.2, normalizing the characteristic value of each piece of data to obtain a normalized data characteristic value; and step 3.3, removing, as abnormal values, the mobile user data whose data characteristic values fall within the abnormal value interval, to obtain the data-cleaned mobile user data.
  4. The intelligent data asset management and optimization distribution method according to claim 3, wherein step 3.1 of calculating the characteristic value of each piece of data in the encoded mobile user data comprises: using the formula [formula not reproduced] to calculate the characteristic value of each piece of data in the encoded mobile user data, wherein H(X) represents the characteristic value of the X-th piece of data, and p_i represents the probability of the i-th character element in the corresponding set.
  5. The intelligent data asset management and optimization distribution method according to claim 4, wherein step 4 of calculating a weight value for each word in the data-cleaned mobile user data comprises: step 4.1, quantifying the position information of each word in the data-cleaned mobile user data to obtain a position information statistical formula; step 4.2, obtaining the TF-IDF value of each word in the data-cleaned mobile user data; and step 4.3, optimizing the position information statistical formula using the TF-IDF value, based on the normal distribution principle, to obtain the weight value of each word.
  6. The intelligent data asset management and optimization distribution method according to claim 5, wherein the position information statistical formula is: [formula not reproduced], where λ represents an adjustable parameter, f_i represents the position at which word i first appears in the corresponding text, n_j represents the total number of words in text j, and First(i, j) represents the position information of word i.
  7. The intelligent data asset management and optimization distribution method according to claim 6, wherein step 4.3 of optimizing the position information statistical formula using the TF-IDF value, based on the normal distribution principle, to obtain the weight value of each word comprises: using the formula LTFIDF(i, j) = TF-IDF(i, j) × Local(i, j) to obtain the weight value of each word, wherein LTFIDF(i, j) represents the weight value of the i-th word in text j, and TF-IDF(i, j) represents the TF-IDF value of word i.
  8. The intelligent data asset management and optimization distribution method according to claim 7, wherein step 5 of converting the data-cleaned mobile user data into word vectors and assigning each word vector its corresponding weight value to form training samples comprises: converting each word in the data-cleaned mobile user data into a word vector using a Word2vec model, and then using the formula [formula not reproduced] to combine each word vector with the weight value of the corresponding word to form a training sample.
  9. An intelligent data asset management and optimization distribution system, comprising: a mobile user data acquisition module for acquiring mobile user data; an encoding module for encoding the mobile user data to obtain encoded mobile user data; a data cleaning module for performing data cleaning on the encoded mobile user data to obtain data-cleaned mobile user data; a weight value calculation module for calculating a weight value for each word in the data-cleaned mobile user data; a training sample construction module for converting the data-cleaned mobile user data into word vectors and assigning each word vector its corresponding weight value to form training samples; a training module for inputting the training samples into a neural network for training to obtain a data asset allocation model; and an automatic allocation module for completing the allocation of target mobile user data using the data asset allocation model.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the intelligent data asset management and optimization distribution method according to any one of claims 1-8.
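
The word weighting of claims 5 to 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The only relation taken from the text is LTFIDF(i, j) = TF-IDF(i, j) × Local(i, j). Everything else is an assumption made for illustration: the TF-IDF variant (term frequency times log(N / document frequency)), the function names, and in particular `example_local`, a hypothetical Gaussian-shaped stand-in for the position formula, which is not reproduced in this record.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    The record does not say which TF-IDF variant is used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


def example_local(first_pos, n_words, lam=1.0):
    """HYPOTHETICAL position factor, not the formula from the patent.

    Uses the quantities the claims name (first position f_i, total
    word count n_j, adjustable parameter lambda) in a Gaussian-shaped
    decay so that words appearing earlier get a larger factor.
    """
    return math.exp(-lam * (first_pos / n_words) ** 2)


def ltfidf(docs, local_fn=example_local, lam=1.0):
    """Weight per word: TF-IDF(i, j) * Local(i, j), as in claim 7."""
    weights = []
    for doc, scores in zip(docs, tf_idf(docs)):
        first = {}
        for pos, word in enumerate(doc, start=1):
            first.setdefault(word, pos)  # keep the first occurrence only
        weights.append({
            word: score * local_fn(first[word], len(doc), lam)
            for word, score in scores.items()
        })
    return weights


if __name__ == "__main__":
    docs = [["refund", "order", "late"], ["order", "status"]]
    for weights in ltfidf(docs):
        print(weights)
```

In this toy input, "order" occurs in every document, so its IDF and therefore its weight is zero under the assumed variant. Passing a different `local_fn` swaps in another position formula without touching the rest of the sketch.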

Description

Intelligent data asset management and optimization distribution method and system

Technical Field

The invention belongs to the technical field of data asset management, and particularly relates to an intelligent data asset management and optimization distribution method and system.

Background

With the rapid development of information technology, businesses and organizations have accumulated large numbers of data assets. These data assets contain rich information and value, but their potential often goes unrealized because of improper management and allocation. Data asset management and allocation methods have emerged to address this problem. Conventional methods, however, typically extract metadata for each data asset, after which enterprise administrators assign labels to the different types of data based on the extracted metadata and store the data assets in a centralized data management system according to label type. The existing methods therefore depend entirely on enterprise administrators labeling the data by hand, which is inefficient, easily leads to unreasonable allocation of data resources, and reduces the utilization efficiency and value of the data assets.

Disclosure of Invention

In order to solve the above problems, an object of the present invention is to provide an intelligent data asset management and optimization distribution method and system.
An intelligent data asset management and optimization distribution method, comprising:
step 1, acquiring mobile user data;
step 2, encoding the mobile user data to obtain encoded mobile user data;
step 3, performing data cleaning on the encoded mobile user data to obtain data-cleaned mobile user data;
step 4, calculating a weight value for each word in the data-cleaned mobile user data;
step 5, converting the data-cleaned mobile user data into word vectors and assigning each word vector its corresponding weight value to form training samples;
step 6, inputting the training samples into a neural network for training to obtain a data asset allocation model; and
step 7, completing the allocation of target mobile user data using the data asset allocation model.

Preferably, step 2 of encoding the mobile user data to obtain encoded mobile user data includes: encoding the mobile user data using a Unicode encoding method to obtain the encoded mobile user data, wherein each piece of encoded mobile user data has a set formed by m character elements.

Preferably, step 3 of performing data cleaning on the encoded mobile user data to obtain the data-cleaned mobile user data includes: step 3.1, calculating a characteristic value of each piece of data in the encoded mobile user data; step 3.2, normalizing the characteristic value of each piece of data to obtain a normalized data characteristic value; and step 3.3, removing, as abnormal values, the mobile user data whose data characteristic values fall within the abnormal value interval, to obtain the data-cleaned mobile user data.
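
Steps 3.1 to 3.3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the method as claimed. The characteristic value is assumed to be the Shannon entropy of each record's character distribution, which the notation H(X) and p_i suggests but this record does not confirm. The normalization is assumed to be a z-score. The abnormal value interval is assumed to be |z| above a hypothetical threshold `max_abs_z`. All function and parameter names are invented for the example.

```python
import math
import statistics
from collections import Counter


def char_entropy(record):
    """Step 3.1 (assumed): Shannon entropy, in bits, of the characters."""
    counts = Counter(record)
    total = len(record)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def z_scores(values):
    """Step 3.2 (assumed): z-score normalization of the feature values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All records share the same feature value; nothing stands out.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def clean(records, max_abs_z=2.0):
    """Step 3.3 (assumed): drop records whose normalized value is abnormal.

    max_abs_z is a hypothetical threshold; the record does not state
    how the abnormal value interval is chosen.
    """
    records = [r for r in records if r]  # empty strings carry no characters
    if not records:
        return []
    normalized = z_scores([char_entropy(r) for r in records])
    return [r for r, z in zip(records, normalized) if abs(z) <= max_abs_z]


if __name__ == "__main__":
    records = [
        "hello world", "order status", "late refund",
        "aaaaaaaaaaa", "check balance", "reset password",
    ]
    print(clean(records))
```

In this sample the record made of a single repeated character has zero entropy and falls outside the assumed threshold, so it is the only one removed. Python strings are sequences of Unicode code points, which matches the Unicode encoding of step 2 closely enough for the sketch.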
Preferably, step 3.1 of calculating the characteristic value of each piece of data in the encoded mobile user data includes: using the formula [formula not reproduced] to calculate the characteristic value of each piece of data in the encoded mobile user data, wherein H(X) represents the characteristic value of the X-th piece of data, and p_i represents the probability of the i-th character element in the corresponding set.

Preferably, step 4 of calculating the weight value of each word in the data-cleaned mobile user data includes: step 4.1, quantifying the position information of each word in the data-cleaned mobile user data to obtain a position information statistical formula; step 4.2, obtaining the TF-IDF value of each word in the data-cleaned mobile user data; and step 4.3, optimizing the position information statistical formula using the TF-IDF value, based on the normal distribution principle, to obtain the weight value of each word.

Preferably, the position information statistical formula is: [formula not reproduced], where λ represents an adjustable parameter, f_i represents the position at which word i first appears in the corresponding text, n_j represents the total number of words in text j, and First(i, j) represents the position information of word i.

Preferably, step 4.3 of optimizing the position information statistical formula using the TF-IDF value, based on the normal distribution principle, to obtain the weight value of each word includes: using the formula LTFIDF(i, j) = TF-IDF(i, j) × Local(i, j) to obtain the weight value of each word, wherein LTFIDF(i, j) represents the weight value of the i-th word in text j, and TF-IDF(i, j) represents the TF-IDF value of th