CN-122001384-A - Data compression method, device, equipment, medium and product


Abstract

The application provides a data compression method, apparatus, device, medium, and product, and relates to the field of data transmission. The method comprises: obtaining the transmission data of a single-batch transmission; if the size of the transmission data is larger than a preset first compression threshold, querying the corresponding meta-information; classifying the fields in the transmission data to obtain a classified data set; screening the classified data against a preset second compression threshold to obtain a compression target set; collecting characteristic information of each compression target and calculating a characteristic value set of the compression target from a preset characteristic set and the characteristic information; matching the characteristic value set against a preset algorithm matching strategy to determine a candidate compression algorithm set for the compression target; and determining an optimal compression algorithm for the compression target from the candidate compression algorithm set according to a preset algorithm selection strategy. The application addresses the technical problem that the prior art is easily affected by the characteristics of mixed data and selects from only a single compression algorithm, which lowers data compression efficiency.
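
As an illustration only, the threshold check, field classification, and screening steps could be sketched as follows. The threshold values, the `screen_targets` name, the `field_types` mapping (which stands in for the meta-information lookup), and the decision to skip batches at or below the first threshold are assumptions made for this sketch; the application does not specify them. The whole-set fallback follows claim 3.

```python
from collections import defaultdict

FIRST_THRESHOLD = 1024   # bytes; hypothetical value
SECOND_THRESHOLD = 256   # bytes; hypothetical value


def screen_targets(rows, field_types):
    """Return a list of (label, values) compression targets.

    rows: list of dicts mapping field name -> bytes value
    field_types: dict mapping field name -> type label
                 (hypothetical stand-in for the meta-information)
    """
    total = sum(len(v) for row in rows for v in row.values())
    if total <= FIRST_THRESHOLD:
        return []  # assumption: small batches are sent uncompressed

    # Classify field values by their declared type.
    classified = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            classified[field_types[name]].append(value)

    # Keep every type group whose size is not smaller than the second threshold.
    targets = [
        (label, values)
        for label, values in classified.items()
        if sum(len(v) for v in values) >= SECOND_THRESHOLD
    ]
    if not targets:
        # Fallback: the whole classified data set becomes a single target.
        all_values = [v for values in classified.values() for v in values]
        targets = [("all", all_values)]
    return targets
```

Calling `screen_targets(rows, field_types)` on one batch yields the targets that the later characteristic-collection and algorithm-matching steps would operate on.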

Inventors

ZHANG SHIYU
LI NAN

Assignees

中电科金仓(北京)科技股份有限公司

Dates

Publication Date: 20260508
Application Date: 20251229

Claims (11)

1. A data compression method, comprising: acquiring transmission data of a single-batch transmission; if the size of the transmission data is larger than a preset first compression threshold, querying corresponding meta information in a preset database according to the transmission data; classifying fields in the transmission data according to the meta information to obtain a classified data set, wherein the classified data set comprises classified field data of a plurality of preset field types; screening the classified field data according to a preset second compression threshold to obtain a compression target set, wherein the compression target set comprises at least one compression target; collecting characteristic information of each compression target, and calculating a characteristic value set of the compression target according to a preset characteristic set and the characteristic information; matching the characteristic value set according to a preset algorithm matching strategy to determine a candidate compression algorithm set of the compression target, wherein the candidate compression algorithm set comprises at least one compression algorithm; and determining an optimal compression algorithm of the compression target from the candidate compression algorithm set according to a preset algorithm selection strategy.
2. The method of claim 1, wherein screening the classified field data according to the preset second compression threshold to obtain the compression target set comprises: if the size of the classified field data is not smaller than the preset second compression threshold, adding the classified field data to the compression target set as a compression target, wherein the compression target set contains at least one compression target.
3. The method of claim 1, wherein screening the classified field data according to the preset second compression threshold to obtain the compression target set further comprises: if the sizes of all the classified field data are smaller than the preset second compression threshold, taking the classified data set as the compression target and adding it to the compression target set.
4. The method of claim 3, wherein, in each candidate compression algorithm set, the compression algorithms are arranged in a preset order according to the preset algorithm selection strategy; and determining the optimal compression algorithm of the compression target from the candidate compression algorithm set according to the preset algorithm selection strategy comprises: judging whether the compression processing for the preset field type corresponding to the current compression target is a first batch; if it is the first batch, performing the compression processing once on the compression target with each compression algorithm in the candidate compression algorithm set to obtain a latest historical algorithm performance set; acquiring a weight set according to the preset algorithm selection strategy; calculating a final score of each compression algorithm according to the weight set and the latest historical algorithm performance set; if the final scores are not all identical, selecting the highest of the final scores as an optimal score, the compression algorithm corresponding to the optimal score being the optimal compression algorithm; if a plurality of optimal scores exist, selecting the compression algorithm arranged first as the optimal compression algorithm; and updating the latest historical algorithm performance set to obtain an updated historical algorithm performance set of the first batch.
5. The method of claim 3, wherein determining the optimal compression algorithm of the compression target from the candidate compression algorithm set according to the preset algorithm selection strategy further comprises: if it is not the first batch, acquiring the updated historical algorithm performance set, the weight set, and the optimal compression algorithm of the previous batch; calculating a current final score according to the updated historical algorithm performance set, the weight set, and the optimal compression algorithm of the previous batch; acquiring a lowest expected score according to the preset algorithm selection strategy; comparing the current final score with the lowest expected score; if the current final score is not less than the lowest expected score, taking the current final score as an optimal score, the optimal compression algorithm of the previous batch being the optimal compression algorithm of the current batch; and updating the updated historical algorithm performance set of the previous batch to obtain the updated historical algorithm performance set of the current batch.
6. The method of claim 5, wherein determining the optimal compression algorithm of the compression target from the candidate compression algorithm set according to the preset algorithm selection strategy further comprises: if the current final score is smaller than the lowest expected score, performing the compression processing once on the compression target with each compression algorithm in the candidate compression algorithm set to obtain a latest historical algorithm performance set of the current batch; calculating the final score of each compression algorithm according to the weight set of the previous batch and the latest historical algorithm performance set of the current batch; selecting the highest of the final scores as the optimal score, the compression algorithm corresponding to the optimal score being the optimal compression algorithm of the current batch; and updating the latest historical algorithm performance set of the current batch to obtain the updated historical algorithm performance set of the current batch.
7. The method of claim 6, wherein the preset characteristic set includes an entropy feature, a highest-repetition-rate feature, a value-range feature, a variance feature, and an average-field-length feature.
8. A data compression apparatus, comprising: an acquisition module, configured to acquire transmission data of a single-batch transmission; a query module, configured to query corresponding meta information in a preset database according to the transmission data if the size of the transmission data is larger than a preset first compression threshold; a classification module, configured to classify the fields in the transmission data according to the meta information to obtain a classified data set, wherein the classified data set comprises classified field data of a plurality of preset field types; a screening module, configured to screen the classified field data according to a preset second compression threshold to obtain a compression target set, wherein the compression target set comprises at least one compression target; a computing module, configured to collect characteristic information of each compression target and calculate a characteristic value set of the compression target according to a preset characteristic set and the characteristic information; a matching module, configured to match the characteristic value set according to a preset algorithm matching strategy and determine a candidate compression algorithm set of the compression target, wherein the candidate compression algorithm set comprises at least one compression algorithm; and a determining module, configured to determine the optimal compression algorithm of the compression target from the candidate compression algorithm set according to a preset algorithm selection strategy.
9. A data compression device, comprising a memory and a processor, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of any one of claims 1-7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-7.
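
The batch-by-batch selection described in claims 4 to 6 could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the metric names `saving` and `speed`, the `measure` and `score` helpers, the `Selector` class, the use of Python's standard `zlib`, `bz2`, and `lzma` modules as candidates, and the choice to overwrite the history with the latest measurement are all hypothetical, since the application does not define the performance metrics or the update rule.

```python
import bz2
import lzma
import time
import zlib

# Candidate algorithms in their preset order (hypothetical choice).
CANDIDATES = [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]


def measure(compress, data):
    """Run one compression pass and return assumed performance metrics."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return {
        "saving": 1.0 - len(out) / max(len(data), 1),  # fraction of bytes saved
        "speed": 1.0 / (1.0 + elapsed),                # closer to 1 means faster
    }


def score(metrics, weights):
    """Weighted sum of the metrics: the 'final score' of one algorithm."""
    return sum(weights[key] * metrics[key] for key in weights)


class Selector:
    def __init__(self, candidates, weights, min_score, measure=measure):
        self.candidates = candidates  # ordered list of (name, compress_fn)
        self.weights = weights
        self.min_score = min_score    # lowest expected score
        self.measure = measure
        self.history = {}             # name -> latest metrics
        self.best = None              # best algorithm of the previous batch

    def _trial_all(self, data):
        # Try every candidate once and record its performance.
        self.history = {name: self.measure(fn, data) for name, fn in self.candidates}
        scores = {name: score(m, self.weights) for name, m in self.history.items()}
        top = max(scores.values())
        # On a tie, the candidate that comes first in the preset order wins.
        self.best = next(name for name, _ in self.candidates if scores[name] == top)
        return self.best

    def choose(self, data):
        first_batch = self.best is None
        if first_batch or score(self.history[self.best], self.weights) < self.min_score:
            return self._trial_all(data)
        # Score is still acceptable: reuse the previous best and refresh its history.
        fn = dict(self.candidates)[self.best]
        self.history[self.best] = self.measure(fn, data)
        return self.best
```

One `Selector` would be kept per preset field type, and `choose(data)` called once per batch. Because the score of the previous best algorithm is read from the history before the current batch is compressed, a drop in performance triggers a full re-trial only on the following batch.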

Description

Data compression method, device, equipment, medium and product

Technical Field

The present application relates to the field of data transmission, and in particular to a data compression method, apparatus, device, medium, and product.

Background

In distributed database systems, cloud database services, and large-scale data migration scenarios, the efficiency of data transmission between the front end and back end of the database, or between nodes, directly affects the overall performance of the system. When bandwidth is insufficient, traditional approaches that rely solely on hardware upgrades or network optimization are costly and of limited effect. In that situation, reducing the transmission volume through data compression becomes a key means of improving transmission efficiency.

Currently, data compression is mainly achieved either by pre-configuring a fixed compression algorithm for a specific type of data, or by monitoring network delay or bandwidth fluctuations and dynamically adjusting the compression algorithm. However, the prior art is easily affected by the characteristics of mixed data and selects from only a single compression algorithm, which results in lower data compression efficiency.

Disclosure of Invention

The application provides a data compression method, apparatus, device, medium, and product to solve the problem that the prior art is easily affected by the characteristics of mixed data and selects from only a single compression algorithm, which lowers data compression efficiency.
In a first aspect, the present application provides a data compression method, including:
acquiring transmission data of a single-batch transmission;
if the size of the transmission data is larger than a preset first compression threshold, querying corresponding meta information in a preset database according to the transmission data;
classifying fields in the transmission data according to the meta information to obtain a classified data set, wherein the classified data set comprises classified field data of a plurality of preset field types;
screening the classified field data according to a preset second compression threshold to obtain a compression target set, wherein the compression target set comprises at least one compression target;
collecting characteristic information of each compression target, and calculating a characteristic value set of the compression target according to a preset characteristic set and the characteristic information;
matching the characteristic value set according to a preset algorithm matching strategy to determine a candidate compression algorithm set of the compression target, wherein the candidate compression algorithm set comprises at least one compression algorithm;
and determining an optimal compression algorithm of the compression target from the candidate compression algorithm set according to a preset algorithm selection strategy.
In one possible design, screening the classified field data according to the preset second compression threshold to obtain the compression target set includes: if the size of the classified field data is not smaller than the preset second compression threshold, adding the classified field data to the compression target set as a compression target, wherein the compression target set contains at least one compression target.
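
For illustration, the characteristic value set of one compression target might be computed as below, using the five features named in claim 7. The exact definitions are assumptions made for this sketch, since the application does not give formulas: entropy is taken as Shannon entropy over the bytes of the concatenated values, the repetition rate as the share of the most frequent value, and range and variance are computed only when every value parses as a number. The `feature_values` name and the dictionary keys are hypothetical.

```python
import math
import statistics
from collections import Counter


def feature_values(values):
    """Compute assumed feature values for a non-empty list of str field values."""
    data = "".join(values).encode("utf-8")
    byte_counts = Counter(data)
    # Shannon entropy in bits per byte (0 for empty or single-symbol data).
    entropy = -sum(
        (c / len(data)) * math.log2(c / len(data)) for c in byte_counts.values()
    ) if data else 0.0

    # Count of the most frequent value in the target.
    top_count = Counter(values).most_common(1)[0][1]

    # Numeric features apply only when every value parses as a number.
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        numbers = None

    return {
        "entropy": entropy,
        "top_repetition_rate": top_count / len(values),
        "value_range": max(numbers) - min(numbers) if numbers else None,
        "variance": statistics.pvariance(numbers) if numbers else None,
        "avg_length": sum(len(v) for v in values) / len(values),  # in characters
    }
```

A matching strategy could then map these values to candidate algorithms, for example by favouring dictionary-based algorithms when the repetition rate is high; any such rule would be a further assumption beyond what the application states.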
In one possible design, screening the classified field data according to the preset second compression threshold to obtain the compression target set further includes: if the sizes of all the classified field data are smaller than the preset second compression threshold, taking the classified data set as a compression target and adding it to the compression target set.
In one possible design, in each candidate compression algorithm set, the compression algorithms are arranged in a preset order according to the preset algorithm selection strategy; determining the optimal compression algorithm of the compression target from the candidate compression algorithm set according to the preset algorithm selection strategy includes: judging whether the compression processing for the preset field type corresponding to the current compression target is a first batch; if it is the first batch, performing the compression processing once on the compression target with each compression algorithm in the candidate compression algorithm set to obtain a latest historical algorithm performance set; acquiring a weight set according to the preset algorithm selection strategy; calculating the final score of each compression algorithm according to the weight set and the latest historical algori