CN-121367705-B - Information scheduling method, data compression transmission method and data decompression method


Abstract

The application relates to an information scheduling method, a data compression transmission method and a data decompression method. A first node obtains a locally cached hot mode list, which comprises mode numbers and the identifiers of the parsers associated with them. The hot mode list is broadcast by a central scheduling system, which generates it by tracking the parsers that occur with relatively high frequency in the records recently reported by all nodes. The first node determines, from the hot mode list, a target parser matching an original message to be sent, constructs a data packet from the dynamic variable vector obtained by encoding the original message with the target parser and from the target mode number corresponding to the target parser, and sends the data packet to a second node. This effectively improves the compression rate for data with an uncertain structure.

Inventors

Xie Difan
Zeng Lei
Li Xiaojing
Dai Weiwei
Sheng Weifeng
Wang Qi

Assignees

杭州高新区(滨江)区块链与数据安全研究院

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-22

Claims (10)

1. A data compression transmission method, applied to a first node, the method comprising: obtaining a locally cached hot mode list, wherein the hot mode list comprises mode numbers and identifiers of parsers that are associated with each other, the hot mode list is broadcast by a central scheduling system, and the central scheduling system generates the hot mode list by tracking the parsers with the highest occurrence frequency in the records recently reported by all nodes; determining, from the hot mode list, a target parser matching an original message to be sent, and constructing a data packet from a dynamic variable vector obtained by encoding the original message with the target parser and a target mode number corresponding to the target parser; and transmitting the data packet to a second node; wherein the central scheduling system generating the hot mode list by tracking the parsers with the highest occurrence frequency in the records recently reported by all nodes comprises: the central scheduling system allocating mode numbers to the identifiers of the parsers with the highest occurrence frequency in the records recently reported by all nodes, and generating the hot mode list from the mode numbers and the identifiers, wherein the identifiers are arranged in descending order of their occurrence frequency in the recently reported records, the first K identifiers are selected to construct a hot parser set, and the mode numbers are globally unique, extremely short integer identifiers.
2. The data compression transmission method according to claim 1, wherein determining, from the hot mode list, a target parser matching an original message to be sent comprises: for each mode number in the hot mode list, acquiring the parser associated with it and calling that parser to encode the original message to be sent; and judging whether the encoding succeeds, and taking the parser whose encoding succeeds as the target parser.
3. The data compression transmission method according to claim 1, further comprising: invoking a pre-trained language model to perform inference on a local data stream and generating a parser, wherein the parser comprises an encoder and a decoder, the encoder is used for mapping an original message into a dynamic variable vector, and the decoder is used for mapping the dynamic variable vector back to the original message; and reporting, to the central scheduling system, a record comprising the current time as the reporting time and the identifier of the parser.
4. A data decompression method, applied to a second node, the method comprising: receiving a data packet sent by a first node, and extracting a target mode number and a dynamic variable vector from the data packet; obtaining a locally cached hot mode list, and searching the hot mode list for the identifier of the parser corresponding to the target mode number, wherein the hot mode list is broadcast by a central scheduling system, and the central scheduling system generates the hot mode list by tracking the parsers with the highest occurrence frequency in the records recently reported by all nodes; and calling the parser according to the identifier to decode the dynamic variable vector to obtain an original message; wherein the central scheduling system generating the hot mode list by tracking the parsers with the highest occurrence frequency in the records recently reported by all nodes comprises: the central scheduling system allocating mode numbers to the identifiers of the parsers with the highest occurrence frequency in the records recently reported by all nodes, and generating the hot mode list from the mode numbers and the identifiers, wherein the identifiers are arranged in descending order of their occurrence frequency in the recently reported records, the first K identifiers are selected to construct a hot parser set, and the mode numbers are globally unique, extremely short integer identifiers.
5. An information scheduling method, applied to a central scheduling system, comprising: acquiring records reported by each node, wherein each record comprises an identifier of a parser; screening recently reported records from the records, and constructing a hot parser set according to the identifiers of the parsers with the highest occurrence frequency; allocating a mode number to each identifier in the hot parser set, and generating a hot mode list from the mode numbers and the associated identifiers; and broadcasting the hot mode list to each node in the distributed network system, to instruct each node to compress an original message to be sent according to the data compression transmission method of claim 1, or to decompress a received data packet according to the data decompression method of claim 4.
6. The information scheduling method of claim 5, wherein the parser is generated by a node performing inference on its local data stream with a pre-trained language model, the parser comprising an encoder and a decoder; wherein the encoder is configured to map an original message to a dynamic variable vector, and the decoder is configured to map the dynamic variable vector back to the original message.
7. The information scheduling method according to claim 5, wherein screening recently reported records from the records and constructing a hot parser set according to the identifiers of the parsers with the highest occurrence frequency comprises: determining, according to the reporting time carried in each record, the history records that fall within a time window before the current moment, and taking those history records as the recently reported records; and arranging the identifiers in descending order of their occurrence frequency in the recently reported records, selecting the first K identifiers, and constructing the hot parser set.
8. A data synchronization method, applied to a distributed network system comprising a central scheduling system and a plurality of nodes, the method comprising: the central scheduling system acquiring records reported by all nodes, wherein each record comprises an identifier of a parser; the central scheduling system screening recently reported records from the records, and constructing a hot parser set according to the identifiers of the parsers with the highest occurrence frequency; the central scheduling system allocating a mode number to each identifier in the hot parser set, and generating a hot mode list from the mode numbers and the associated identifiers; the central scheduling system broadcasting the hot mode list to all nodes in the distributed network system; wherein the identifiers are arranged in descending order of their occurrence frequency in the recently reported records, the first K identifiers are selected to construct the hot parser set, each parser comprises an encoder and a decoder, each identifier is calculated from the code of the corresponding parser, and the mode numbers are globally unique, extremely short integer identifiers; a first node acquiring the locally cached hot mode list, determining, from the hot mode list, a target parser matching an original message to be sent, constructing a data packet from a dynamic variable vector obtained by encoding the original message with the target parser and a target mode number corresponding to the target parser, and sending the data packet to a second node; and the second node, after receiving the data packet, extracting the target mode number and the dynamic variable vector from the data packet, acquiring the locally cached hot mode list, searching the hot mode list for the identifier of the parser corresponding to the target mode number, and calling the parser according to the identifier to decode the dynamic variable vector to obtain the original message.
9. A distributed network system, comprising a central scheduling system and a plurality of nodes, wherein the central scheduling system is in communication connection with each node, and at least two nodes are in communication connection with each other, and the distributed network system is used for executing the data synchronization method according to claim 8.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 8.
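For illustration only, the scheduling steps recited in claims 5 and 7 (screen records by time window, rank identifiers by frequency, keep the first K, assign short integer mode numbers) might be sketched as below. This is a minimal sketch, not the applicant's implementation: the names `Record` and `build_hot_mode_list`, the use of seconds as the time unit, numbering from 0, and the alphabetical tie-break are all assumptions that do not appear in the claims.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One report from a node (hypothetical layout)."""
    report_time: float  # reporting time, in seconds (assumed unit)
    parser_id: str      # identifier of the parser named in the record


def build_hot_mode_list(records, now, window_seconds, k):
    """Return {mode_number: parser_id} for the K most frequent recent parsers."""
    # Screening: keep only records reported within the time window before `now`.
    recent = [r for r in records if 0 <= now - r.report_time <= window_seconds]

    # Count how often each identifier occurs in the recent records.
    counts = Counter(r.parser_id for r in recent)

    # Descending frequency; ties broken by identifier so the numbering is
    # deterministic (the tie-break rule is an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    # The first K identifiers form the hot parser set.
    hot_parser_set = [parser_id for parser_id, _ in ranked[:k]]

    # Assign short integer mode numbers, starting at 0 (assumption).
    return dict(enumerate(hot_parser_set))
```

Because the list is rebuilt from recent records only, a parser that stops appearing in reports drops out of the hot mode list once its records age past the window.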

Description

Information scheduling method, data compression transmission method and data decompression method

Technical Field

The present application relates to the field of distributed network data transmission, and in particular, to an information scheduling method, a data compression transmission method, and a data decompression method.

Background

Conventional data compression methods mainly include general lossless compression methods and compression methods based on a predefined structure (Schema). General lossless compression methods, such as Gzip, zstd, and LZ4, compress by looking for repeated byte sequences in the data stream. Compression methods based on predefined structures, such as JSON, Protobuf, and Avro, require the data to strictly follow a predefined format or pattern.
In scenarios such as log transmission and real-time data synchronization, the data stream to be transmitted (such as a large volume of log information) appears unstructured or semi-structured on the surface, while internally it evolves with time or events and exhibits a hidden, complex structure, so it cannot be described by a deterministic grammar (such as a fixed Schema or a regular expression). Conventional data compression methods have difficulty compressing such data efficiently.
Currently, no effective solution has been proposed for the problem that data containing uncertainty structures is difficult to compress effectively.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an information scheduling method, a data compression transmission method, and a data decompression method that can improve the data compression rate for data including an uncertainty structure.
In a first aspect, the present application provides a data compression transmission method, applied to a first node, the method comprising: obtaining a locally cached hot mode list, wherein the hot mode list comprises mode numbers and identifiers of parsers that are associated with each other, the hot mode list is broadcast by a central scheduling system, and the central scheduling system generates the hot mode list by tracking the parsers with relatively high occurrence frequency in the records recently reported by all nodes; determining, from the hot mode list, a target parser matching an original message to be sent, and constructing a data packet from a dynamic variable vector obtained by encoding the original message with the target parser and a target mode number corresponding to the target parser; and sending the data packet to a second node.
In one embodiment, determining, from the hot mode list, a target parser matching an original message to be sent includes: for each mode number in the hot mode list, acquiring the parser associated with it and calling that parser to encode the original message to be sent; and judging whether the encoding succeeds, and taking the parser whose encoding succeeds as the target parser.
In one embodiment, the method further comprises: invoking a pre-trained language model to perform inference on a local data stream and generating a parser, wherein the parser comprises an encoder and a decoder, the encoder is used for mapping an original message into a dynamic variable vector, and the decoder is used for mapping the dynamic variable vector back to the original message; calculating the identifier from the code of the parser; and reporting, to the central scheduling system, a record comprising the current time as the reporting time and the identifier of the parser.
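As a rough illustration of the first aspect, the sketch below tries each parser in the hot mode list in turn, takes the first one whose encoding succeeds, and builds a packet from the mode number and the dynamic variable vector. Everything concrete here is an assumption made for the example: `TemplateParser` is a hand-written regex stand-in for a parser that the method would obtain from a pre-trained language model, SHA-256 is just one possible way to calculate an identifier from the parser's code, and the packet layout (one byte of mode number followed by a JSON array) is invented for the sketch. The decoding step belongs to the decompression method and is included only to check the round trip.

```python
import hashlib
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateParser:
    """Hypothetical parser: a regex encodes, a format string decodes."""
    pattern: str   # regex whose groups capture the dynamic variables
    template: str  # str.format template that rebuilds the message

    @property
    def identifier(self):
        # Identifier calculated from the parser's code (SHA-256 is an assumption).
        code = self.pattern + "\n" + self.template
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def encode(self, message):
        """Return the dynamic variable vector, or None if encoding fails."""
        match = re.fullmatch(self.pattern, message)
        return list(match.groups()) if match else None

    def decode(self, variables):
        """Map a dynamic variable vector back to the original message."""
        return self.template.format(*variables)


def compress(message, hot_mode_list, registry):
    """Build a packet with the first hot parser that encodes `message`."""
    for mode_number, parser_id in hot_mode_list.items():
        parser = registry.get(parser_id)
        if parser is None:
            continue  # parser code not available locally
        variables = parser.encode(message)
        if variables is not None:
            # Assumed layout: 1-byte mode number (0..255) + JSON variable vector.
            return bytes([mode_number]) + json.dumps(variables).encode("utf-8")
    return None  # no hot parser matched; fallback handling is out of scope


def decompress(packet, hot_mode_list, registry):
    """Look up the parser by mode number and decode the variable vector."""
    mode_number = packet[0]
    variables = json.loads(packet[1:].decode("utf-8"))
    return registry[hot_mode_list[mode_number]].decode(variables)
```

In this sketch only the values that vary between messages travel in the packet; the fixed text of the message is carried implicitly by the mode number, which both nodes resolve through the same broadcast list.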
In a second aspect, the present application provides a data decompression method applied to a second node, the method comprising: receiving a data packet sent by a first node, and extracting a target mode number and a dynamic variable vector from the data packet; obtaining a locally cached hot mode list, and searching the hot mode list for the identifier of the parser corresponding to the target mode number, wherein the hot mode list is broadcast by a central scheduling system, and the central scheduling system generates the hot mode list by tracking the parsers with relatively high occurrence frequency in the records recently reported by all nodes; and calling the parser according to the identifier to decode the dynamic variable vector to obtain an original message.
In a third aspect, the present application provides an information scheduling method, applied to a central scheduling system, the method comprising: acquiring records reported by each node, wherein each record comprises an identifier of a parser; screening recently reported records from the records, and constructing a hot parser set according to the identifiers of the parsers with relatively high occurrence frequency; allocating a pattern number to each identifier in the hot parser set, and generating a hot patter