EP-4398096-B1 - VECTOR DATA COMPRESSION METHOD, VECTOR DATA DECOMPRESSION METHOD, APPARATUS, AND DEVICE

EP4398096B1EP 4398096 B1EP4398096 B1EP 4398096B1EP-4398096-B1

Inventors

REN, Zimu
LI, DONGSHENG

Dates

Publication Date: 20260506
Application Date: 20230215

Claims (15)

A method for compressing vector data, the method being performed by a processor (200), characterized in that the processor (200) comprises a source vector register (201), n sets of multiplexers, a data merging apparatus (202), and a target vector register (203), n being an integer greater than 1; and the method comprising: storing, by the source vector register (201), source vector data, the source vector data being divided into n source sub-vectors, the n source sub-vectors being in a one-to-one correspondence with the n sets of multiplexers (401); selectively arranging, by an i th set of multiplexers in the n sets of multiplexers, valid elements in an i th source sub-vector in the source vector data, to obtain an i th target sub-vector corresponding to the i th source sub-vector, valid elements in the i th target sub-vector being located at a header of the i th target sub-vector, and i being a positive integer less than or equal to n, and an element in a boolean vector corresponding to the source vector data being used for indicating a distribution of the valid elements in the source vector data (402); shifting and merging, by the data merging apparatus (202), n target sub-vectors corresponding to the n source sub-vectors, to obtain target vector data, valid elements in the target vector data being located at a header of the target vector data (403); and storing, by the target vector register (203), the valid elements in the target vector data (404).
The method according to claim 1, wherein the i th source sub-vector comprises x elements, the x elements comprising y valid elements, and the i th set of multiplexers comprising x-1 multiplexers of different types, x being a positive integer, and y being a positive integer less than or equal to x-1; and the selectively arranging, by an i th set of multiplexers in the n sets of multiplexers, valid elements in an i th source sub-vector in the source vector data, to obtain an i th target sub-vector corresponding to the i th source sub-vector comprises: selecting, by y multiplexers in the x-1 multiplexers, the y valid elements from the x elements in ascending order of bits based on the boolean vector corresponding to the source vector data, and arranging the y valid elements in ascending order of bits, to obtain the i th target sub-vector.
The method according to claim 2, wherein the selecting, by y multiplexers in the x-1 multiplexers, the y valid elements from the x elements in ascending order of bits based on a boolean vector corresponding to the source vector data, and arranging the y valid elements in ascending order of bits, to obtain the i th target sub-vector comprises: selecting, by a z th multiplexer in the y multiplexers, a z th valid element from a z th element to an x th element in the i th source sub-vector in ascending order of bits based on the boolean vector, z being a positive integer less than or equal to y; and adding, by the z th multiplexer, the z th valid element to a z th position of the i th target sub-vector.
The method according to claim 1, wherein the data merging apparatus (202) comprises m sets of data merging units, the m sets of data merging units being configured to perform p rounds of shifting and merging on the n target sub-vectors to obtain the target vector data, m being an integer greater than 1, and p being a positive integer; and the shifting and merging, by the data merging apparatus (202), n target sub-vectors corresponding to the n source sub-vectors, to obtain target vector data comprise: shifting and merging, by a q th set of data merging units in the m sets of data merging units for a q th round of shifting and merging, a q th set of to-be-merged vectors, to obtain a q th set of merged vectors, q being a positive integer less than or equal to p; and in a case that q is equal to 1, the q th set of to-be-merged vectors being the n target sub-vectors, and in a case that q is greater than 1, the q th set of to-be-merged vectors being a (q-1) th set of merged vectors, and a p th set of merged vectors being the target vector data.
The method according to claim 4, wherein the shifting and merging, by a q th set of data merging units in the m sets of data merging units, a q th set of to-be-merged vectors, to obtain a q th set of merged vectors comprise: shifting and merging, by the data merging units in the q th set of data merging units, each two adjacent to-be-merged vectors in the q th set of to-be-merged vectors, to obtain the q th set of merged vectors.
The method according to claim 5, wherein in a case that a first data merging unit in the q th set of data merging units shifts and merges a first to-be-merged vector and a second to-be-merged vector in the q th set of to-be-merged vectors, the shifting and merging, by the data merging units in the q th set of data merging units, each two adjacent to-be-merged vectors in the q th set of to-be-merged vectors, to obtain the q th set of merged vectors comprise: shifting, by the first data merging unit, the first to-be-merged vector based on the second to-be-merged vector, to obtain an adjusted first to-be-merged vector; and merging, by the first data merging unit, the adjusted first to-be-merged vector and the second to-be-merged vector, to obtain a first merged vector corresponding to the q th set of merged vectors, an element corresponding to the second to-be-merged vector in the source vector data being at a lower bit than an element corresponding to the first to-be-merged vector in the source vector data.
The method according to claim 6, wherein the shifting, by the first data merging unit, the first to-be-merged vector based on the second to-be-merged vector, to obtain an adjusted first to-be-merged vector comprises: filling, by the first data merging unit, the first to-be-merged vector with elements based on a quantity of elements in the second to-be-merged vector, to obtain a filled first to-be-merged vector, a quantity of elements in the filled first to-be-merged vector being a sum of the quantity of the elements in the second to-be-merged vector and a quantity of elements in the first to-be-merged vector; and shifting, by the first data merging unit, non-filling elements in the filled first to-be-merged vector as a whole based on a quantity of invalid elements in the second to-be-merged vector, to obtain the adjusted first to-be-merged vector, a quantity of non-filling elements corresponding to a header of the adjusted first to-be-merged vector being the same as the quantity of the valid elements in the second to-be-merged vector.
The method according to claim 6, wherein the merging, by the first data merging unit, the adjusted first to-be-merged vector and the second to-be-merged vector, to obtain a first merged vector corresponding to the q th set of merged vectors comprises: selecting, by the first data merging unit, elements corresponding to the first merged vector from the adjusted first to-be-merged vector and the second to-be-merged vector in ascending order of bits; and selecting, by the first data merging unit for a k th element corresponding to the first merged vector, one of a k th element in the adjusted first to-be-merged vector and a k th element in the second to-be-merged vector as the k th element corresponding to the first merged vector, k being a positive integer.
The method according to claim 8, wherein the selecting, by the first data merging unit, one of a k th element in the adjusted first to-be-merged vector and a k th element in the second to-be-merged vector as the k th element corresponding to the first merged vector comprises: determining, by the first data merging unit, the k th element in the adjusted first to-be-merged vector as the k th element corresponding to the first merged vector in a case that the k th element in the adjusted first to-be-merged vector is a valid element; or determining, by the first data merging unit, the k th element in the second to-be-merged vector as the k th element corresponding to the first merged vector in a case that the k th element in the second to-be-merged vector is a valid element.
A method for decompressing vector data obtained by the method according claim 1, the method being performed by a processor (900), characterized in that the processor (900) comprises a target vector register (901), a data splitting apparatus (902), and n sets of multiplexers, n being an integer greater than 1; and the method comprising: storing, by the target vector register (901), target vector data, valid elements in the target vector data being located at a header of the target vector data , and an element in a boolean vector corresponding to the source vector data being used for indicating a distribution of the valid elements in the source vector data (1101); shifting and splitting, by the data splitting apparatus (902), the target vector data, to obtain n target sub-vectors, valid elements in each of the target sub-vectors being located at a header of the target sub-vector (1102); and respectively decompressing, by the n sets of multiplexers, the n target sub-vectors, to obtain n source sub-vectors, the n source sub-vectors being configured to be combined to obtain source vector data based on the boolean vector (1103); wherein the data splitting apparatus (902) comprises m sets of data splitting units, the m sets of data splitting units being configured to perform p rounds of shifting and splitting on the target vector data to obtain the n target sub-vectors, m being an integer greater than 1, and p being a positive integer; and the shifting and splitting, by the data splitting apparatus (902), the target vector data, to obtain n target sub-vectors comprise: shifting and splitting, by a q th set of data splitting units in the m sets of data splitting units for a q th round of shifting and splitting, a q th set of to-be-split vectors, to obtain a q th set of split vectors, q being a positive integer less than or equal to p; and in a case that q is equal to 1, the q th set of to-be-split vectors being the target vector data, and in a case that q is greater than 1, the q th set of to-be-split vectors being a (q-1) th set of split vectors, and a p th set of split vectors being the n target sub-vectors.
The method according to claim 10, wherein the q th set of split vectors comprise s split vectors corresponding to a first to-be-split vector in the q th set of to-be-split vectors, s being an integer greater than 1; and in a case that a first data splitting unit in the q th set of data splitting units shifts and splits the first to-be-split vector, the s split vectors corresponding to the first to-be-split vector are obtained in the following manner: determining, by the first data splitting unit, s split element quantities corresponding to the first to-be-split vector; determining, by the first data splitting unit, s sets of valid split elements based on the boolean vector corresponding to the target vector data and the s split element quantities; respectively shifting, by the first data splitting unit, the s sets of valid split elements as a whole in the first to-be-split vector based on the s sets of valid split elements, to obtain a shifted first to-be-split vector; and splitting, by the first data splitting unit, the shifted first to-be-split vector based on the s split element quantities, to obtain s split vectors corresponding to the first to-be-split vector.
The method according to claim 11, wherein the respectively shifting, by the first data splitting unit, the s sets of valid split elements as a whole in the first to-be-split vector based on the s sets of valid split elements, to obtain a shifted first to-be-split vector comprises: determining, by the first data splitting unit for a target split element quantity in the s split element quantities, a quantity of target to-be-shifted bits corresponding to the target split element quantity based on a difference between a position of a target valid split element corresponding to the target split element quantity in the first to-be-split vector and a position of the target split element corresponding to the target split element quantity in the boolean vector; shifting, by the first data splitting unit, valid split elements in the first to-be-split vector corresponding to the target split element quantity as a whole based on the quantity of target to-be-shifted bits corresponding to the target split element quantity, to obtain an intermediate first to-be-split vector; and further shifting, by the first data splitting unit, the intermediate first to-be-split vector based on quantities of target to-be-shifted bits respectively corresponding to remaining split element quantities, to obtain the shifted first to-be-split vector, the target valid split element corresponding to the target split element quantity being the last valid split element corresponding to the target split element quantity in descending order of bits, and the target split element corresponding to the target split element quantity being the last split element corresponding to the target split element quantity in descending order of bits.
The method according to claim 11, wherein the s split vectors comprise a target split vector corresponding to a target split element quantity in the s split element quantities; and the splitting, by the first data splitting unit, the shifted first to-be-split vector based on the s split element quantities, to obtain the s split vectors corresponding to the first to-be-split vector comprises: determining, by the first data splitting unit, a region corresponding to the target split vector corresponding to the target split element quantity in the boolean vector based on the target split element quantity; determining, by the first data splitting unit, a target region corresponding to the target split vector in the shifted first to-be-split vector based on the region corresponding to the target split vector in the boolean vector; and determining, by the first data splitting unit, an element in the target region as an element of the target split vector.
A computer device, comprising a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to cause the computer device to implement the method for compressing vector data according to any one of claims 1 to 9, or implement the method for decompressing vector data according to any one of claims 10 to 13.
A non-volatile computer-readable storage medium, storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to cause a computer to implement the method for compressing vector data according to any one of claims 1 to 9 or implement the method for decompressing vector data according to any one of claims 10 to 13.

Description

RELATED APPLICATION This application claims to Chinese Patent Application No. 202210312611.4, entitled "METHOD AND APPARATUS FOR COMPRESSING VECTOR DATA, METHOD AND APPARATUS FOR DECOMPRESSING VECTOR DATA, AND DEVICE" filed on March 28, 2022. FIELD OF THE TECHNOLOGY This application relates to the field of data processing technologies, and in particular, to a method and an apparatus for compressing vector data, a method and an apparatus for decompressing vector data, and a device. BACKGROUND OF THE DISCLOSURE Currently, some mainstream artificial intelligence (AI) processors in the industry are designed with a data compression instruction, which may be used for accelerating the inference and training efficiency of the AI processors. An implementation of the data compression instruction has significant impact on the AI processors. In related art, the data compression instruction is implemented through direct compression of vector data by using a set of multiplexers (MUX). For example, during compression of vector data with 16 elements, 15 MUXes need to be arranged. The 15 MUXes are successively 16-1 MUX, 15-1 MUX, ..., and 2-1 MUX in ascending order of bits. US 2018/0309461Al refers to "apparatus and method for vector compression". SUMMARY Embodiments of this application provide a method and an apparatus for compressing vector data, a method and an apparatus for decompressing vector data, and a device, which can reduce a congestion level of wire required for implementing a data compression instruction in an AI processor, and reduce an area of the AI processor. The technical solutions may include the following contents. According to an aspect of the embodiments of this application, a method for compressing vector data is provided, the method being executed by a processor, the processor including a source vector register, n sets of multiplexers, a data merging apparatus, and a target vector register, n being an integer greater than 1. The method is defined in appended set of claims. According to an aspect of the embodiments of this application, a method for decompressing vector data is provided, the method being executed by a processor, the processor including a target vector register, a data splitting apparatus, and n sets of multiplexers, n being an integer greater than 1. The method is defined in appended set of claims. According to an aspect of the embodiments of this application, a processor is provided, the processor including a source vector register, n sets of multiplexers, a data merging apparatus, and a target vector register, each set of multiplexers including at least two multiplexers, and n being an integer greater than 1; the source vector register including n sets of output ports, the n sets of output ports being respectively connected to input ports of the n sets of multiplexers;output ports of the n sets of multiplexers being connected to an input port of the data merging apparatus; andan input port of the target vector register being connected to an output port of the data merging apparatus. According to an aspect of the embodiments of this application, a processor is provided, the processor including a target vector register, a data splitting apparatus, and n sets of multiplexers, each set of multiplexers including at least two multiplexers, and n being an integer greater than 1; an output port of the target vector register being connected to an input port of the data splitting apparatus; andan output port of the data splitting apparatus being connected to input ports of the n sets of multiplexers. According to an aspect of the embodiments of this application, a computer device is provided, the computer device including a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to cause the computer device to implement the foregoing method for compressing vector data, or implement the foregoing method for decompressing vector data. According to an aspect of the embodiments of this application, a non-volatile computer-readable storage medium is provided, the non-volatile computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to cause a computer to implement the foregoing method for compressing vector data, or implement the foregoing method for decompressing vector data. According to an aspect of the embodiments of this application, a computer program product or a computer program is provided, the computer program product or the computer program including computer instructions, the computer instructions being stored in a non-volatile computer-readable storage medium. A processor of a computer