
US-20260127243-A1 - DATA PROCESSING METHOD AND RELATED DEVICE


Abstract

A data processing method and a related device are disclosed. The method includes: after a vector operator is obtained, determining whether the vector operator is capable of being converted into an equivalent matrix multiplication operator; and converting the vector operator into a corresponding target matrix multiplication operator if the vector operator is capable of being converted into the equivalent matrix multiplication operator, so that a matrix computation unit in an intelligent chip can execute the target matrix multiplication operator.

Inventors

  • Youhui Bai
  • Sen Wang
  • Huaman Zhou
  • Yifeng TANG
  • Gong Zhang
  • Yong Fu
  • Fuhua Li

Assignees

  • HUAWEI TECHNOLOGIES CO., LTD.

Dates

Publication Date
2026-05-07
Application Date
2025-12-19
Priority Date
2023-06-21

Claims (12)

  1. A data processing method, wherein the method comprises: determining whether a vector operator meets a condition, wherein the vector operator is an operator executed by a vector computation unit of a chip; and converting the vector operator into a target matrix multiplication operator if the vector operator meets the condition, so that a matrix computation unit of the chip is capable of executing the target matrix multiplication operator, wherein a first computation result of the target matrix multiplication operator is the same as a second computation result of the vector operator.
  2. The method according to claim 1, wherein determining whether the vector operator meets the condition comprises: determining whether the vector operator is a target vector operator, wherein the target vector operator is an operator that is capable of being converted into a matrix multiplication operator, and a computation result of the target vector operator is the same as a computation result of the converted matrix multiplication operator; and determining, if the vector operator is the target vector operator, that the vector operator meets the condition.
  3. The method according to claim 2, wherein determining whether the vector operator meets the condition further comprises: determining whether a first cost is less than a second cost, wherein the first cost is predicted duration required for executing, by the matrix computation unit, the target matrix multiplication operator to complete computation, and the second cost is predicted duration required for executing, by the vector computation unit, the vector operator to complete computation; and determining, if the vector operator is the target vector operator, that the vector operator meets the condition comprises: if the vector operator is the target vector operator and the first cost is less than the second cost, determining that the vector operator meets the condition.
  4. The method according to claim 2, wherein the target vector operator comprises at least one of row-wise matrix summation, column-wise matrix summation, scalar-matrix multiplication, a vector outer product operation, or a Hadamard product operation between a matrix and a vector.
  5. The method according to claim 1, wherein the target matrix multiplication operator comprises a constructed matrix, and the constructed matrix is a matrix that is constructed based on the vector operator and that makes the first computation result and the second computation result the same.
  6. The method according to claim 5, wherein the chip comprises a buffer connected to the matrix computation unit, the buffer is configured to store a block matrix of the constructed matrix, and converting the vector operator into the target matrix multiplication operator comprises: constructing a target block matrix based on a size of the buffer and the vector operator, wherein a size of the target block matrix is less than or equal to the size of the buffer, the target block matrix is a part of the constructed matrix, and not all values in the target block matrix are 0.
  7. A data processing method, wherein the method is applied to a chip, the chip comprises a vector computation unit and a matrix computation unit, and the method comprises: obtaining a target vector operator; converting the target vector operator into a target matrix multiplication operator, wherein a first computation result of the target matrix multiplication operator is the same as a second computation result of the target vector operator; and executing the target matrix multiplication operator by using the matrix computation unit.
  8. The method according to claim 7, wherein the target matrix multiplication operator comprises a constructed matrix, and the constructed matrix is a matrix that is constructed based on the target vector operator and that makes the first computation result and the second computation result the same.
  9. The method according to claim 8, wherein the chip comprises a buffer connected to the matrix computation unit, the buffer is configured to store a block matrix of the constructed matrix, and converting the target vector operator into the target matrix multiplication operator comprises: constructing a target block matrix based on a size of the buffer and the target vector operator, wherein a size of the target block matrix is less than or equal to the size of the buffer, the target block matrix is a part of the constructed matrix, and not all values in the target block matrix are 0.
  10. A data processing method, wherein the method comprises: determining whether a vector operator meets a condition, wherein the vector operator is an operator executed by a vector computation unit of a chip; and sending an instruction if the vector operator meets the condition, wherein the instruction instructs the chip to convert the vector operator into a matrix multiplication operator.
  11. The method according to claim 10, wherein determining whether the vector operator meets the condition comprises: determining whether the vector operator is a target vector operator, wherein the target vector operator is an operator that is capable of being converted into a matrix multiplication operator, and a computation result of the target vector operator is the same as a computation result of the converted matrix multiplication operator; and determining, if the vector operator is the target vector operator, that the vector operator meets the condition.
  12. The method according to claim 11, wherein determining whether the vector operator meets the condition further comprises: determining whether a first cost is less than a second cost, wherein the first cost is predicted duration required for executing, by the matrix computation unit, the target matrix multiplication operator to complete computation, and the second cost is predicted duration required for executing, by the vector computation unit, the vector operator to complete computation; and determining, if the vector operator is the target vector operator, that the vector operator meets the condition comprises: if the vector operator is the target vector operator and the first cost is less than the second cost, determining that the vector operator meets the condition.
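Claim 4 lists the vector operators that admit an equivalent matrix multiplication but does not spell out the constructions. The NumPy sketch below shows one standard constructed matrix for each listed operator; the specific ones vectors, identity, and diagonal matrices here are illustrative assumptions, not the patent's stated implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
v = rng.standard_normal(3)
u = rng.standard_normal(4)
c = 2.5

# Row-wise matrix summation: sum of each row of A = A @ ones-vector.
row_sum = A @ np.ones(3)
assert np.allclose(row_sum, A.sum(axis=1))

# Column-wise matrix summation: ones-row @ A.
col_sum = np.ones(4) @ A
assert np.allclose(col_sum, A.sum(axis=0))

# Scalar-matrix multiplication: c * A = A @ (c * I).
scaled = A @ (c * np.eye(3))
assert np.allclose(scaled, c * A)

# Vector outer product: outer(u, v) = (u as column) @ (v as row).
outer = u[:, None] @ v[None, :]
assert np.allclose(outer, np.outer(u, v))

# Hadamard product between a matrix and a vector (v broadcast over
# the rows of A): A * v = A @ diag(v), since diag(v) scales column j by v[j].
hadamard = A @ np.diag(v)
assert np.allclose(hadamard, A * v)
```

In each case the right-hand factor is the "constructed matrix" of claim 5: a matrix built from the vector operator so that the matmul result matches the vector-unit result exactly.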

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2024/083524, filed on Mar. 25, 2024, which claims priority to Chinese Patent Application No. 202310746599.2, filed on Jun. 21, 2023. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This application relates to the field of artificial intelligence technologies, and in particular, to a data processing method and a related device.

BACKGROUND

In the context of big data and big computing, artificial intelligence technologies represented by machine learning are developing rapidly. They have become a core foundation of key technologies such as computer vision, intelligent voice, natural language processing, biometric recognition, and recommendation systems; are widely applied in fields such as financial risk control, medical diagnosis, and smart cities; and have gradually become one of the major forces driving the information revolution and social development. The rapid development of artificial intelligence is attributed to two important factors: innovation in algorithm models and continuous improvement in the computing capability of intelligent chips. In the post-Moore era, although chip transistor density continues to increase, it is very difficult to further improve power-consumption density and performance density, which means that computational power can no longer be improved through process improvements alone. Therefore, an important branch of chip development is the domain-specific architecture (DSA), also referred to as an intelligent chip. This type of chip is highly specialized and comparatively easy to design.
Based on the features of a specific application, such a chip sacrifices universality and flexibility in exchange for customized computation units, simplified control logic, and storage structures and data channels adapted to the domain's computing characteristics. It thereby achieves high performance and a high energy-efficiency ratio, and is widely applied in fields such as high-performance computing, artificial intelligence, and cryptography. The core computation units of an intelligent chip include two parts: a matrix computation unit, configured to perform matrix multiplication; and a vector computation unit, configured to accelerate vector-type operations. A design core of the intelligent chip is to accelerate matrix multiplication, so the matrix computation unit occupies a large area on the chip. Compared with the computational power of the vector computation unit, the computational power of the matrix computation unit is higher by orders of magnitude; for example, in some intelligent chips, the ratio of matrix-unit computational power to vector-unit computational power reaches 100:1. However, computation of a neural network model is usually performed alternately on the two independent computation units. Consequently, vector-type computing, which has low computing complexity and a low computational-power requirement, becomes a bottleneck that restricts the computational power of the intelligent chip.

SUMMARY

This application provides a data processing method and a related device, to resolve the problem that the weak computational power of the vector computation unit of an intelligent chip restricts the computational power of the chip. According to a first aspect, a data processing method is provided.
The method includes: determining whether a vector operator meets a condition; and when the vector operator meets the condition, converting the vector operator into a target matrix multiplication operator that is capable of being executed by a matrix computation unit of a chip. The vector operator is an operator executed by a vector computation unit of the chip. A first computation result of the target matrix multiplication operator is the same as a second computation result of the vector operator. In other words, the target matrix multiplication operator is an equivalent of the vector operator: the computation result before conversion is consistent with the computation result after conversion. When the vector operator meets the condition, it is converted into the target matrix multiplication operator, that is, into a data format that the matrix computation unit can process, so that the converted operator can run on the matrix computation unit, thereby accelerating vector-type operations and improving the inference efficiency of a neural network model. In a possible implementation, determining whether the vector operator meets the condition includes: determining whether the vector operator is a target vector operator; and determining, if the vector operator is the target vector operator, that the vector operator meets the condition; or determining, if the vector operator is not th
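Claims 6 and 9 additionally constrain the conversion: the constructed matrix is processed as block matrices sized to fit the buffer attached to the matrix computation unit, and a target block that is entirely zero contributes nothing and need not be issued. A minimal NumPy sketch of that tiling idea follows; the function name, tile size, and zero-block test are illustrative assumptions rather than the patent's implementation:

```python
import numpy as np

def blocked_matmul(x, constructed, tile=2):
    """Compute x @ constructed one buffer-sized block at a time,
    skipping blocks of the constructed matrix that are entirely zero."""
    m, k = x.shape
    k2, n = constructed.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n))
    for i in range(0, k, tile):
        for j in range(0, n, tile):
            block = constructed[i:i + tile, j:j + tile]  # stand-in for a buffer-resident block
            if not block.any():  # all-zero block: nothing to accumulate
                continue
            out[:, j:j + tile] += x[:, i:i + tile] @ block
    return out

# Example: row-wise summation of x expressed through a constructed
# ones-column matrix, computed block by block.
x = np.arange(12.0).reshape(3, 4)
ones_col = np.ones((4, 1))
assert np.allclose(blocked_matmul(x, ones_col, tile=2),
                   x.sum(axis=1, keepdims=True))
```

The zero-block skip matters because the constructed matrices for the listed operators (ones vectors, scaled identities, diagonals) are typically sparse, so most blocks of a large constructed matrix carry no work.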