US-12619437-B1 - Method and system for in-line data conversion outside of a machine learning hardware
Abstract
A system includes a component configured to send data in a first data format. The system includes a direct memory access (DMA) engine configured to receive the data in the first data format and convert the first data format to a second data format, wherein the second data format is associated with a data format of a machine learning (ML) hardware, wherein the second data format is different from the first data format. The ML hardware is configured to receive the data in the second format and perform at least one ML operation on the received data in the second format. The received data in the second data format is stored on an on-chip memory (OCM) of the ML hardware.
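The in-line conversion described above can be modeled with a short sketch. This is an illustration only, not the patent's implementation: the conversion here is done in a hardware DMA/data streaming engine, not in software, and the names `stream_and_convert` and `ocm_store` are hypothetical. The key property modeled is that each chunk is converted on the fly as it streams, so the full converted buffer is never written back to source-side memory before transmission.

```python
import numpy as np

def stream_and_convert(src: np.ndarray, chunk_elems: int = 1024):
    """Model of in-line conversion during a DMA transfer: each chunk is
    converted from FP32 (the first data format) to FP16 (the second data
    format) as it streams, without materializing the converted data in
    an intermediate buffer on the source side."""
    for start in range(0, src.size, chunk_elems):
        chunk = src[start:start + chunk_elems]
        # Conversion happens per chunk, inside the "engine", on the fly.
        yield chunk.astype(np.float16)

def ocm_store(stream):
    """Model of the ML hardware's on-chip memory (OCM) receiving the
    already-converted stream."""
    return np.concatenate(list(stream))

host_data = np.random.rand(4096).astype(np.float32)   # first data format
ocm = ocm_store(stream_and_convert(host_data))        # second data format
assert ocm.dtype == np.float16 and ocm.size == host_data.size
```

Note that the software alternative described in the Background would instead convert the whole buffer and write it back to memory before the transfer, costing an extra full-size memory write.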
Inventors
- Ulf HANEBUTTE
- Senad DURAKOVIC
- Harri Hakkarainen
- Derek Chickles
- Geethanjali Rajegowda
- Saurabh Shrivastava
Assignees
- MARVELL ASIA PTE LTD
Dates
- Publication Date
- 20260505
- Application Date
- 20231013
Claims (19)
- 1 . A system, comprising: a memory component configured to store data in a first data format, wherein the memory component is external to a machine learning (ML) hardware; a data streaming engine configured to stream data from the memory component external to the ML hardware to a memory component within the ML hardware, wherein the data streaming engine is configured to convert the data from the first data format to a second data format without storing the converted data in the second data format prior to transmitting the data in the second format to the ML hardware, wherein the second data format is associated with a data format of the ML hardware, and wherein the second data format is different from the first data format; and said ML hardware configured to receive the data in the second format and wherein the ML hardware is configured to perform at least one ML operation on the received data in the second format.
- 2 . The system of claim 1 , wherein the data streaming engine comprises a data format conversion block configured to convert the data from the first data format to the second data format.
- 3 . The system of claim 1 , wherein the memory component external to the ML hardware is configured to store the data in the first data format as being received by a network interface card (NIC).
- 4 . The system of claim 3 , wherein the data is stored in the memory component external to the ML hardware via a direct memory access (DMA) engine.
- 5 . The system of claim 1 , wherein the memory component external to the ML hardware is a double data rate (DDR) memory.
- 6 . The system of claim 1 , wherein the first data format and the second data format are each one of floating point (FP) 32, FP16, integer (INT) 8, unsigned int (UINT) 8, FP8, Brain FP (BF) 16, Fixed Point (FXP), In-phase Quadrature FP (IQFP), Quadrature (Q) format, quantization/dequantization, and scaling.
- 7 . The system of claim 1 , wherein said ML hardware is configured to transmit processed data in a third data format to the data streaming engine, and wherein the data streaming engine is configured to convert the processed data from the third data format to a fourth data format.
- 8 . The system of claim 7 , wherein the third data format is the same as the second data format, and wherein the first data format is the same as the fourth data format.
- 9 . A method comprising: receiving data in a first data format; converting the received data from the first data format to a second data format, wherein a machine learning (ML) hardware is configured to perform one or more operations based on the second data format, wherein the second data format is different from the first data format, and wherein the converting is performed by a hardware component; transmitting the converted data in the second format to the ML hardware; performing at least one ML operation associated with the data in the second format using the ML hardware; and transmitting processed data from the ML hardware in a third data format to the hardware component, wherein the hardware component is configured to convert the processed data from the third data format to a fourth data format, wherein the third data format is the same as the second data format, and wherein the first data format is the same as the fourth data format.
- 10 . The method of claim 9 , wherein the data is converted from the first data format to the second data format using a direct memory access (DMA) engine.
- 11 . The method of claim 10 , wherein the data is converted from the first data format to the second data format using the DMA engine as data is being received, and wherein the data in the second format is stored by the DMA engine in a memory component external to the ML hardware.
- 12 . The method of claim 9 , wherein the data is converted from the first data format to the second data format using the DMA engine as data is being fetched from a memory component that stored the received data in the first format for transmission to the ML hardware.
- 13 . The method of claim 12 , wherein the memory component is a double data rate (DDR) memory.
- 14 . The method of claim 9 , wherein the data is converted from the first data format to the second data format using a data streaming engine, wherein the data streaming engine is configured to stream data from a memory component external to the ML hardware that stores the received data in the first data format and converts the data from the first data format to the second data format without storing the converted data in the second data format prior to transmitting the data in the second format to the ML hardware.
- 15 . The method of claim 9 , wherein the first data format and the second data format are each one of floating point (FP) 32, FP16, integer (INT) 8, unsigned int (UINT) 8, FP8, Brain FP (BF) 16, Fixed Point (FXP), In-phase Quadrature FP (IQFP), Quadrature (Q) format, quantization/dequantization, and scaling.
- 16 . The method of claim 9 further comprising storing the converted data in the second format within a memory component of the ML hardware.
- 17 . A system comprising: a means for receiving data in a first data format; a means for converting the received data from the first data format to a second data format, wherein a machine learning (ML) hardware is configured to perform one or more operations based on the second data format, wherein the second data format is different from the first data format, and wherein the converting is performed by a hardware component; a means for transmitting the converted data in the second format to the ML hardware; a means for performing at least one ML operation associated with the data in the second format using the ML hardware; and a means for transmitting processed data from the ML hardware in a third data format to the hardware component, wherein the hardware component is configured to convert the processed data from the third data format to a fourth data format, wherein the third data format is the same as the second data format, and wherein the first data format is the same as the fourth data format.
- 18 . The system of claim 17 , wherein the first data format and the second data format are each one of floating point (FP) 32, FP16, integer (INT) 8, unsigned int (UINT) 8, FP8, Brain FP (BF) 16, Fixed Point (FXP), In-phase Quadrature FP (IQFP), Quadrature (Q) format, quantization/dequantization, and scaling.
- 19 . The system of claim 17 further comprising a means for storing the converted data in the second format within a memory component of the ML hardware.
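Among the data formats the claims enumerate, the FP32-to-BF16 pair makes the conversion itself easy to illustrate, since a BF16 value is simply the upper 16 bits of an FP32 bit pattern (same 8 exponent bits, mantissa shortened to 7 bits). The sketch below is a software illustration only; in the claimed system the conversion is performed by the hardware data format conversion block, and the function names here are hypothetical. Truncation is used for brevity, though hardware engines often add round-to-nearest-even.

```python
import numpy as np

def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Convert FP32 values to BF16 bit patterns by keeping the upper
    16 bits of each 32-bit word (truncation)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    """Widen BF16 bit patterns back to FP32 by zero-filling the low
    16 mantissa bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, 3.140625, -2.5], dtype=np.float32)
roundtrip = bf16_bits_to_fp32(fp32_to_bf16_bits(x))
# BF16 keeps FP32's 8 exponent bits but only 7 mantissa bits, so
# values exactly representable in 7 mantissa bits round-trip exactly.
assert np.array_equal(roundtrip, x)
```

Because the narrowing is a pure bit-level operation, it can be applied per word as data streams through an engine, consistent with the in-line conversion the claims describe.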
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/537,429, filed on Sep. 8, 2023, which is incorporated herein by reference in its entirety. This application is a continuation-in-part of, and claims the benefit of and priority to, U.S. Nonprovisional application Ser. No. 17/248,045, filed Jan. 6, 2021, which is incorporated herein by reference in its entirety. U.S. patent application Ser. No. 17/248,045 is a continuation of, and claims the benefit of and priority to, U.S. Nonprovisional application Ser. No. 16/226,508, filed Dec. 19, 2018, which is incorporated herein by reference in its entirety. U.S. Nonprovisional application Ser. No. 16/226,508 claims the benefit of U.S. Provisional Patent Application No. 62/628,130, filed Feb. 8, 2018, entitled "MACHINE LEARNING SYSTEM," which is incorporated herein by reference in its entirety; of U.S. Provisional Patent Application No. 62/644,352, filed Mar. 16, 2018, entitled "PROGRAMMING HARDWARE ARCHITECTURE FOR MACHINE LEARNING VIA INSTRUCTION STREAMING," which is incorporated herein by reference in its entirety; and of U.S. Provisional Patent Application No. 62/675,076, filed May 22, 2018, which is incorporated herein by reference in its entirety.
BACKGROUND
Use and implementation of machine learning (ML) and artificial intelligence (AI) methods on electronic devices have become ubiquitous. A hardware component of such devices, whether a processor, programmable logic, dedicated hardware such as an application-specific integrated circuit (ASIC), or dedicated ML hardware, often receives data in a different data format than the one in which the application generated it.
For example, data may be generated by an application in floating point 32 (FP32), whereas the hardware architecture designed for performing ML operations may require or expect the data in a different data format (i.e., precision), such as floating point 16 (FP16). Converting the data from one format to another is typically performed by a software component, e.g., a driver. Software-based conversion typically requires the data to be read from a memory component that stores it in its original data format, e.g., FP32, converted into the required data format, e.g., FP16, and then stored back in memory before the converted data is sent to the ML hardware for processing. Reading the data from a memory component, converting it into a different data format, and storing the converted data in a memory component before sending it to the ML hardware for processing can be inefficient and resource intensive, since such a process requires an additional write into a memory component. The foregoing examples of the related art and the limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent upon a reading of the specification and a study of the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
FIGS. 1A-1B depict examples of diagrams of a hardware-based programmable architecture configured to support machine learning according to one aspect of the present embodiments.
FIG. 2 depicts an example of a memory layout for a streaming load instruction for a data stream according to one aspect of the present embodiments.
FIGS. 3A-3B depict an example of a diagram of the system with instruction/data-streaming engines according to one aspect of the present embodiments.
FIG. 4 depicts a diagram of an example of the architecture of the inference engine according to one aspect of the present embodiments.
FIGS. 5A-5B depict a diagram of another example of the architecture of the inference engine according to one aspect of the present embodiments.
FIG. 6 depicts a diagram of an example of the architecture of the first type of processing unit according to one aspect of the present embodiments.
FIG. 7 depicts a diagram of an example of the architecture of the second type of processing unit according to one aspect of the present embodiments.
FIG. 8 depicts an illustrative flow diagram for converting data from one data format to another data format according to one aspect of the present embodiments.
DETAILED DESCRIPTION
The following disclosure provides many different embodiments, or examples, for im