WO-2026096160-A1 - EFFICIENT TENSOR OPERATIONS USING SLICING


Abstract

A system and method for performing tensor operations with a multi-step operation processing system in a memory-efficient manner. The method includes dividing an N-dimensional tensor into a set of tensor slices, each consisting of one or more consecutive rows. The tensor slices may be further segmented. The tensor slice segments, together with dependency data formed from the tensor dependencies, are used in a tensor operation computation to generate a first result. Each processed slice segment is fused into a result slice by removing the extra data used in the computation. This process is repeated for each slice to be processed, and the results are combined into a final processed tensor result.
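The slicing scheme described in the abstract can be illustrated with a minimal sketch. The example below is not from the patent; the function names, slice size, and use of a single 3x3 "same" convolution are illustrative assumptions. It divides an input into groups of consecutive rows, loads one halo row of dependency data from each adjacent slice, computes the operation on the padded slice, then "fuses" the result by trimming the extra rows before concatenating the slices back together.

```python
# Illustrative sketch (hypothetical names, not the patent's implementation):
# a "same"-padded 3x3 convolution computed slice by slice over consecutive
# rows, with one halo row of dependency data loaded from adjacent slices.
import numpy as np

def conv2d_full(x, k):
    # Reference "same" convolution via zero padding, computed all at once.
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def conv2d_sliced(x, k, rows_per_slice=4):
    kh, _ = k.shape
    halo = kh // 2                      # dependency rows needed per side
    H = x.shape[0]
    pieces = []
    for start in range(0, H, rows_per_slice):
        stop = min(start + rows_per_slice, H)
        # Load the slice plus dependency rows from adjacent slices.
        lo = max(start - halo, 0)
        hi = min(stop + halo, H)
        seg = conv2d_full(x[lo:hi], k)  # operate on slice + halo rows
        # Fuse: trim the extra rows contributed by the dependency data.
        pieces.append(seg[start - lo: seg.shape[0] - (hi - stop)])
    return np.concatenate(pieces, axis=0)
```

Because each retained output row's receptive field lies entirely within its slice-plus-halo region, the concatenated per-slice results match the full-tensor computation while only one slice (plus its halo) needs to be resident at a time.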

Inventors

  • ALNAHARI, SUHAIL IBRAHIM
  • CHUANG, KAI-ER
  • MA, SIYAD CHIH-HUA
  • CHUANG, SHANG-TSE
  • CHOLE, SHARAD VASANTRAO

Assignees

  • Expedera, Inc.

Dates

Publication Date
20260507
Application Date
20251006
Priority Date
20241030

Claims (20)

CLAIMS

What is claimed is:

1. A method of performing one or more tensor operations with a multi-step operation processing system in a memory-efficient manner, said method comprising the stages of:
dividing an N-dimensional input tensor including X-rows and Y-columns into a set of tensor slices, the tensor slices consisting of one or more consecutive X-rows of said N-dimensional tensor;
for each tensor slice in said set of tensor slices, performing the substages of:
loading tensor slice data for a tensor operation,
loading dependency data from one or more adjacent tensor slices based on the tensor transform dependencies,
executing said tensor operation on said tensor slice data, dependency data, and any saved intermediate results to generate tensor segment results,
fusing the tensor segment results, thereby removing extra data resulting from the dependency data and tensor operation and forming a tensor segment output,
repeating said substages of loading, executing, and fusing until an output tensor slice from said multi-step operation processing system is complete; and
repeating said substages of loading and executing until final tensor outputs from said multi-step operation processing system are complete for said set of tensor slices.

2. The method of claim 1, further comprising combining said final tensor outputs for each said set of tensor slices into a final tensor result for said tensor.

3. The method of claim 1, wherein said tensor operation is a convolution computation.

4. The method of claim 1, wherein said tensor operation is more than one layer operation.

5. The method of claim 1, wherein the method extracts one or more features from an image.

6. The method of claim 1, wherein said tensor slices comprise multiple rows of tensor data.

7. A system, comprising:
a processor; and
a memory communicatively coupled to the processor, the memory for storing instructions executable by the processor to perform a method, said method comprising the stages of:
dividing an N-dimensional input tensor including X-rows and Y-columns into a set of tensor slices, the tensor slices consisting of one or more consecutive X-rows of said N-dimensional tensor;
for each tensor slice in said set of tensor slices, performing the substages of:
loading tensor slice data for a tensor operation,
loading dependency data from one or more adjacent tensor slices based on the tensor operation dependencies,
executing said tensor operation on said tensor slice data, dependency data, and any saved intermediate results to generate tensor segment results,
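Claim 4 covers the case where the tensor operation spans more than one layer. One consequence worth noting, sketched below under the assumption of stacked "same"-padded convolutions (the function and its formula are illustrative, not from the patent), is that the dependency data a slice must load from its neighbors grows with the number of fused layers: each layer adds one kernel radius of rows to the halo on each side.

```python
# Illustrative helper (hypothetical, not the patent's implementation):
# rows of dependency data a slice needs per side when a stack of
# "same"-padded convolution layers is executed slice by slice.
def halo_rows(kernel_size: int, num_layers: int) -> int:
    """Each layer extends the receptive field by kernel_size // 2 rows
    on each side, so the per-side halo grows linearly with depth."""
    return num_layers * (kernel_size // 2)
```

For example, a single 3x3 layer needs 1 dependency row per side, while two stacked 3x3 layers need 2, which is the "saved intermediate results" cost the claims trade against re-loading data between layers.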

Description

EFFICIENT TENSOR OPERATIONS USING SLICING

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application is a continuation-in-part of and claims priority benefit of U.S. Patent Application No. 17/704,488, filed October 18, 2021. U.S. Patent Application No. 17/704,488 is a continuation of the U.S. Patent Application entitled "Method and Apparatus for Efficiently Processing Convolution Neural Network Operations," filed September 11, 2019, having U.S. patent application Ser. No. 16/568,195.

[0002] The present invention relates to the field of artificial intelligence, digital image analysis, and tensor processing. In particular, but not by way of limitation, the present invention discloses methods and apparatus for quickly and efficiently performing convolutional neural network computations.

BACKGROUND

[0003] Artificial intelligence is a field of computer science that seeks to emulate the cognitive functions of a human mind. For example, artificial intelligence attempts to create computer systems that are capable of learning and problem solving. Many different techniques have been used to attempt to create useful artificial intelligence systems. Simple algorithms, heuristics, Bayesian networks, decision trees, support vector machines, and many other techniques have been used to obtain effective results in the field of artificial intelligence. However, at the present time one of the most popular techniques used in the field of artificial intelligence is the construction of artificial neural networks.

[0004] Artificial neural networks were originally designed based upon the biological networks of neuron cells that are present within animal brains. Like biological brains, artificial neural networks operate by processing numerous input data elements (an input vector) to generate some sort of output inference, just as human brains process sights, sounds, and other sensory input from the world around them to generate inferences about that experienced world.
But, just like a newly born human infant, a brand new artificial neural network cannot make useful inferences until that artificial neural network has received a good amount of training.

[0005] Before an artificial neural network is useful in a particular application, that artificial neural network first must be trained. To train an artificial neural network, sets of training data are presented to the artificial neural network, and the artificial neural network processes the training data to generate an inference from the training data. The neural-network-generated inference is then compared with a desired answer to determine an error amount. That error amount is then used to adjust an internal weight matrix within the artificial neural network in order to improve the inference performance of the artificial neural network. This technique of making attempted inferences, comparing the generated inference to a desired correct result, and then adjusting various parameters within the artificial neural network accordingly is known as supervised learning. By training artificial neural networks with supervised learning and large amounts of training data, artificial neural networks can eventually become accurate at generating classification inferences that are very useful in various applications.

[0006] One increasingly popular application for artificial neural network learning is the task of image recognition and classification. With image recognition and classification, digital image data is presented to an artificial neural network system, and the artificial neural network system is tasked with recognizing and classifying items within the presented digital image.

[0007] An artificial intelligence system designed for an image recognition and classification task can be extremely memory and computationally intensive.
For example, consider the task of analyzing a conventional high-resolution image made up of 1920 by 1080 pixels, wherein each individual pixel is made up of three different pixel color information values (red, green, and blue). That high-resolution digital image has 1920*1080*3 = 6,220,800 different data values that must be processed by the artificial neural network system. Furthermore, each individual pixel of the digital image will generally be involved in several different computations, greatly multiplying the number of computations required. For full-motion-video artificial intelligence applications, such as driving an autonomous vehicle, many individual digital video frames must be processed each second. For example, with a 30-frames-per-second video system, 30*6,220,800 = 186,624,000 individual pixel data values must be processed by multiple computational operations each second just to perform the initial image processing and feature extraction tasks required for image recognition and classification.

[0008] In order to perform image recognition and classification, a convolutional neural network (CNN) may be used. A convolutional neural network oper