US-20260126974-A1 - APPARATUS, SYSTEM, AND METHOD OF COMPILING CODE FOR A PROCESSOR

Abstract

For example, a compiler may be configured to identify a first data operation and a second data operation, which are executable in parallel according to a SIMD instruction to be executed by a single ALU of a target processor; to determine a selected compilation scheme from a first compilation scheme and a second compilation scheme based on a predefined selection criterion, wherein the first compilation scheme includes compilation of the first data operation and the second data operation into the SIMD instruction, wherein the second compilation scheme includes compilation of the first data operation into a first ALU instruction, and compilation of the second data operation into a second ALU instruction to be executed separately from the first ALU instruction; and to generate target code based on compilation of the first data operation and the second data operation according to the selected compilation scheme.
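
The flow summarized in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the compiler of the disclosure: the names `DataOp`, `pack`, and `compile_pair`, the textual instruction syntax, and the idea of passing the selection criterion as a callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DataOp:
    """Hypothetical model of one data operation: dst = lhs <opcode> rhs."""
    opcode: str
    dst: str
    lhs: str
    rhs: str


# Assumed interface: the criterion returns True to select the first (SIMD)
# scheme and False to select the second (separate ALU instructions) scheme.
Criterion = Callable[[DataOp, DataOp], bool]


def pack(x: str, y: str) -> str:
    """Render two registers as one SIMD operand (made-up syntax)."""
    return "{" + x + "," + y + "}"


def compile_pair(op1: DataOp, op2: DataOp, prefer_simd: Criterion) -> List[str]:
    """Emit illustrative target code for two operations that could share
    a single SIMD instruction on one ALU."""
    if op1.opcode != op2.opcode:
        raise ValueError("operations must share an opcode to be SIMD candidates")
    if prefer_simd(op1, op2):
        # First scheme: both data operations go into one SIMD instruction.
        return [
            f"simd.{op1.opcode} {pack(op1.dst, op2.dst)}, "
            f"{pack(op1.lhs, op2.lhs)}, {pack(op1.rhs, op2.rhs)}"
        ]
    # Second scheme: two ALU instructions, executed separately.
    return [f"{op.opcode} {op.dst}, {op.lhs}, {op.rhs}" for op in (op1, op2)]
```

Calling `compile_pair(a, b, criterion)` with a criterion that returns `False` yields two separate instruction strings, which a scheduler would then be free to place in different cycles.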

Inventors

  • Michael Marjieh
  • Alon Kom
  • Oren Benita Ben-Simhon

Assignees

  • MOBILEYE VISION TECHNOLOGIES LTD.

Dates

Publication Date: 2026-05-07
Application Date: 2023-10-12

Claims (20)

  1-28. (canceled)
  29. A product comprising one or more tangible computer-readable non-transitory storage media comprising computer-executable instructions operable to, when executed by at least one processor, enable the at least one processor to cause a computing device to: identify, based on a source code, a first data operation and a second data operation, which are executable in parallel according to a Single Instruction/Multiple Data (SIMD) instruction to be executed by a single Arithmetic Logic Unit (ALU) of a target processor; determine a selected compilation scheme from a first compilation scheme and a second compilation scheme based on a predefined selection criterion, wherein the first compilation scheme comprises compilation of the first data operation and the second data operation into the SIMD instruction, wherein the second compilation scheme comprises compilation of the first data operation into a first ALU instruction, and compilation of the second data operation into a second ALU instruction to be executed separately from the first ALU instruction; and generate target code configured for execution by the target processor, the target code is based on compilation of the first data operation and the second data operation according to the selected compilation scheme.
  30. The product of claim 29, wherein the predefined selection criterion comprises a register-utilization criterion corresponding to a register utilization of one or more registers of the target processor.
  31. The product of claim 30, wherein the register-utilization criterion is based on the register utilization of the one or more registers of the target processor according to the first compilation scheme.
  32. The product of claim 30, wherein the register-utilization criterion is configured such that the selected compilation scheme is to include the second compilation scheme when a first register utilization of the one or more registers of the target processor according to the first compilation scheme is greater than a second register utilization of the one or more registers of the target processor according to the second compilation scheme.
  33. The product of claim 29, wherein the predefined selection criterion is based on a live range of at least one variable corresponding to at least one of the first data operation or the second data operation.
  34. The product of claim 33, wherein the at least one variable comprises at least one of an input variable of the first data operation or the second data operation, or an output variable resulting from the first data operation or the second data operation.
  35. The product of claim 33, wherein the predefined selection criterion is based on a live range of an input variable of the SIMD instruction, or an output variable resulting from the SIMD instruction.
  36. The product of claim 33, wherein the predefined selection criterion is configured such that the selected compilation scheme is to include the second compilation scheme when a first live range of the variable according to the first compilation scheme is greater than a second live range of the variable according to the second compilation scheme.
  37. The product of claim 29, wherein the predefined selection criterion is based on a count of cycles between a first cycle in which a result of the SIMD instruction is to be available in a register of the target processor, and a second cycle subsequent to the first cycle, in which the result of the SIMD instruction is to be retrieved from the register of the target processor.
  38. The product of claim 29, wherein the predefined selection criterion is based on a count of cycles between a first cycle in which an input variable of the SIMD instruction is to be available in a register of the target processor, and a second cycle subsequent to the first cycle, in which the input variable of the SIMD instruction is to be retrieved from the register of the target processor.
  39. The product of claim 29, wherein the predefined selection criterion is based on a difference between a live range of a first variable and a live range of a second variable, the first variable resulting from execution of the first data operation by the SIMD instruction, the second variable resulting from execution of the second data operation by the SIMD instruction.
  40. The product of claim 29, wherein the predefined selection criterion is based on a difference between a live range of a first variable and a live range of a second variable, the first variable comprising an input for execution of the first data operation by the SIMD instruction, the second variable comprising an input for execution of the second data operation by the SIMD instruction.
  41. The product of claim 29, wherein the predefined selection criterion is based on a latency of the ALU of the target processor to execute the SIMD instruction.
  42. The product of claim 29, wherein the first ALU instruction is configured to be executed in a first execution cycle, and the second ALU instruction is configured to be executed in a second execution cycle different from the first execution cycle.
  43. The product of claim 29, wherein the second compilation scheme comprises compilation of the first data operation into the first ALU instruction to be executed during a first execution cycle, and compilation of the second data operation into the second ALU instruction to be executed during a second cycle, which is different from the first cycle.
  44. The product of claim 29, wherein the first data operation and the second data operation comprise data operations of a same instruction.
  45. The product of claim 29, wherein the first data operation comprises a data operation of a first instruction, and the second data operation comprises a data operation of a second instruction separate from the first instruction.
  46. The product of claim 29, wherein the source code comprises Open Computing Language (OpenCL) code.
  47. The product of claim 29, wherein the computer-executable instructions, when executed, cause the computing device to compile the source code into the target code according to a Low Level Virtual Machine (LLVM) based (LLVM-based) compilation scheme.
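
The live-range comparison recited in claim 36, together with the cycle count recited in claim 37, may be easier to follow with a small Python sketch. It rests on assumptions that the claims do not state: the live range is measured as a plain cycle difference, and the SIMD scheme is kept whenever the live range under it is not greater. The names `LiveRange` and `select_scheme` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRange:
    """Hypothetical live range of one variable, measured in cycles."""
    def_cycle: int       # cycle in which the value becomes available in a register
    last_use_cycle: int  # later cycle in which the value is retrieved

    @property
    def length(self) -> int:
        # Count of cycles the value occupies a register (assumed metric).
        return self.last_use_cycle - self.def_cycle


def select_scheme(simd_range: LiveRange, split_range: LiveRange) -> str:
    """Pick the second ("split") scheme when the variable stays live longer
    under the SIMD scheme; otherwise keep "simd" (an assumed default)."""
    return "split" if simd_range.length > split_range.length else "simd"


# Illustrative numbers only: under the SIMD scheme the second result is ready
# at cycle 2 but not read until cycle 10, so it holds a register for 8 cycles.
# Scheduled as its own ALU instruction, it could be produced at cycle 9.
choice = select_scheme(LiveRange(2, 10), LiveRange(9, 10))
```

With these made-up cycle numbers `choice` is `"split"`: the separate ALU instruction frees the register for most of the interval, which is the kind of saving that the register-utilization criterion of claims 30 to 32 also targets.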

Description

CROSS REFERENCE

This application claims the benefit of and priority from U.S. Provisional Patent Application No. 63/415,305 entitled “APPARATUS, SYSTEM, AND METHOD OF COMPILING CODE FOR A PROCESSOR”, filed Oct. 12, 2022, the entire disclosure of which is incorporated herein by reference.

BACKGROUND

A compiler may be configured to compile source code into target code configured for execution by a processor. There is a need to provide a technical solution to support efficient processing functionalities.

BRIEF DESCRIPTION OF THE DRAWINGS

For simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity of presentation. Furthermore, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. The figures are listed below.

FIG. 1 is a schematic block diagram illustration of a system, in accordance with some demonstrative aspects.
FIG. 2 is a schematic illustration of a compiler, in accordance with some demonstrative aspects.
FIG. 3 is a schematic illustration of a vector processor, in accordance with some demonstrative aspects.
FIG. 4 is a schematic flow-chart illustration of a method of compiling code for a processor, in accordance with some demonstrative aspects.
FIG. 5 is a schematic flow-chart illustration of a method of compiling code for a processor, in accordance with some demonstrative aspects.
FIG. 6 is a schematic illustration of a product, in accordance with some demonstrative aspects.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of some aspects. However, it will be understood by persons of ordinary skill in the art that some aspects may be practiced without these specific details.
In other instances, well-known methods, procedures, components, units and/or circuits have not been described in detail so as not to obscure the discussion.

Some portions of the following detailed description are presented in terms of algorithms and symbolic representations of operations on data bits or binary digital signals within a computer memory. These algorithmic descriptions and representations may be the techniques used by those skilled in the data processing arts to convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of acts or operations leading to a desired result. These include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.

Discussions herein utilizing terms such as, for example, “processing”, “computing”, “calculating”, “determining”, “establishing”, “analyzing”, “checking”, or the like, may refer to operation(s) and/or process(es) of a computer, a computing platform, a computing system, or other electronic computing device, that manipulate and/or transform data represented as physical (e.g., electronic) quantities within the computer's registers and/or memories into other data similarly represented as physical quantities within the computer's registers and/or memories or other information storage medium that may store instructions to perform operations and/or processes.
The terms “plurality” and “a plurality”, as used herein, include, for example, “multiple” or “two or more”. For example, “a plurality of items” includes two or more items.

References to “one aspect”, “an aspect”, “demonstrative aspect”, “various aspects” etc., indicate that the aspect(s) so described may include a particular feature, structure, or characteristic, but not every aspect necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase “in one aspect” does not necessarily refer to the same aspect, although it may.

As used herein, unless otherwise specified, the use of the ordinal adjectives “first”, “second”, “third” etc., to describe a common object, merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.

Some aspects, for example, may take the form of an entirely hardware aspect, an entirely software aspect, or an aspect including both hardware and software elements. Some a