US-12625788-B1 - Method for testing artificial intelligence units located across separate processing chips
Abstract
A method for testing artificial intelligence units (AIUs) located across separate processing chips is disclosed. The method provides a test environment and test cases that stress remote AIU usage. The test cases include multiple machine instruction streams built to be executed on AIUs located on different processing chips. These instruction streams include AIU primitives that may be executed on local or remote AIUs. Whether an AIU primitive is inserted into a given instruction stream is decided by random selection. This randomness allows AIU primitives to be concentrated in the instruction streams built for the processor cores of one processing chip, while the instruction streams running on another processing chip may contain no AIU primitives; as a result, the AIU on that other processing chip may become idle. The test cases also cause AIUs on different processing chips to compete for resources such as input and output data buffers.
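The random placement of AIU primitives described in the abstract can be sketched as follows. This is a minimal illustration, not the patented test generator. The opcode names, the per-chip insertion probabilities, and the helpers `build_streams` and `primitives_per_chip` are all hypothetical. Setting one chip's probability to zero reproduces the case in which that chip's streams carry no AIU primitives, so its AIU receives no local work and may become idle.

```python
import random
from dataclasses import dataclass, field

# Hypothetical opcode names for illustration only; the patent does not list them.
AIU_PRIMITIVES = ("AIU_MATMUL", "AIU_ADD", "AIU_RELU")
CORE_INSTRUCTIONS = ("LOAD", "STORE", "ADD", "BRANCH")


@dataclass
class InstructionStream:
    chip: int
    core: int
    instructions: list = field(default_factory=list)

    def aiu_count(self) -> int:
        """Number of AIU primitives in this stream."""
        return sum(op in AIU_PRIMITIVES for op in self.instructions)


def build_streams(cores_per_chip, length, aiu_prob_per_chip, seed=None):
    """Build one instruction stream per processor core.

    Each slot is randomly filled with an AIU primitive (with the probability
    assumed for that core's chip) or with an ordinary core instruction.
    """
    rng = random.Random(seed)
    streams = []
    for chip, p in enumerate(aiu_prob_per_chip):
        for core in range(cores_per_chip):
            stream = InstructionStream(chip, core)
            for _ in range(length):
                pool = AIU_PRIMITIVES if rng.random() < p else CORE_INSTRUCTIONS
                stream.instructions.append(rng.choice(pool))
            streams.append(stream)
    return streams


def primitives_per_chip(streams, num_chips):
    """Total number of AIU primitives generated for each chip."""
    counts = [0] * num_chips
    for s in streams:
        counts[s.chip] += s.aiu_count()
    return counts


if __name__ == "__main__":
    # Chip 0 is loaded with AIU work; chip 1 gets none, so its AIU stays idle.
    streams = build_streams(cores_per_chip=4, length=50,
                            aiu_prob_per_chip=[0.6, 0.0], seed=1)
    print(primitives_per_chip(streams, num_chips=2))
```

Per-chip probabilities are only one way to bias the random selection toward one chip. A generator could equally draw the probability itself at random for each test case build.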
Inventors
- Ali Y. Duale
- Patrick Duffy
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date
- 20260512
- Application Date
- 20241203
Claims (20)
- 1 . A computer-implemented method comprising: associating a testing system with first and second processing chips, wherein said first processing chip includes a first artificial intelligence unit (AIU) and a first set of processors, and said second processing chip includes a second AIU and a second set of processors; combining and re-allocating memories of said first and second processing chips as a set of private buffers, a shared memory space for inputs, and a shared memory space for outputs; generating an instruction stream for said first and second processing chips, with each instruction stream containing one or more AIU primitives; executing one of said AIU primitives within said instruction streams in one of said first and second AIUs; writing results of said execution of said one AIU primitive into a first location of said shared memory space for outputs; copying contents stored in said first location of said shared memory space for outputs to a second location of one of said private buffers reserved for said AIU that has executed said one AIU primitive; reading contents in said first location of said shared memory space for outputs at a random time; determining whether or not said read contents in said first location of said shared memory space for outputs match said contents stored in said second location of said one private buffer; and in response to said read contents in said first location of said shared memory space for outputs not matching said contents stored in said second location of said one private buffer, flagging an error.
- 2 . The method of claim 1 , further comprising in response to said read contents in said first location of said shared memory space for outputs matching said contents stored in said second location of said one private buffer, returning to said executing step.
- 3 . The method of claim 1 , wherein said second processing chip has fewer AIU primitives than said first processing chip in order to force said second AIU within said second processing chip to become idle at times.
- 4 . The method of claim 1 , wherein said first and second AIUs are set to read from said shared memory space for inputs and to write to said shared memory space for outputs.
- 5 . The method of claim 4 , wherein said first and second AIUs execute the same function code with the same or different dimensions.
- 6 . The method of claim 1 , wherein said first and second AIUs are set to read from said shared memory space for inputs, and said first AIU is set to write to one of said private buffers while said second AIU is set to write to said shared memory space for outputs.
- 7 . The method of claim 6 , wherein information in said one private buffer is compared to information in said shared memory space for outputs.
- 8 . The method of claim 6 , wherein said first and second AIUs execute the same function code with the same or different dimensions.
- 9 . The method of claim 1 , wherein said first AIU is set to write to a first one of said private buffers and said second AIU is set to write to a second one of said private buffers.
- 10 . The method of claim 8 , wherein an AIU primitive output from said first AIU is larger than an AIU primitive output from said second AIU.
- 11 . A computer program product for testing artificial intelligence units located across separate processing chips, said computer program product comprising a computer readable storage medium having program instructions embodied therein, said program instructions executable by a computer to cause said computer to perform: associating a testing system with first and second processing chips, wherein said first processing chip includes a first artificial intelligence unit (AIU) and a first set of processors, and said second processing chip includes a second AIU and a second set of processors; combining and re-allocating memories of said first and second processing chips as a set of private buffers, a shared memory space for inputs, and a shared memory space for outputs; generating an instruction stream for said first and second processing chips, with each instruction stream containing one or more AIU primitives; executing one of said AIU primitives within said instruction streams in one of said first and second AIUs; writing results of said execution of said one AIU primitive into a first location of said shared memory space for outputs; copying contents stored in said first location of said shared memory space for outputs to a second location of one of said private buffers reserved for said AIU that has executed said one AIU primitive; reading contents in said first location of said shared memory space for outputs at a random time; determining whether or not said read contents in said first location of said shared memory space for outputs match said contents stored in said second location of said one private buffer; and in response to said read contents in said first location of said shared memory space for outputs not matching said contents stored in said second location of said one private buffer, flagging an error.
- 12 . The computer program product of claim 11 , further comprising in response to said read contents in said first location of said shared memory space for outputs matching said contents stored in said second location of said one private buffer, returning to said executing step.
- 13 . The computer program product of claim 11 , wherein said second processing chip has fewer AIU primitives than said first processing chip in order to force said second AIU within said second processing chip to become idle at times.
- 14 . The computer program product of claim 11 , wherein said first and second AIUs are set to read from said shared memory space for inputs and write to said shared memory space for outputs.
- 15 . The computer program product of claim 14 , wherein said first and second AIUs execute the same function code with the same or different dimensions.
- 16 . The computer program product of claim 11 , wherein said first and second AIUs are set to read from said shared memory space for inputs, and said first AIU is set to write to one of said private buffers while said second AIU is set to write to said shared memory space for outputs.
- 17 . The computer program product of claim 16 , wherein information in said one private buffer is compared to information in said shared memory space for outputs.
- 18 . The computer program product of claim 16 , wherein said first and second AIUs execute the same function code with the same or different dimensions.
- 19 . The computer program product of claim 11 , wherein said first AIU is set to write to a first one of said private buffers and said second AIU is set to write to a second one of said private buffers.
- 20 . The computer program product of claim 11 , wherein an AIU primitive output from said first AIU is larger than an AIU primitive output from said second AIU.
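The verification sequence of claims 1 and 2 can be sketched as a software model. This is a minimal sketch under stated assumptions, not the patented test program. `CombinedMemory`, `run_primitive`, `execute_and_check`, `run_test`, and the primitive names are hypothetical. The AIUs are replaced by a pure function so that both compute the same function code. The "random time" of the read is modeled as a random number (one to three) of interleaved events from other AIUs or cores that occur before the shared output location is read back.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CombinedMemory:
    """Chip memories re-allocated as in claim 1: a shared input space,
    a shared output space, and one private buffer per AIU."""
    shared_in: list
    shared_out: dict = field(default_factory=dict)  # location -> result
    private: dict = field(default_factory=dict)     # aiu_id -> {location -> copy}


def run_primitive(aiu_id, primitive, data):
    """Hypothetical stand-in for issuing `primitive` to AIU `aiu_id`.
    A real test would drive the hardware AIU on the corresponding chip."""
    if primitive == "ADD1":
        return [x + 1 for x in data]
    if primitive == "RELU":
        return [max(0, x) for x in data]
    raise ValueError(f"unknown primitive {primitive}")


def execute_and_check(mem, aiu_id, primitive, location, rng, other_activity, max_events=3):
    """One pass of the claim-1 sequence; returns an error string or None."""
    result = run_primitive(aiu_id, primitive, mem.shared_in)      # read shared inputs
    mem.shared_out[location] = list(result)                       # write shared output
    mem.private.setdefault(aiu_id, {})[location] = list(result)   # copy to own private buffer
    # Assumed model of the "random time": 1..max_events other events happen first.
    for _ in range(rng.randint(1, max_events)):
        other_activity(mem)
    observed = mem.shared_out[location]                           # read shared output
    expected = mem.private[aiu_id][location]
    if observed != expected:
        return f"AIU {aiu_id}, location {location}: read {observed}, expected {expected}"
    return None


def run_test(mem, schedule, seed=0, other_activity=lambda mem: None):
    """On a match, return to the execute step for the next primitive (claim 2);
    on a mismatch, flag the error and stop."""
    rng = random.Random(seed)
    for aiu_id, primitive, location in schedule:
        error = execute_and_check(mem, aiu_id, primitive, location, rng, other_activity)
        if error:
            return f"ERROR flagged: {error}"
    return "PASS"


if __name__ == "__main__":
    schedule = [(0, "ADD1", 0), (1, "RELU", 1)]   # (aiu_id, primitive, location)
    print(run_test(CombinedMemory(shared_in=[-2, 0, 3]), schedule))  # PASS

    def stray_write(mem):
        mem.shared_out[0] = [999]   # models a faulty AIU overwriting location 0

    print(run_test(CombinedMemory(shared_in=[-2, 0, 3]), schedule,
                   other_activity=stray_write))
```

In the clean run, each randomly timed read matches the private copy. In the second run, the stray write into location 0 is caught and flagged. Varying how much other activity precedes the read is intended to expose interference that appears only under particular interleavings.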
Description
TECHNICAL FIELD
The invention relates to system-level testing in general, and in particular, to a method and system for testing artificial intelligence units located across separate processing chips.
BACKGROUND
A modern data processing system may include multiple processing chips, each having a set of processor cores coupled to an artificial intelligence unit (AIU). The AIU handles requests from the processor cores. When an AIU on a processing chip has multiple AIU primitives to execute, only one AIU primitive can access that AIU at a time, while the remaining AIU primitives wait their turn. This serial execution tends to reduce overall system efficiency. As a result, several methods have been developed to take advantage of an idle AIU located on a separate (remote) processing chip within the data processing system. For example, some of the AIU primitives queued on a first processing chip may be dispatched for execution to the AIU on a second processing chip within the same data processing system. Due to the nature of AIUs, a single AIU instance may also be interrupted multiple times and later resume execution on a different AIU located on a different processing chip. From a testing and verification standpoint, it is relatively straightforward to verify the integrity of the function codes used by the serial execution method described above. However, when more than one AIU is involved, each on a separate processing chip, verifying the function codes that handle the execution of multiple AIU primitives across multiple AIUs becomes much more complicated. Detecting remote AIU execution and monitoring the usage of remote AIUs add further challenges to validating design correctness.
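The offloading behavior described above can be illustrated with a toy model. This is a hypothetical sketch, not the dispatch mechanism of any actual system. The function `drain`, the one-primitive-per-tick timing, and the "take from the longest remote queue" policy are all assumptions, and interruption and resumption on another AIU are not modeled.

```python
from collections import deque


def drain(queues, allow_remote):
    """Toy model: each chip's AIU executes one primitive per tick.

    Without remote dispatch, a chip's primitives wait for its own AIU
    (serial execution). With remote dispatch (assumed policy), an AIU whose
    own queue is empty takes a primitive from the longest queue on another chip.
    Returns (ticks, remote), where remote lists (primitive, from_chip, to_chip).
    """
    queues = [deque(q) for q in queues]   # copy so the caller's lists are untouched
    ticks, remote = 0, []
    while any(queues):
        for chip, q in enumerate(queues):
            if q:
                q.popleft()                               # local execution
            elif allow_remote:
                donor = max(range(len(queues)), key=lambda c: len(queues[c]))
                if queues[donor]:
                    remote.append((queues[donor].popleft(), donor, chip))
        ticks += 1
    return ticks, remote


if __name__ == "__main__":
    work = [["p0", "p1", "p2", "p3"], []]   # all AIU work queued on chip 0
    print(drain(work, allow_remote=False))  # (4, [])
    print(drain(work, allow_remote=True))   # (2, [('p1', 0, 1), ('p3', 0, 1)])
```

In this model, the `remote` list records which primitives left their home chip. Recovering and checking that information in real hardware is the difficulty described above.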
Thus, a suitable test environment needs to be developed for testing remote AIU usage and for validating the correctness of AIU behavior.
SUMMARY
The present disclosure provides an improved method for testing the function codes used to execute multiple AIU primitives in multiple AIUs located across separate processing chips. In accordance with one embodiment of the present invention, a testing system is associated with a first processing chip and a second processing chip. The first processing chip includes a first AIU and a first set of processor cores. The second processing chip includes a second AIU and a second set of processor cores. The memories within the first and second processing chips are combined and re-allocated as a set of private buffers, a shared memory space for inputs, and a shared memory space for outputs. Multiple instruction streams are generated for the first and second processing chips, and each instruction stream may contain one or more AIU primitives. One of the AIU primitives within the instruction streams is executed by one of the AIUs. The instruction streams, as well as the data that the AIUs operate on, are built such that the same AIU primitives are executed by different chips using the same input data in each test case build. Afterwards, the results of executing that AIU primitive are written into a first location of the shared memory space for outputs. The contents of that first location are then copied to a second location in the private buffer reserved for the AIU that executed the primitive. Subsequently, the contents of the first location of the shared memory space for outputs are read at a random time. A determination is then made whether the contents read from the first location of the shared memory space for outputs match the contents stored in the second location of the private buffer.
If the contents read from the first location of the shared memory space for outputs do not match the contents stored in the second location of that private buffer, an error is flagged.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a test environment for testing artificial intelligence units located across separate processing chips, according to one embodiment of the present invention;
FIG. 2 is a block diagram illustrating a scenario to be tested and verified by a testing system within the test environment of FIG. 1;
FIG. 3 is a flowchart of a method for testing the scenario shown in FIG. 2, according to one embodiment of the present invention;
FIGS. 4A-4C depict various testing methodologies, according to one embodiment of the present invention; and
FIG. 5 is a block diagram of a computing environment in which an embodiment of the present invention can be executed.
In accordance with common practice, various features illustrated in the drawings may not be drawn to scale. Accordingly, dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings m