EP-4742588-A2 - STREAMING DATA TO MULTI-TILE PROCESSING SYSTEM
Abstract
A processing system comprising one or more chips, each comprising a plurality of tiles, is described. Each tile comprises a respective processing unit and memory, the memory storing a codelet. The processing system has at least one encryption unit configured to encrypt and decrypt data transferred between the tiles and a trusted computing entity via an external computing device. The codelets are configured to instruct the tiles to transfer the encrypted data by reading from and writing to a plurality of memory regions at the external memory such that a plurality of streams of encrypted data are formed, each stream using an individual one of the memory regions at the external computing device.
Inventors
- WILKINSON, DANIEL JOHN PELHAM
- OSBORNE, RICHARD
- CUNNINGHAM, GRAHAM BERNARD
- GORDON, KENNETH
- WEBSTER, SAMUEL ALEXANDER
- VOLOS, Stavros
- VASWANI, KAPIL
- VEMBU, BALAJI
- FOURNET, CEDRIC ALAIN MARIE
Assignees
- Microsoft Technology Licensing, LLC
Dates
- Publication Date: 20260513
- Application Date: 20210713
Claims (15)
- A processing system (112) comprising: one or more chips (122), each comprising a plurality of tiles (114, 506), each tile (506) comprising a respective processing unit (512) and memory (508) storing a codelet (510); at least one encryption unit (502) configured to encrypt and decrypt data transferred between the tiles and a trusted computing entity via an external memory (108); wherein the codelets have been compiled by a compiler at the trusted computing entity, to instruct the tiles to transfer the encrypted data by reading from and writing to a plurality of memory regions at the external memory such that a plurality of streams of encrypted data are formed, each stream using an individual one of the memory regions at the external memory, wherein at least one of the streams is a secure checkpoint stream for reading or writing encrypted model weights and metadata.
- The processing system of claim 1 wherein the secure checkpoint stream is a secure checkpoint egress stream for writing model weights and metadata from the processing system to the external memory at a stage of training of the model referred to as a checkpoint, to enable training of a model to be restarted from the checkpoint, the metadata comprising an offset from which the data streams will be resumed; and wherein at least one of the tiles is an egress tile of the secure checkpoint egress stream, the egress tile being configured to obtain an initial value of a checkpoint epoch counter and an initial value of a checkpoint identifier.
- The processing system of claim 2 wherein the egress tile is configured to divide the model weights and metadata into frames and to generate and add an initialization vector to each frame, the initialization vectors being generated using the codelet and current values of the checkpoint epoch counter and checkpoint identifier, and wherein the egress tile is configured to increment the checkpoint identifier after writing the weights and metadata.
- The processing system of claim 2 or claim 3, wherein the checkpoint comprises metadata comprising plaintext and ciphertext, wherein the plaintext is consumed by a host runtime and the ciphertext is decrypted during loading of the checkpoint and used by the tiles that will fetch confidential data streams.
- The processing system of claim 3 or claim 4, wherein the codelet is generated by the compiler and deployed at the egress tile to read the secure checkpoint stream and the codelet generates a sequence of expected initialization vectors, checks that initialization vectors returned in the frames match an expected initialization vector and strips the initialization vector and authentication tag from the frames.
- The processing system of any one of claims 2 to 5, wherein the egress tile is configured to: determine, using information about data to be written to the external memory, a size and initialization vector of a next frame of one of the streams being written from the processing system to the external memory; write the initialization vector into a current frame of the stream; and issue a write request for the current frame, the write request being issued to the external memory region associated with the stream.
- The processing system of any one of preceding claims wherein the secure checkpoint stream is a secure checkpoint ingress stream for reading model weights and metadata from the external memory into the processing system, the metadata comprising a current offset of an ingress stream; and wherein at least one of the tiles is an ingress tile of the secure checkpoint ingress stream, the ingress tile being configured to: obtain an initial value of a checkpoint epoch counter and an initial value of a checkpoint identifier and to use the initial values of the checkpoint epoch counter and the checkpoint identifier to generate expected initialization vectors while reading the model weights and metadata, and to increment the checkpoint epoch counter and reset the checkpoint identifier after reading the model weights and metadata.
- The processing system of any one of preceding claims, wherein the codelet is generated by the compiler to write the secure checkpoint stream and the codelet generates a sequence of expected initialization vectors and places each of them in a header of a frame.
- The processing system of claim 7 or claim 8 wherein the ingress tile is configured to: determine, using the codelet of the tile, an expected initialization vector of a next frame of one of the streams to be read; issue a read request to read a next frame of the stream from the memory region associated with the stream; responsive to the next frame arriving in local memory of the ingress tile, check that an initialization vector contained in the next frame matches the expected initialization vector; and responsive to the match failing, generate a security exception.
- The processing system of any one of preceding claims, wherein a secure microcontroller unit 'SMCU' is configured to provision two checkpointing keys, one for encrypting model weights and metadata to be written for a new training epoch and one for decrypting model weights and metadata to be read from a previous epoch, and wherein the secure checkpoint stream has an associated plaintext checkpoint stream comprising metadata in plaintext form.
- The processing system of any of claims 3 to 10, wherein the initialization vector for each frame is constructed according to a format having a plurality of fields, wherein the fields comprise a stream type field which is used to indicate that the stream is for the checkpoint, the checkpoint epoch counter field which is incremented when the machine learning process resumes at the multi-tile processor, the checkpoint identifier field which starts at 1 for the first checkpoint and increments by one for every subsequent checkpoint, a processing unit identifier field which has a local identifier of the processing unit, a tile identifier field which has an identifier of a tile to which the frame is to be deployed, and an index field which has an index of the frame within the stream.
- The processing system of any one of preceding claims, wherein the secure checkpoint stream is a flexible layout stream which transfers the data by breaking the data up into frames in an order that can change and/or is dependent on an application which will use the transferred data.
- The processing system of any one of preceding claims, wherein each of the codelets has been compiled by the compiler according to a plurality of parameters determined by the compiler within specified constraints, wherein the parameters are selected from one or more of: a contiguous region of specified size in the external memory, which keys to load into encryption units at a plurality of specified points of execution where execution is temporarily halted until keys are loaded, a set of the tiles that will issue read or write requests to the external memory, for each tile: an index indicating a starting point, and a number of frames to read or write subsequent to the starting point.
- A method performed at a processing system (112) comprising one or more chips (122), each comprising a plurality of tiles (114, 506), each tile comprising a respective processing unit (512) and memory (508), the method comprising: storing a codelet (510) at each tile (506), each codelet (510) having been compiled by a compiler at the trusted computing entity; using at least one encryption unit to encrypt and decrypt data transferred between the tiles and a trusted computing entity via an external memory (108); using the codelets to instruct the tiles to transfer the encrypted data by reading from and writing to a plurality of memory regions at the external memory such that a plurality of streams of encrypted data are formed, each stream using an individual one of the memory regions at the external memory, wherein at least one of the streams is a secure checkpoint stream for reading or writing encrypted model weights and metadata.
- A data center comprising: a plurality of compute nodes, each compute node comprising at least one peripheral device, wherein the peripheral device comprises the processing system of any one of claims 1 to 13.
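As an illustrative aside to the claims above, the following is a minimal Python sketch of how an egress tile might frame a checkpoint with per-frame initialization vectors, and how an ingress tile might check each returned initialization vector against the expected one. It is a sketch under stated assumptions, not the claimed implementation: the field widths, the 12-byte total length, the stream type value, and the names `CheckpointIV`, `egress_frames`, `ingress_read` and `SecurityException` are all hypothetical. Encryption and the authentication tag, which the claims assign to the encryption unit, are deliberately omitted.

```python
from dataclasses import dataclass
from typing import Iterator, List

STREAM_TYPE_CHECKPOINT = 1  # assumed encoding of the "checkpoint" stream type
IV_LEN = 12                 # assumed total width: 1 + 2 + 2 + 1 + 2 + 4 bytes


class SecurityException(Exception):
    """Raised when a frame's IV does not match the expected IV."""


@dataclass(frozen=True)
class CheckpointIV:
    """IV fields named in the claims; the byte widths are assumptions."""
    stream_type: int
    epoch: int          # checkpoint epoch counter
    checkpoint_id: int  # starts at 1 for the first checkpoint
    unit_id: int        # local identifier of the processing unit
    tile_id: int        # tile the frame is deployed to
    index: int          # index of the frame within the stream

    def to_bytes(self) -> bytes:
        widths = (1, 2, 2, 1, 2, 4)
        fields = (self.stream_type, self.epoch, self.checkpoint_id,
                  self.unit_id, self.tile_id, self.index)
        return b"".join(v.to_bytes(w, "big") for v, w in zip(fields, widths))


def expected_ivs(epoch: int, checkpoint_id: int, unit_id: int,
                 tile_id: int, n_frames: int) -> Iterator[bytes]:
    """Generate the deterministic IV sequence for one tile's frames."""
    for index in range(n_frames):
        yield CheckpointIV(STREAM_TYPE_CHECKPOINT, epoch, checkpoint_id,
                           unit_id, tile_id, index).to_bytes()


def egress_frames(payload: bytes, frame_size: int, epoch: int,
                  checkpoint_id: int, unit_id: int, tile_id: int) -> List[bytes]:
    """Split the payload into frames and prepend an IV to each frame."""
    chunks = [payload[i:i + frame_size]
              for i in range(0, len(payload), frame_size)]
    ivs = expected_ivs(epoch, checkpoint_id, unit_id, tile_id, len(chunks))
    return [iv + chunk for iv, chunk in zip(ivs, chunks)]


def ingress_read(frames: List[bytes], epoch: int, checkpoint_id: int,
                 unit_id: int, tile_id: int) -> bytes:
    """Check each frame's IV against the expected one, then strip it."""
    out = bytearray()
    ivs = expected_ivs(epoch, checkpoint_id, unit_id, tile_id, len(frames))
    for frame, want in zip(frames, ivs):
        got, body = frame[:IV_LEN], frame[IV_LEN:]
        if got != want:
            raise SecurityException("unexpected initialization vector")
        out += body
    return bytes(out)
```

In this sketch, calling `egress_frames(data, 8, epoch=0, checkpoint_id=1, unit_id=0, tile_id=5)` and passing the result to `ingress_read` with the same identifiers returns `data`. Reordering the frames, or presenting frames written under a different checkpoint identifier, raises `SecurityException`, because the index and identifier fields are part of every expected initialization vector.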
Description
BACKGROUND
Multi-tile processing systems are increasingly used to facilitate parallel computing for applications such as machine learning where vast amounts of data are to be processed. Multi-tile processing systems are deployed in data centres and elsewhere to improve efficiency of various types of algorithm by allowing greater concurrency. Increasingly there is a desire to work with sensitive code and/or sensitive data and to retain security and privacy. Often large amounts of sensitive code and/or data are to be processed using resource intensive algorithms and multi-tile processing systems are an option to improve efficiency in such situations. However, where multi-tile processing systems are used, additional challenges are introduced regarding security and privacy of sensitive code and/or data since it is difficult to transfer data to and from the multi-tile processing system securely.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known multi-tile processing systems.
SUMMARY
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not intended to identify key features or essential features of the claimed subject matter nor is it intended to be used to limit the scope of the claimed subject matter. Its sole purpose is to present a selection of concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
In various examples there is a processing system comprising one or more chips, each comprising a plurality of tiles. Each tile comprises a respective processing unit and memory, the memory storing a codelet. The processing system has at least one encryption unit configured to encrypt and decrypt data transferred between the tiles and a trusted computing entity via an external memory.
The codelets have been compiled by a compiler at the trusted computing entity to instruct the tiles to transfer the encrypted data by reading from and writing to a plurality of memory regions at the external memory such that a plurality of streams of encrypted data are formed, each stream using an individual one of the memory regions at the external memory.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
FIG. 1 is a schematic diagram of a first trusted computing entity, an untrusted intermediary, and a multi-tile processing system;
FIG. 2 is a schematic diagram of a compiler for generating codelets to deploy on a multi-tile processing system;
FIG. 3 is a schematic diagram of a data center comprising multi-tile processing systems;
FIG. 4 is a schematic diagram of a multi-tile processing unit of a multi-tile processing system;
FIG. 5 is a schematic diagram of a multi-tile processing system used with an external memory;
FIG. 5A is a diagram illustrating the movement of different data components to and from tiles and through the encryption hardware;
FIG. 5B is a schematic diagram illustrating the communication between processors of the accelerator subsystem and a host system;
FIG. 6 is a flow diagram of a process implemented at an egress tile in order to write a checkpoint to the external memory in a secure manner using a stream;
FIG. 7 is a process implemented at an ingress tile in order to read a checkpoint in a secure manner using a stream;
FIG. 8 is a schematic diagram of a multi-tile processing system used with an external memory and where there is a stream for transferring shuffled training data instances and a permutation stream;
FIG. 9 is a flow diagram of a method performed by a multi-tile processing system to support the use of streams for shuffled training data instances;
FIG. 10 illustrates a mechanism for sending data packets from tiles to destinations external to a processing unit;
FIG. 11 illustrates an example of a processor tile;
FIG. 12 illustrates an example of the use of an initialization vector;
FIG. 13 illustrates an integrated circuit;
FIG. 14 illustrates components of an initialization vector;
FIG. 15 illustrates how a tile may write data to host memory;
FIG. 16 illustrates an example of movement of data when data is written to host memory;
FIG. 17 illustrates an example of tiles writing to and reading from a memory that is part of an integrated circuit;
FIG. 18 illustrates an example of communication between two integrated circuits;
FIG. 19 illustrates an example of movement of data within an integrated circuit;
FIG. 20 illustrates how multiple read or write requests issued by different tiles may be outstanding at any one time;
FIG. 21 illustrates an example of an encryption unit.
Like reference numerals