Search

US-12619713-B1 - Compiler support for increasing resilience to bit-flips

US12619713B1US 12619713 B1US12619713 B1US 12619713B1US-12619713-B1

Abstract

A determination is made that a program defines a plurality of variants for populating a data structure. Respective values for the variants are selected such that the number of bit-flips required to transform one variant to another exceeds a threshold. In response to a detection, during execution of the program, that a value stored in the data structure does not match any of the selected values, a remedial action associated with detection of a bit-flip is initiated.

Inventors

  • Nathan Yong Seng Chong
  • Dominic Phillip MULLIGAN
  • KARIMALLAH AHMED MOHAMMED RASLAN
  • Hanno Becker

Assignees

  • AMAZON TECHNOLOGIES, INC.

Dates

Publication Date
20260505
Application Date
20230330

Claims (20)

  1. 1 . A system, comprising: one or more computing devices; wherein the one or more computing devices include instructions that upon execution on or across the one or more computing devices: determine, by a compiler, that a program whose resiliency against bit-flips is to be enhanced comprises a data structure within which a value of a variant selected from a plurality of variants of an enumeration data type is going to be stored during execution of the program; select, by the compiler, respective values for individual variants of the plurality of variants, including a first value for a first variant and a second value for a second variant, such that a Hamming distance between the first value and the second value exceeds a threshold; cause, by the compiler, a compiled version of the program to be stored, wherein the compiled version comprises the respective values selected by the compiler for the individual variants of the plurality of variants; and in response to a detection, during an execution of the compiled version of the program, that a particular value stored in the data structure does not match any of the respective values selected by the compiler, initiate a remedial action associated with an occurrence of a bit-flip.
  2. 2 . The system as recited in claim 1 , wherein the one or more computing devices include further instructions that upon execution on or across the one or more computing devices: utilize, by the compiler, a directive included in source code of the program to determine that values for the plurality of variants are to be selected by the compiler to enhance resilience of the program to bit-flips.
  3. 3 . The system as recited in claim 1 , wherein the one or more computing devices include further instructions that upon execution on or across the one or more computing devices: insert, by the compiler into the compiled version of the program, executable code to detect whether a value stored in the data structure matches any of the respective values selected by the compiler.
  4. 4 . The system as recited in claim 1 , wherein the one or more computing devices include further instructions that upon execution on or across the one or more computing devices: obtain, by the compiler, an indication of an algorithm to be used to select the respective values.
  5. 5 . The system as recited in claim 1 , wherein the one or more computing devices include further instructions that upon execution on or across the one or more computing devices: obtain, by the compiler, an indication of the remedial action; and insert, by the compiler in the compiled version of the program, executable code to initiate the remedial action.
  6. 6 . A computer-implemented method, comprising: determining, by a compiler, that a particular program defines a first plurality of variants from which individual variants are used to populate a data structure; selecting, by the compiler, respective values for the individual variants of the first plurality of variants, including a first value for a first variant and a second value for a second variant, such that a number of bit-flips required to transform the first value to the second value exceeds a threshold; and in response to detecting, during an execution of a compiled version of the particular program generated by the compiler, that a particular value stored in the data structure does not match any of the respective values selected by the compiler for the first plurality of variants, initiating a remedial action associated with detection of a bit-flip.
  7. 7 . The computer-implemented method as recited in claim 6 , further comprising: obtaining, by the compiler via a configuration setting, an indication that resilience of one or more programs including the particular program to bit-flips is to be enhanced, wherein said selecting is responsive to obtaining the indication.
  8. 8 . The computer-implemented method as recited in claim 6 , further comprising: obtaining, by the compiler, a hint within source code of the particular program indicating that values for the first plurality of variants are to be selected by the compiler to enhance resilience of the particular program to bit-flips.
  9. 9 . The computer-implemented method as recited in claim 6 , further comprising: inserting, by the compiler into the compiled version of the particular program, executable code to detect whether a value stored in the data structure matches any of the respective values selected by the compiler for the first plurality of variants.
  10. 10 . The computer-implemented method as recited in claim 6 , further comprising: obtaining, by the compiler via a configuration setting, an indication of an algorithm to be used to select the respective values.
  11. 11 . The computer-implemented method as recited in claim 6 , further comprising: accessing, by the compiler, a pre-computed set of values, prepared prior to analysis of the particular program by the compiler, to be assigned to variants in one or more programs including the particular program, wherein individual ones of the respective values are selected from the pre-computed set.
  12. 12 . The computer-implemented method as recited in claim 6 , further comprising: obtaining, by the compiler, an indication of the remedial action; and inserting, by the compiler in the compiled version of the particular program, executable code to initiate the remedial action.
  13. 13 . The computer-implemented method as recited in claim 6 , wherein the remedial action comprises one or more of: (a) causing a warning or error message indicating a potential occurrence of a bit-flip to be delivered to one or more destinations, (b) pausing execution of the particular program, or (c) causing termination of execution of the particular program.
  14. 14 . The computer-implemented method as recited in claim 6 , further comprising: providing, via one or more programmatic interfaces, an indication that a value of a particular variant of a second plurality of variants defined in the particular program cannot be set by the compiler without overriding a programmer-selected value for the particular variant.
  15. 15 . The computer-implemented method as recited in claim 6 , wherein the data structure comprises one or more of: (a) an operating system process state data structure, (b) a page table used for translation of virtual address to physical addresses or (c) a data structure of an application run in user mode.
  16. 16 . One or more non-transitory computer-accessible storage media storing program instructions that when executed on or across one or more processors: determine, by a complier, that a particular program defines a first plurality of variants from which individual variants are used to populate a data structure; select, by the compiler, respective values for the individual variants of the first plurality of variants, including a first value for a first variant and a second value for a second variant, such that a number of bit-flips required to transform the first value to the second value exceeds a threshold; and in response to detecting, during an execution of a version of the particular program, that a particular value stored in the data structure does not match any of the respective values selected for the first plurality of variants, initiate a remedial action associated with detection of a bit-flip.
  17. 17 . The one or more non-transitory computer-accessible storage media as recited in claim 16 , storing further program instructions that when executed on or across the one or more processors: obtain, via a configuration setting, an indication that resilience of one or more programs including the particular program to bit-flips is to be enhanced.
  18. 18 . The one or more non-transitory computer-accessible storage media as recited in claim 16 , storing further program instructions that when executed on or across the one or more processors: obtain a hint within source code of the particular program indicating that values for the first plurality of variants are to be selected to enhance resilience of the particular program to bit-flips.
  19. 19 . The one or more non-transitory computer-accessible storage media as recited in claim 16 , storing further program instructions that when executed on or across the one or more processors: insert, into the version of the particular program, executable code to detect whether a value stored in the data structure matches any of the respective values selected for the first plurality of variants.
  20. 20 . The one or more non-transitory computer-accessible storage media as recited in claim 16 , storing further program instructions that when executed on or across the one or more processors: obtain, via a configuration setting, an indication of an algorithm to be used to select the respective values.

Description

BACKGROUND Undesired modifications to data stored in the memory of computer systems can cause significant security issues (by changing the control flow of programs in an undetected manner) and application disruptions such as program crashes or hangs. Physical effects, such as alpha particles striking a memory cell, can cause such memory errors, referred to as bit-flips. Bit-flips can also be induced by certain types of attacks. Commonly-used memory protection techniques such as error correcting codes (ECC) can typically only correct single bit-flips and detect double bit-flips, but cannot detect more than two bit-flips. BRIEF DESCRIPTION OF DRAWINGS FIG. 1 illustrates an example system environment in which several types of techniques may be employed either independently or in combination at computing devices to enhance resilience of programs against bit-flips, according to at least some embodiments. FIG. 2 illustrates the relationship between Hamming distance and the vulnerability of programs to memory errors involving the transformation of one legitimate value in memory to another legitimate value, according to at least some embodiments. FIG. 3 illustrates an example use of enumerated (enum) data types to store important state information at a computing device, according to at least some embodiments. FIG. 4 illustrates example operations which may be performed by a compiler to enhance the resilience of programs using enum data types to bit-flips, according to at least some embodiments. FIG. 5 illustrates example types of program locations at which a compiler configured to enhance resilience against bit-flips may insert code during compilation, according to at least some embodiments. FIG. 6 illustrates example programmatic interactions associated with the use of compiler-based techniques for bit-flip resilience enhancement, according to at least some embodiments. FIG. 7 is a flow diagram illustrating aspects of operations that may be performed to implement compile-time techniques to enhance bit-flip resilience, according to at least some embodiments. FIG. 8 illustrates example memory management components and data structures of a computing device which supports virtual memory, according to at least some embodiments. FIG. 9 illustrates an example multi-level page table which may be traversed frequently by a hardware memory management unit (MMU) of a computer system to perform virtual-to-physical address translation, according to at least some embodiments. FIG. 10 illustrates example contents of a page table entry, according to at least some embodiments. FIG. 11 illustrates example operations which may be performed by an operating system's virtual memory management program to enhance resilience of a page table to bit-flips in collaboration with an MMU, according to at least some embodiments. FIG. 12 illustrates example operations which may be performed by an MMU, working in collaboration with an operating system's virtual memory management program, to enhance resilience of a page table to bit-flips, according to at least some embodiments. FIG. 13 illustrates example programmatic interactions associated with the use of collaborative hardware-software techniques for bit-flip resilience enhancement, according to at least some embodiments. FIG. 14 is a flow diagram illustrating aspects of operations that may be performed to implement collaborative hardware-software techniques to enhance bit-flip resilience, according to at least some embodiments. FIG. 15 illustrates an example provider network environment, according to at least some embodiments. FIG. 16 is a block diagram illustrating an example computing device that may be used in at least some embodiments. While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to. When used in the claims, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof. Unless otherwise explicitly stated, articles such as “a” or “an” should generally be interpreted to include one o