
CN-122018908-A - Binary-rewriting-based cross-ISA (instruction set architecture) extension program translation method


Abstract

The invention discloses a binary-rewriting-based method for translating programs across ISA (instruction set architecture) extensions. The method comprises the following steps: 1) the ISA of the program to be translated is called the original ISA, and the ISA of the target hardware on which the original-ISA-extension program is to run is called the target ISA; 2) the binary file compiled for the original ISA extension is compared with the binary file compiled for the target ISA extension to identify the instructions of the original ISA extension that must be translated for the target ISA extension, and these instructions are divided into a number of program basic blocks to be translated; 3) each program basic block is allocated a jump instruction and an address space in which its translated code is stored; 4) the jump instructions redirect the program basic blocks into their allocated address spaces, static binary translation is performed, and a static file is generated; and 5) a loader assigns load addresses to the static file according to the address allocation information it contains and loads the file at the corresponding positions.
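The abstract does not say how the comparison in step 2 or the allocation in step 3 is carried out. The following is a minimal sketch under simplifying assumptions, not the patented procedure: the comparison is reduced to a set of mnemonics that the target ISA extension executes natively, consecutive instructions outside that set are grouped into one block to translate (a stand-in for the patent's program basic blocks; a real tool would also split at control-flow boundaries), and each block then receives a disjoint slot in a translation region. The listing format, `find_blocks`, `assign_slots`, the `expansion` growth factor, and the mnemonics `vfoo`/`vbar` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int      # address of the first instruction that needs translation
    end: int        # address just past the last such instruction
    slot: int = 0   # base address of the block's slot in the translation region


def find_blocks(listing, target_supported):
    """Group consecutive instructions the target extension cannot run into blocks.

    `listing` is a hypothetical disassembly of the original-ISA-extension binary,
    given as (address, size_in_bytes, mnemonic) tuples in address order.
    """
    blocks = []
    current = None
    for addr, size, mnem in listing:
        if mnem in target_supported:
            current = None              # a natively supported instruction ends the run
        elif current is not None and current.end == addr:
            current.end = addr + size   # contiguous: extend the current block
        else:
            current = Block(addr, addr + size)
            blocks.append(current)
    return blocks


def assign_slots(blocks, region_base, expansion=4):
    """Give each block a disjoint, 4-byte-aligned slot for its translated code.

    `expansion` is a hypothetical worst-case growth factor for translated code.
    Returns the first free address after the last slot.
    """
    cursor = region_base
    for block in blocks:
        block.slot = cursor
        size = (block.end - block.start) * expansion
        cursor += (size + 3) & ~3
    return cursor


if __name__ == "__main__":
    listing = [(0x1000, 4, "add"), (0x1004, 4, "vfoo"), (0x1008, 4, "vbar"),
               (0x100C, 4, "add"), (0x1010, 4, "vfoo")]
    blocks = find_blocks(listing, {"add"})
    assign_slots(blocks, 0x8000)
    for b in blocks:
        print(hex(b.start), hex(b.end), hex(b.slot))
```

With `add` supported natively, the listing yields two blocks, 0x1004 to 0x100c and 0x1010 to 0x1014, placed at 0x8000 and 0x8020. Recording those slot addresses is what lets the loader in step 5 place the translated code without overlapping other modules.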

Inventors

  • HE JIATAI
  • QI JI
  • YU JIAGENG
  • WU YANJUN

Assignees

  • Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)

Dates

Publication Date
2026-05-12
Application Date
2025-07-31

Claims (5)

  1. A cross-ISA extension program translation method based on binary rewriting, comprising the following steps: 1) referring to the ISA of the program to be translated as the original ISA and to its extension as the original ISA extension, and referring to the ISA of the target hardware on which the original-ISA-extension program is to run as the target ISA and to its extension as the target ISA extension; 2) comparing the binary file compiled for the original ISA extension with the binary file compiled for the target ISA extension to obtain the instructions of the original ISA extension that need to be translated relative to the target ISA extension, and dividing these instructions into a plurality of program basic blocks to be translated; 3) allocating, for each program basic block, a jump instruction and an address space in which its translated code is stored; 4) redirecting the program basic blocks into the corresponding address spaces through the jump instructions, performing static binary translation, and generating a static file; and 5) using a loader to assign load addresses to the static file according to the address allocation information in the static file and to load it at the corresponding positions, so that the virtual addresses of the static file do not collide with other runtime modules on the target hardware.
  2. The method of claim 1, wherein the jump instruction and address space are allocated for each program basic block by determining the block's jump instruction according to the characteristics of the target ISA: if the jump for the program basic block is a short jump, a single jump instruction of the ISA directly replaces the instruction being translated; if it is a long jump, a compressed instruction set is used to compress the long jump instruction.
  3. The method of claim 2, wherein the compressed instruction set includes, but is not limited to, the RISC-V C extension and Arm Thumb.
  4. The method of claim 1, wherein the error handling module in the kernel of the target hardware is modified as follows: for instructions in untranslated code that were not identified in advance, the kernel catches the resulting fault, translates the instruction at runtime, and then resumes program execution; for unidentified erroneous jumps, the kernel maintains a recovery table covering the location of each candidate erroneous jump when it loads the binary file and, when a candidate erroneous jump occurs, restores the execution flow to the location recorded for that jump.
  5. The method of claim 1, wherein the error handling module in the kernel of the target hardware is modified to implement a runtime translation mechanism and a runtime error-code relocation mechanism for handling edge cases caused by unrecognized instructions and jump instructions.
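Claims 2 and 3 leave the concrete jump sequences open. Assuming RISC-V is the target ISA, the following sketch shows how a rewriter might classify the displacement from a patch site to its translated block: C.J (2 bytes) reaches roughly ±2 KiB, JAL (4 bytes) reaches roughly ±1 MiB, and anything farther needs an AUIPC+JALR pair (8 bytes) plus a scratch register. The function names `fits_signed`, `split_hi_lo`, `classify_jump`, and `encode_jal` are hypothetical; only the instruction ranges and the J-type bit layout come from the RISC-V unprivileged specification.

```python
def fits_signed(value, bits):
    """True if `value` fits in a `bits`-wide two's-complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def split_hi_lo(offset):
    """Split `offset` into AUIPC hi20 and JALR lo12 so that (hi << 12) + lo == offset."""
    hi = (offset + 0x800) >> 12   # round up so lo lands in [-2048, 2047]
    lo = offset - (hi << 12)
    return hi, lo


def classify_jump(offset):
    """Pick the smallest RISC-V jump that reaches `offset` bytes from the patch site."""
    if offset % 2:
        raise ValueError("RISC-V jump targets must be 2-byte aligned")
    if fits_signed(offset, 12):
        return "c.j"          # 2-byte compressed jump (C extension)
    if fits_signed(offset, 21):
        return "jal"          # 4-byte JAL x0, offset
    hi, _ = split_hi_lo(offset)
    if fits_signed(hi, 20):
        return "auipc+jalr"   # 8-byte pair; clobbers a scratch register
    raise ValueError("offset out of reach of a PC-relative jump")


def encode_jal(offset, rd=0):
    """Encode JAL rd, offset in J-type format; rd=0 is the plain jump `j offset`."""
    if offset % 2 or not fits_signed(offset, 21):
        raise ValueError("offset not encodable in JAL")
    imm = offset & 0x1FFFFF                 # 21-bit two's-complement image
    word = ((imm >> 20) & 0x1) << 31        # imm[20]
    word |= ((imm >> 1) & 0x3FF) << 21      # imm[10:1]
    word |= ((imm >> 11) & 0x1) << 20       # imm[11]
    word |= ((imm >> 12) & 0xFF) << 12      # imm[19:12]
    word |= (rd & 0x1F) << 7                # destination register
    word |= 0x6F                            # JAL opcode
    return word


if __name__ == "__main__":
    for off in (100, 4096, 0x12345678):
        print(off, classify_jump(off))
    print(hex(encode_jal(8)))   # 0x80006f, i.e. `j .+8`
```

A plausible reading of claim 2, offered as an assumption rather than something the claims state, is that when a single 4-byte instruction must be replaced by an 8-byte long jump, the extra bytes are freed by re-encoding neighboring instructions in their 2-byte compressed forms.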

Description

Binary-rewriting-based cross-ISA (instruction set architecture) extension program translation method

Technical Field

The invention belongs to the technical field of computer software and relates to a binary-rewriting-based method for translating programs across ISA extensions.

Background

Binary rewriting generates a new binary file from an original binary program, either by patching it or by regenerating it. Cross-ISA-extension translation uses binary rewriting to produce a binary program that can run on hardware whose ISA extensions differ from those of the original ISA. An ISA extension adds instructions, and the corresponding hardware design, to a base ISA for purposes such as reducing energy consumption or accelerating computation. CPUs with ISA extensions have demonstrated significant advantages in performance, power consumption, and parallelism, but programs built for one set of extensions often cannot run on hardware with another. The extensions of commercial instruction sets (e.g., x86 and Arm) are fixed by their vendors, so existing ISAX heterogeneous computing systems address this problem with solutions custom-adapted to each extension. That approach no longer works for RISC-V: its open-source nature has produced a large number of custom extensions, and infrastructure dedicated to designing RISC-V hardware extensions has even been developed, further accelerating their development. The number of extensions is therefore growing explosively, and the earlier approach to custom ISA extension systems, which requires developing a dedicated system and compilation toolchain for each extension, can no longer keep up.
Existing translation work is not designed for ISA extensions; it mainly targets cross-ISA translation, such as translation between Arm and x86, and is usually classified as static or dynamic translation. Static translation separates instruction translation from code execution: it takes the whole input program as the unit of translation and completes it offline, so the generated code can be fully optimized and runs efficiently. However, static translation cannot obtain the complete control-flow information of a program in advance, so it handles problems such as code discovery and code relocation poorly; this makes translation uncertain and costly and severely limits the scenarios in which static translation can be used. Dynamic translation adopts a just-in-time strategy that interleaves translation and execution, working in units of basic blocks or function bodies; whenever untranslated code is encountered, control is transferred to the translator. Dynamic translation overcomes static translation's inability to obtain complete control-flow information, but translating at runtime incurs a large overhead. Binary rewriting, by contrast, re-optimizes a binary file for the same ISA to improve its performance or security, or to aid debugging and performance analysis. Current binary rewriting work falls into two main categories: binary lifting and binary patching. Binary lifting translates the assembly code recovered by binary analysis into an intermediate representation (IR), then recompiles the IR into a binary file according to the rewriting goal (security, performance, debugging, etc.), thereby completing the rewrite.
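The dynamic-translation scheme described above, translating a basic block the first time control reaches it and reusing the translation afterwards, can be illustrated with a toy dispatcher. This is a minimal sketch of the general translation-cache idea, not the method claimed in this patent; the guest block format (a list of `("add", n)` operations plus a branch), `translate`, and `run` are all hypothetical.

```python
def translate(block):
    """Translate one guest basic block into a host callable (a Python closure)."""
    ops, branch = block
    for op, _ in ops:
        if op != "add":
            raise ValueError(f"unsupported guest op {op!r}")
    delta = sum(n for _, n in ops)  # fold the block's adds once, at translation time

    def host_code(acc):
        acc += delta
        if isinstance(branch, tuple):  # conditional: (limit, taken, fallthrough)
            limit, taken, fallthrough = branch
            return acc, (taken if acc < limit else fallthrough)
        return acc, branch  # unconditional target address, or None to exit

    return host_code


def run(program, entry, acc=0):
    """Dispatch loop: translate each block on first use, then reuse the cached code.

    Returns the final accumulator and the number of blocks translated.
    """
    cache = {}
    pc = entry
    while pc is not None:
        code = cache.get(pc)
        if code is None:  # untranslated code: hand control to the translator
            code = translate(program[pc])
            cache[pc] = code
        acc, pc = code(acc)  # execute the cached translation
    return acc, len(cache)


if __name__ == "__main__":
    # Block 4 loops on itself while acc < 10, then falls through to block 8.
    PROGRAM = {
        0: ([("add", 1)], 4),
        4: ([("add", 2)], (10, 4, 8)),
        8: ([("add", 100)], None),
    }
    print(run(PROGRAM, 0))  # (111, 3)
```

Block 4 executes five times but is translated only once, which is how dynamic translators amortize part of their runtime cost; the cost that remains is the translation work done on the critical path the first time each block is reached.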
Binary patching replaces selected instructions with jump instructions that redirect execution to rewritten binary code, thereby modifying the program's execution logic to achieve the rewriting goal. However, none of this work meets the requirements of cross-ISA-extension translation. First, binary translation and binary lifting both require control-flow analysis, but existing control-flow analysis is incomplete and can change the semantics of the translated program, which is unacceptable for ISA extension translation. Second, existing binary patching work is strongly architecture-dependent (it targets x86) and restricted in its usage scenarios (by the size of the binary code segment), so it is not suited to ISA extension translation. A binary-rewriting-based cross-ISA-extension program translation technique therefore needs to be designed.

Disclosure of Invention

In view of the problems in the prior art, the invention aims to provide a binary-rewriting-based cross-ISA extension program translation method for converting a program into a version that can run on hardware whose ISA extension differs from the original ISA extensio