CN-122018869-A - Compiler-free embedded program construction system, method and device based on large language model

CN 122018869 A

Abstract

The invention discloses a compiler-free embedded program construction system, method and device based on a large language model. The method uses a large language model enhanced with instruction set architecture knowledge to receive a structured hardware requirement description and directly infer an assembly instruction sequence for the specific target hardware; the assembly instructions are mapped to binary machine code and executed on the target hardware or in a simulation environment, input/output logs and state data are captured in real time at runtime and semantically compared with preset acceptance criteria, and when the comparison fails a feedback prompt containing the difference features is constructed to drive the model to make targeted logic corrections until acceptance passes. The invention completely eliminates from the construction process the compilation step that converts a high-level language into intermediate code, and realizes feedback-driven code correction. The method can reduce code volume by more than 90% while preserving logical consistency, and is particularly suitable for developing resource-constrained embedded systems.
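As a loose illustration of the "structured hardware requirement description" named in the abstract (and detailed in claim 4 as JSON/XML containing clock frequency constraints, peripheral pin mappings and interrupt priority configuration), the sketch below builds such a payload in Python. All field names and values are hypothetical; only the three required content areas come from the patent text.

```python
import json

# Hypothetical structured hardware requirement description.
# Field names are illustrative, not specified by the patent.
requirement = {
    "target": "hypothetical Cortex-M0-class MCU",
    "clock": {"frequency_hz": 8_000_000},                 # clock frequency constraint
    "peripheral_pins": {"LED": "PA5", "UART_TX": "PA9"},  # peripheral pin mapping
    "interrupts": {"UART_RX": {"priority": 1}},           # interrupt priority configuration
    "function": "toggle LED every 500 ms; echo UART input",
}

# Serialize to JSON, the format claim 4 names for this description.
payload = json.dumps(requirement, indent=2)
```

The payload would then be handed to the requirement analysis module as the model's input context.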

Inventors

  • WANG YONG
  • SHI ZHIGUO
  • ZHOU CHENGWEI
  • YANG QIANQIAN
  • CHEN JIMING

Assignees

  • Zhejiang University (浙江大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (10)

  1. A large language model-based compiler-less embedded program construction system, the system comprising: a requirement analysis module for acquiring a structured hardware requirement description of the target hardware platform containing hardware specification parameters and functional logic requirements; a neural assembly engine for generating, by inference based on a pre-trained large language model and in combination with the instruction set architecture data and register mapping relations of the target hardware platform, a corresponding assembly instruction sequence from the structured hardware requirement description; and a transparent transcoder for receiving the assembly instruction sequence and generating, through mnemonic-to-machine-code mapping conversion, a binary format file executable by the target hardware platform.
  2. The large language model-based compiler-less embedded program construction system according to claim 1, wherein the system does not include, in the build process, a compilation processing unit that converts high-level programming language source files into an intermediate code representation (IR).
  3. The large language model-based compiler-less embedded program construction system according to claim 1, wherein the pre-trained large language model integrates a retrieval-augmented generation (RAG) mechanism, retrieves in real time the exact register base addresses and offsets and the instruction encoding table of the target hardware platform, and inputs the retrieval results into the large language model as context.
  4. The large language model-based compiler-less embedded program construction system according to claim 1, wherein the structured hardware requirement description is in JSON or XML format and at least comprises clock frequency constraints, peripheral pin mappings and interrupt priority configuration.
  5. The large language model-based compiler-less embedded program construction system according to claim 1, further comprising an assembly instruction correction module for acquiring the running state data of the binary format file in a hardware or simulation environment and feeding the running state data back to the large language model, the large language model correcting the assembly instruction sequence according to the feedback data.
  6. The large language model-based compiler-less embedded program construction system according to claim 1, wherein the assembly instruction correction module is specifically implemented as follows: (1) converting the assembly instruction sequence into machine code, loading it into the target hardware environment for execution, and capturing the runtime state log of the target hardware environment within a preset time window; (2) comparing the runtime state log with preset acceptance criteria and computing the semantic difference between the actual and expected results, the semantic difference including the occurrence frequency of specific string patterns in the character stream data, determined using regular expressions; (3) when the comparison fails, constructing feedback prompt information containing the error features according to the semantic difference, the feedback prompt information comprising the expected output pattern from the acceptance criteria and the actual output pattern from the runtime state log, combining the two into a natural-language prompt describing the expected-versus-actual divergence, inputting the feedback prompt information into the large language model to drive it to regenerate the assembly instruction sequence, and returning to step (1) until the comparison passes or a maximum number of iterations is reached.
  7. A large language model-based compiler-less embedded program construction method based on the system of any one of claims 1-6, the method comprising: (1) obtaining a structured hardware requirement description of a target hardware platform containing hardware specification parameters and functional logic requirements; (2) inputting the structured hardware requirement description into a pre-trained large language model and, in combination with the instruction set architecture data and register mapping relations of the target hardware platform, generating a corresponding assembly instruction sequence by inference; (3) converting the assembly instruction sequence according to the mnemonic-to-machine-code mapping relation to generate a binary format file executable by the target hardware platform.
  8. A large language model-based compiler-less embedded program construction apparatus comprising a memory and one or more processors, the memory storing executable code, wherein the processor, when executing the executable code, implements the large language model-based compiler-less embedded program construction method as recited in claim 5.
  9. A computer-readable storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the large language model-based compiler-less embedded program construction method according to claim 5.
  10. A computer program product comprising a computer program which, when executed by a processor, implements the large language model-based compiler-less embedded program construction method as claimed in claim 5.
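The run-compare-correct cycle recited in claim 6 can be sketched as the loop below. This is a hypothetical illustration, not the patent's implementation: `generate_assembly`, `assemble` and `run_on_target` are caller-supplied stubs standing in for the large language model, the transparent transcoder and the hardware/simulation environment, and the acceptance criteria are modelled as regex patterns with expected occurrence counts (claim 6, step 2).

```python
import re

# Illustrative sketch of the claim-6 correction loop; all function
# names and the acceptance-criteria shape are assumptions.
MAX_ITERATIONS = 5

def semantic_diff(log, acceptance):
    """Step 2: compare the runtime log with the acceptance criteria.

    `acceptance` maps a regex pattern to its expected occurrence
    count in the captured character stream; mismatches are returned
    as human-readable divergence descriptions.
    """
    failures = []
    for pattern, expected in acceptance.items():
        actual = len(re.findall(pattern, log))
        if actual != expected:
            failures.append(
                f"pattern {pattern!r}: expected {expected} "
                f"occurrence(s), observed {actual}"
            )
    return failures

def correction_loop(requirement, acceptance,
                    generate_assembly, assemble, run_on_target):
    """Drive the model until the run log passes acceptance or the
    iteration budget is exhausted."""
    prompt = requirement
    asm = generate_assembly(prompt)
    for _ in range(MAX_ITERATIONS):
        log = run_on_target(assemble(asm))         # step 1: load and run
        failures = semantic_diff(log, acceptance)  # step 2: semantic diff
        if not failures:
            return asm                             # acceptance passed
        # Step 3: feedback prompt describing expected-vs-actual divergence.
        prompt = (requirement
                  + "\nThe previous program failed acceptance:\n"
                  + "\n".join(failures))
        asm = generate_assembly(prompt)
    return asm  # best effort after the maximum number of iterations
```

A caller would plug in the real model, transcoder and target runner; with stubbed versions the loop terminates as soon as the log matches every expected pattern count.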

Description

Compiler-free embedded program construction system, method and device based on large language model

Technical Field

The invention relates to the technical field of computer software engineering and artificial intelligence, and in particular to a compiler-free embedded program construction system, method and device based on a large language model.

Background

The existing embedded development flow has long followed the lengthy chain "high-level language (C/C++) → compiler front-end → intermediate code (IR) → compiler back-end → assembly code → machine code". This mode suffers from the following core drawbacks: 1. Efficiency bottlenecks of compilers. General-purpose compilers (e.g., GCC, LLVM) tend to produce redundant instruction sequences for the sake of compatibility and are difficult to optimize aggressively for the non-standard hardware characteristics (e.g., specific bit-manipulation instructions, hardware accelerators) of a particular microcontroller (MCU). 2. Semantic gap. High-level intents of the developer (such as "delay exactly 10 us") are hard to guarantee deterministically at the bottom layer after multiple layers of compilation and conversion, requiring repeated disassembly and debugging. 3. Tool-chain dependence. Development environments are complex to set up and rely heavily on the IDEs and compiler licenses of specific vendors. Moreover, directly generating embedded low-level code (assembly/machine code) faces significant challenges: 1. "Hallucination" causes inoperability: AI readily generates code that is syntactically correct but logically wrong (e.g., an incorrect number of delay-loop iterations, wrong register addresses), and the model itself cannot perceive these errors. 2. The limitation of open-loop generation: existing AI programming tools are mostly "one-shot generation" and lack the ability to automatically close the loop and revise according to run results.
The developer must manually intervene to download, debug and feed back, which is inefficient. 3. The debugging threshold is high: errors in embedded systems often manifest as abnormal timing or unresponsive peripherals and are difficult to find through static code analysis.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a compiler-free embedded program construction system, method and device based on a large language model. A "direct neural synthesis" (Direct Neural Synthesis) paradigm is presented that completely eliminates traditional compiler components. Structured hardware requirements are mapped directly into assembly instructions or machine code using a large language model (LLM) enhanced with knowledge of a specific instruction set architecture (ISA), and "real feedback" from the hardware serves as the objective basis for correcting the AI's logic. The aim of the invention is achieved through the following technical scheme. The compiler-free embedded program construction system based on the large language model comprises: a requirement analysis module for acquiring a structured hardware requirement description of the target hardware platform containing hardware specification parameters and functional logic requirements; a neural assembly engine for generating, by inference based on a pre-trained large language model and in combination with the instruction set architecture data and register mapping relations of the target hardware platform, a corresponding assembly instruction sequence from the structured hardware requirement description; and a transparent transcoder for receiving the assembly instruction sequence and generating, through mnemonic-to-machine-code mapping conversion, a binary format file executable by the target hardware platform.
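The transparent transcoder described above performs a direct mnemonic-to-machine-code lookup with no intermediate representation. The sketch below shows that mechanism in miniature; the opcode table and two-byte instruction format are entirely hypothetical and do not correspond to any real ISA or to encodings given in the patent.

```python
import struct

# Hypothetical opcode table for a toy two-byte instruction format:
# one opcode byte followed by one operand byte. Not a real ISA.
OPCODES = {"NOP": 0x00, "LDI": 0x01, "OUT": 0x02, "JMP": 0x03}

def transcode(lines):
    """Map each 'MNEMONIC [operand]' line directly to a machine word,
    mirroring the transparent transcoder's mnemonic-to-machine-code
    mapping conversion (no IR, no optimization passes)."""
    binary = bytearray()
    for line in lines:
        parts = line.split()
        opcode = OPCODES[parts[0]]
        # int(x, 0) accepts decimal or 0x-prefixed hex operands.
        operand = int(parts[1], 0) if len(parts) > 1 else 0
        binary += struct.pack(">BB", opcode, operand & 0xFF)
    return bytes(binary)

# Assemble a three-instruction toy program into a 6-byte image.
program = ["LDI 0x2A", "OUT 0x01", "JMP 0x00"]
image = transcode(program)
```

Because the mapping is a pure table lookup, every byte of the output is directly attributable to one source mnemonic, which is what makes the conversion "transparent".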
Further, the system does not include a compilation processing unit that converts high-level programming language source files into an intermediate code representation (IR) during construction. Further, the pre-trained large language model integrates a retrieval-augmented generation (RAG) mechanism, retrieves in real time the exact register base addresses and offsets and the instruction encoding table of the target hardware platform, and inputs the retrieval results into the large language model as context. Further, the structured hardware requirement description is in JSON or XML format and at least comprises clock frequency constraints, peripheral pin mappings and interrupt priority configuration. The system further comprises an assembly instruction correction module for acquiring the running state data of the binary format file in a hardware or simulation environment and feeding the running state data back to the large language model, the large language model correcting the assembly instruction sequence according to the feedback data. Further, the assemb