CN-121997385-A - Anti-fault injection AI chip model encryption loading and executing method
Abstract
The invention relates to the field of electric digital data processing and discloses a fault-injection-resistant method for encrypted loading and execution of AI chip models. The method latches a micro-architecture state vector in real time during the execute stage of the computing unit's pipeline and initializes a hardware rolling hash accumulator. The expected value of the state vector for the previous data processing stage is forced to participate, together with the accumulator, in a one-way hash operation that generates a dynamic mask, and the mask is used to encrypt the transmitted data. The computing unit can derive the matching decryption mask and restore the data only when the actual value of the state vector matches the expected value.
Inventors
- MA XIAONAN
- Sun Tengye
Assignees
- 上海伊世智能科技有限公司 (Shanghai Yishi Intelligent Technology Co., Ltd.)
Dates
- Publication Date
- 20260508
- Application Date
- 20260129
Claims (10)
- 1. A fault-injection-resistant method for encrypted loading and execution of an AI chip model, applied to a data processing system in which a security module and a computing unit are connected through a point-to-point data path that shields external interrupts, characterized by comprising the following steps: Step 101: at each active clock edge of the instruction pipeline executed by the computing unit, hardware state sampling logic latches in real time a signal set containing the physical address in the program counter, the status flags of the arithmetic logic unit and the exclusive-OR value of the control signals, and concatenates the signal set into a micro-architecture state vector that objectively records the current micro-execution history of the computing unit; Step 102: a hardware rolling hash accumulator is initialized in the security module and made to step synchronously with the instruction execution cycles of the computing unit; the security module enforces an encryption constraint based on historical execution state, feeding the current value of the hardware rolling hash accumulator together with the expected value of the micro-architecture state vector for the (N-1)-th data processing stage into a one-way hash operation to generate a dynamic mask, and encrypts the N-th transmission data by exclusive-OR with the dynamic mask; and Step 103: the computing unit performs a closed-loop decryption operation based on its real-time micro-architecture state, directly reading the actual value of the micro-architecture state vector held in its internal physical registers when it finishes the (N-1)-th data processing instruction, and combining it with the locally synchronized accumulator value to derive a decryption mask that restores the N-th transmission data; when instruction timing interference causes a bit-level difference between the actual and expected values of the micro-architecture state vector, the closed-loop decryption operation, by the rules of the hash operation, produces a decryption mask that does not match the dynamic mask, and the N-th transmission data is restored as high-entropy noise data that destroys the mathematical convergence of the neural network.
- 2. The method of claim 1, wherein generating the micro-architecture state vector in step 101 comprises the sub-steps of: extracting the zero flag, carry flag and overflow flag of the arithmetic logic unit from the program status word register of the computing unit and bit-concatenating them with the low-order bytes of the program counter; collecting the instruction fetch, decode and execute signals output by the instruction decoder and compressing them by exclusive-OR into a control flow fingerprint; and merging the concatenation result with the control flow fingerprint and writing the merged result into a shadow register on the rising clock edge of each machine cycle to form the micro-architecture state vector, so that changes in the value of the micro-architecture state vector correspond exactly to the timing progression of the instruction pipeline.
- 3. The method of claim 1, wherein the specific operation logic for generating the dynamic mask in step 102 comprises: step 301, setting an initial state value of the hardware rolling hash accumulator and, within the atomic operation period for processing each data packet, iteratively updating it from the accumulated state of the previous moment; step 302, for the N-th data packet, calculating the dynamic mask $M_N$ according to the relation $M_N = H(A_N \oplus S_{N-1}^{\mathrm{exp}})$, where $H(\cdot)$ denotes a lightweight one-way hash function implemented in hardware, $A_N$ denotes the register value of the hardware rolling hash accumulator when the N-th data packet is processed, $S_{N-1}^{\mathrm{exp}}$ denotes the pre-stored expected value of the micro-architecture state vector that the computing unit should present after correctly processing the (N-1)-th data packet, and $\oplus$ denotes bitwise exclusive-OR; and step 303, using the generated dynamic mask $M_N$ to apply a stream exclusive-OR operation to the original payload of the N-th transmission data packet, producing a ciphertext data stream to be transmitted over the data path.
- 4. The method of claim 1, wherein an initialization stage before step 102 comprises: generating an initial seed with a true random number generator inside the security module; loading the seed synchronously, through hard-wired logic, into the hardware rolling hash accumulator of the security module and the decryption state register of the computing unit; keeping the handshake signals between the security module and the computing unit silent during data loading; and maintaining step synchronization of the two accumulators solely through counting logic driven by the bus clock, so that no information about the current encryption state is revealed through handshake interaction.
- 5. The method of claim 3, wherein the iterative update mechanism of the hardware rolling hash accumulator comprises: step 501, immediately after the encryption of the N-th transmission data packet is completed, updating the state of the hardware rolling hash accumulator with the current hash operation result; step 502, temporarily storing the updated accumulator state in a volatile storage unit that is erased on power-down, from which the state enters the processing of the (N+1)-th data packet; and step 503, upon detecting a system reset or an unexpected power-down event, immediately performing a physical clear operation on the volatile storage unit.
- 6. The method of claim 1, wherein deriving the decryption mask in the closed-loop decryption operation follows an intrinsic verification logic: the decryption circuit of the computing unit reads the actual value of the micro-architecture state vector directly from the local pipeline state register and shields against any injection of state parameters from the software layer. Using a hash algorithm identical to that of the security module and the locally synchronized accumulator value, it calculates a local decryption mask from the actual value read and applies this mask directly to the received ciphertext data. In step 603, if the actual value of the micro-architecture state vector has undergone a single-bit flip due to a fault injection attack, the decrypted data are restored as high-entropy noise data.
- 7. The method of claim 1, further comprising a chained trust-anchoring step for block-wise loading of the model parameters, comprising: step 701, dividing the AI model parameters into a plurality of consecutive data blocks and making the decryption key of each data block depend on the accumulated pipeline state after all preceding data blocks have been processed; step 702, when the N-th data block is loaded, checking the processing completion flag of the preceding (N-1)-th data block and, if the flag is not set or is set ahead of the expected timing, determining that the current micro-architecture state vector of the computing unit differs from the expected value; and step 703, using this state-evolution-based chain dependency to prevent tampering with the model structure by replaying historical data blocks or skipping the loading of specific network layer parameters.
- 8. The method of claim 1, wherein constructing the data path comprises a physical security hardening operation: configuring the on-chip bus interface connecting the security module and the computing unit so that all external interrupt request signals and debug port access rights on that bus segment are shielded, except for the reset signal; establishing a point-to-point direct memory access channel between the security module and the computing unit, locking the read-write address space mapping of the channel, and prohibiting any third-party bus master from addressing that address space; and, in step 803, locking the transmission clock frequency of the data path to the master frequency of the computing unit so that the data transmission clock remains strictly phase-synchronized with the execution clock of the instruction pipeline.
- 9. The method of claim 1, wherein the method uses a parallel hardware architecture to guarantee real-time data loading, comprising: allocating independent pipeline stages in the security module for parallel execution of the lookup of the expected micro-architecture state vector value, the hash iteration and the exclusive-OR encryption; in step 902, decomposing the data encryption flow into multi-stage pipeline beats so that the logic delay of each beat is less than the critical-path delay allowed by the system clock period; and, in step 903, moving the encrypted data in the background with a dedicated direct memory access controller so that the encryption process is hidden within the transmission latency of the data bus.
- 10. The method of claim 1, further comprising a passive blocking response step for an abnormal state, wherein: in step 1001, when performing the closed-loop decryption operation, the computing unit does not actively compare a checksum of the decrypted result but writes the decrypted data directly into the operation registers or cache; in step 1002, the high-entropy noise data produced by erroneous decryption causes numerical overflow in, or a sharp drop in the classification confidence of, the results of subsequent convolution or matrix multiplication operations; and in step 1003, the numerical anomaly at the operation layer serves as the final indication that the attack has been blocked, and valid inference terminates without triggering any additional interrupt signal.
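The closed loop of claim 1 can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- SHAKE-128 stands in for the unspecified lightweight hardware hash.
- The accumulator and the state vector are combined by XOR before hashing.
- All values are 16 bytes wide.
- The function names are hypothetical.

Both sides run the same derivation. The security module's mask is built from the expected state, and the computing unit's mask is built from the latched actual state. The two masks therefore coincide only when the two states are bit-identical.

```python
import hashlib

WIDTH = 16  # hypothetical width, in bytes, of the accumulator and state vector


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two byte strings (truncated to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))


def derive_mask(accumulator: bytes, state_vector: bytes, length: int) -> bytes:
    """One-way hash of (accumulator XOR state vector), stretched to `length` bytes.

    SHAKE-128 is an assumed stand-in for the unspecified hardware hash.
    """
    return hashlib.shake_128(xor_bytes(accumulator, state_vector)).digest(length)


def security_module_encrypt(acc: bytes, expected_state: bytes, payload: bytes) -> bytes:
    # Step 102: the mask is bound to the EXPECTED state of stage N-1.
    return xor_bytes(payload, derive_mask(acc, expected_state, len(payload)))


def compute_unit_decrypt(acc: bytes, actual_state: bytes, ciphertext: bytes) -> bytes:
    # Step 103: the mask is rebuilt from the ACTUAL latched state of stage N-1.
    return xor_bytes(ciphertext, derive_mask(acc, actual_state, len(ciphertext)))


acc = bytes(range(WIDTH))              # accumulator value shared by both sides
expected = bytes([0xA5]) * WIDTH       # expected state vector after stage N-1
weights = b"layer-3 weights: 0.12 -0.98 0.33"

ct = security_module_encrypt(acc, expected, weights)
print(compute_unit_decrypt(acc, expected, ct) == weights)   # True

glitched = bytes([expected[0] ^ 0x01]) + expected[1:]      # one bit flipped by a fault
print(compute_unit_decrypt(acc, glitched, ct) == weights)   # False
```

No comparator appears anywhere in the decrypt path. The glitched state simply yields a different mask, and with it unusable data.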
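The state-vector assembly of claim 2 can be modelled in software as below. Two choices are assumptions, since the claim fixes neither:

- the signal widths: one PC byte and 8-bit fetch, decode and execute groups;
- the choice to append the fingerprint below the concatenated flags and PC byte.

`PipelineSample` and `state_vector` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PipelineSample:
    """Hypothetical per-cycle signals latched by the state-sampling logic."""
    pc: int          # program counter (physical address)
    zero: bool       # ALU zero flag from the program status word
    carry: bool      # ALU carry flag
    overflow: bool   # ALU overflow flag
    fetch: int       # instruction-fetch control signals (8-bit width assumed)
    decode: int      # decode control signals (8-bit width assumed)
    execute: int     # execute control signals (8-bit width assumed)


def state_vector(s: PipelineSample) -> int:
    # Concatenate the Z, C, V flags with the low byte of the PC (11 bits).
    flags = (int(s.zero) << 2) | (int(s.carry) << 1) | int(s.overflow)
    concat = (flags << 8) | (s.pc & 0xFF)
    # XOR-compress the three control-signal groups into an 8-bit fingerprint.
    fingerprint = (s.fetch ^ s.decode ^ s.execute) & 0xFF
    # Merge by appending the fingerprint below the concatenation (19 bits total).
    return (concat << 8) | fingerprint


sample = PipelineSample(pc=0x4012, zero=True, carry=False, overflow=False,
                        fetch=0x81, decode=0x42, execute=0x24)
print(hex(state_vector(sample)))    # 0x412e7

# A glitch that skips one 4-byte instruction changes the PC byte, and thus the vector.
skipped = PipelineSample(pc=0x4016, zero=True, carry=False, overflow=False,
                         fetch=0x81, decode=0x42, execute=0x24)
print(hex(state_vector(skipped)))   # 0x416e7
```

In hardware, this value would be written into the shadow register on each rising clock edge. The software model only shows how a skipped instruction becomes a different vector.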
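Steps 301 to 303 of claim 3 are sketched below for a sequence of packets. The mask for packet N is H(A_N XOR S_{N-1}), where A_N is the accumulator value while packet N is processed and S_{N-1} is the expected state after packet N-1. The sketch makes these assumptions:

- SHAKE-128 stands in for the hardware hash.
- The accumulator update A_{N+1} = H(0x02 || A_N XOR S_{N-1}) is assumed; claim 5 only says the accumulator is updated with the current hash result.
- The tag bytes keep the mask and the next accumulator value from being identical.
- All names are hypothetical.

```python
import hashlib

WIDTH = 16  # hypothetical accumulator / state-vector width in bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def H(tag: int, data: bytes, length: int) -> bytes:
    """Assumed one-way hash (SHAKE-128); `tag` separates mask use from accumulator use."""
    return hashlib.shake_128(bytes([tag]) + data).digest(length)


def mask_stream(seed: bytes, prev_states: list[bytes], packets: list[bytes]) -> list[bytes]:
    """XOR each packet N with M_N = H(A_N XOR S_{N-1}).

    prev_states[k] is the state vector expected after the packet preceding packets[k]
    (for the first packet, an assumed post-initialization state).
    XOR is its own inverse, so the same routine decrypts when given the same states.
    """
    acc = seed                                   # step 301: initial accumulator value
    out = []
    for s_prev, payload in zip(prev_states, packets):
        x = xor_bytes(acc, s_prev)               # A_N XOR S_{N-1}
        mask = H(0x01, x, len(payload))          # step 302: dynamic mask M_N
        out.append(xor_bytes(payload, mask))     # step 303: stream XOR of the payload
        acc = H(0x02, x, WIDTH)                  # iterative update (assumed form)
    return out


seed = bytes(WIDTH)
states = [bytes([k]) * WIDTH for k in range(3)]
packets = [b"conv1.weight...", b"conv1.bias", b"fc.weight" * 4]

cipher = mask_stream(seed, states, packets)
print(mask_stream(seed, states, cipher) == packets)   # True
```

The accumulator update here does not depend on the payload, so the receiver can stay in lockstep without seeing any plaintext.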
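The initialization in claim 4 loads one seed into two registers and then advances both purely by counting bus clocks. In the sketch:

- `secrets.token_bytes` stands in for the on-chip TRNG.
- Passing the same seed to both objects stands in for the hard-wired load.
- The per-step update (SHA-256 truncated to 16 bytes every `cycles_per_step` clocks) is a placeholder.
- The class name and its parameters are hypothetical.

```python
import hashlib
import secrets


class LockstepAccumulator:
    """Hypothetical model of one side's accumulator, advanced only by bus-clock counting."""

    def __init__(self, seed: bytes, cycles_per_step: int):
        self.value = seed
        self.cycles_per_step = cycles_per_step
        self.cycle_count = 0

    def on_bus_clock(self) -> None:
        # No handshake: each side counts bus clocks and steps at the same boundaries.
        self.cycle_count += 1
        if self.cycle_count % self.cycles_per_step == 0:
            # Placeholder stepping function; the real update rule is not specified.
            self.value = hashlib.sha256(self.value).digest()[:16]


seed = secrets.token_bytes(16)                                # stand-in for the on-chip TRNG
security_side = LockstepAccumulator(seed, cycles_per_step=4)
compute_side = LockstepAccumulator(seed, cycles_per_step=4)   # same seed, "hard-wired" load

for _ in range(40):               # both sides are driven by the same bus clock
    security_side.on_bus_clock()
    compute_side.on_bus_clock()

print(security_side.value == compute_side.value)   # True
```

After initialization, no message passes between the two objects. This mirrors the silent-handshake requirement: synchronization comes only from the shared clock.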
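The accumulator lifecycle in claim 5 can be modelled as a small store with two rules: overwrite on commit, and zeroize on reset. The sketch assumes a 16-byte cell and SHA-256 as the hash, and models the physical clear as an overwrite with zeros. `VolatileAccumulatorStore` and its methods are hypothetical.

```python
import hashlib


class VolatileAccumulatorStore:
    """Hypothetical model of the power-down-erased cell holding the accumulator state."""

    def __init__(self, seed: bytes):
        self._cell = bytearray(seed)

    def read(self) -> bytes:
        return bytes(self._cell)

    def commit_after_encrypt(self, hash_result: bytes) -> None:
        # Steps 501 and 502: right after packet N is encrypted, overwrite the cell with
        # the current hash result; this is the value packet N+1 will use.
        if len(hash_result) < len(self._cell):
            raise ValueError("hash result shorter than the accumulator")
        self._cell[:] = hash_result[:len(self._cell)]

    def on_reset_or_power_loss(self) -> None:
        # Step 503: physical clear, modelled here as overwriting every byte with zero.
        self._cell[:] = bytes(len(self._cell))


seed = bytes.fromhex("00112233445566778899aabbccddeeff")
store = VolatileAccumulatorStore(seed)

h = hashlib.sha256(store.read() + b"state after packet 1").digest()
store.commit_after_encrypt(h)
after_commit = store.read()           # first 16 bytes of h

store.on_reset_or_power_loss()
after_reset = store.read()            # 16 zero bytes: the mask sequence cannot resume
print(after_commit == h[:16], after_reset == bytes(16))   # True True
```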
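Claim 6 relies on the avalanche behaviour of the shared hash: a single flipped bit in the latched state vector should change roughly half of the mask bits. The experiment below checks this for every single-bit fault in a 128-bit state. SHAKE-128 is an assumed stand-in hash, and the accumulator and state values are arbitrary examples.

```python
import hashlib


def derive_mask(acc: bytes, state: bytes, length: int) -> bytes:
    """Assumed shared hash: SHAKE-128 over (accumulator XOR state vector)."""
    return hashlib.shake_128(bytes(a ^ s for a, s in zip(acc, state))).digest(length)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


acc = bytes(16)
expected = bytes.fromhex("0f1e2d3c4b5a69788796a5b4c3d2e1f0")
good_mask = derive_mask(acc, expected, 64)       # 512-bit mask

distances = []
for bit in range(128):                           # every possible single-bit fault
    faulty = bytearray(expected)
    faulty[bit // 8] ^= 1 << (bit % 8)
    distances.append(hamming(good_mask, derive_mask(acc, bytes(faulty), 64)))

print(min(distances), max(distances), sum(distances) / len(distances))
```

With a well-mixed hash, the distances should cluster around 256 of the 512 mask bits. The decrypted bytes are therefore effectively random rather than slightly perturbed.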
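The chained dependency of claim 7 can be illustrated by deriving each block's key from a running state that absorbs every block already processed. The sketch makes these assumptions:

- `next_state` is a hypothetical model of the pipeline state evolution: a SHA-256 chain over the processed plaintext.
- The processing-completion flag of step 702 is folded into that state rather than modelled separately.
- The remaining names are also hypothetical.

```python
import hashlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def next_state(state: bytes, block: bytes) -> bytes:
    """Hypothetical model of the pipeline state after the computing unit processes a block."""
    return hashlib.sha256(state + block).digest()


def block_key(state: bytes, length: int) -> bytes:
    """Step 701: block N's key depends only on the state accumulated over blocks 1..N-1."""
    return hashlib.shake_128(b"block-key" + state).digest(length)


def encrypt_model(init: bytes, blocks: list[bytes]) -> list[bytes]:
    state, out = init, []
    for blk in blocks:
        out.append(xor_bytes(blk, block_key(state, len(blk))))
        state = next_state(state, blk)    # expected evolution (security-module side)
    return out


def load_model(init: bytes, cipher_blocks: list[bytes]) -> list[bytes]:
    state, out = init, []
    for ct in cipher_blocks:
        pt = xor_bytes(ct, block_key(state, len(ct)))
        out.append(pt)
        state = next_state(state, pt)     # actual evolution (computing-unit side)
    return out


init = b"\x00" * 32
layers = [b"embedding-params", b"attention-params", b"mlp-params-block", b"head-params"]
cts = encrypt_model(init, layers)

print(load_model(init, cts) == layers)                             # True: in-order load
print(load_model(init, [cts[0], cts[2]])[1] == layers[2])          # False: block 2 skipped
print(load_model(init, [cts[0], cts[1], cts[1]])[2] == layers[1])  # False: block 2 replayed
```

Skipping a block leaves the running state one step behind, and replaying a block puts it one step ahead. In both cases the key for the affected block is wrong.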
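The throughput argument of claim 9 can be checked with a beat-level model of a three-stage pipeline: expected-state lookup, hash iteration and XOR encryption. The stage names and the one-beat-per-stage timing are assumptions. The model shows that, after a fill latency of two beats, the pipeline completes one packet per beat, so n packets finish at beat n + 2.

```python
from collections import deque
from typing import Optional

# One independent stage per operation (names hypothetical).
STAGES = ("lookup_expected_state", "hash_iteration", "xor_encrypt")


def simulate(num_packets: int) -> list[tuple[int, int]]:
    """Return (beat, packet index) for each packet as it completes the last stage."""
    pipeline: list[Optional[int]] = [None] * len(STAGES)   # pipeline[i]: packet in stage i
    pending = deque(range(num_packets))
    finished = []
    beat = 0
    while len(finished) < num_packets:
        beat += 1
        # Step 902: every packet advances one stage per beat; a waiting packet enters.
        pipeline = [pending.popleft() if pending else None] + pipeline[:-1]
        if pipeline[-1] is not None:
            finished.append((beat, pipeline[-1]))
    return finished


print(simulate(5))   # [(3, 0), (4, 1), (5, 2), (6, 3), (7, 4)]
```

If each beat fits within one system clock period, the steady-state rate equals the bus rate. This is what allows step 903 to hide encryption behind the DMA transfer.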
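The passive blocking in claim 10 can be illustrated by reinterpreting mis-decrypted bytes as float32 weights. The weights, the all-ones input vector and the pseudo-random residue are illustrative assumptions. The residue is generated with SHAKE-128 to stand for the XOR of the correct mask and the mismatched mask. No checksum is evaluated; the anomaly appears only in the arithmetic.

```python
import hashlib
import math
import struct

N = 256
true_weights = [0.01 * ((i % 21) - 10) for i in range(N)]   # small, well-behaved weights
plain = struct.pack(f"<{N}f", *true_weights)                 # float32, little-endian

# Decrypting with a mismatched mask leaves the buffer XORed with
# (right mask XOR wrong mask), which behaves like a pseudo-random stream.
residue = hashlib.shake_128(b"right-mask XOR wrong-mask").digest(len(plain))
noise = bytes(p ^ r for p, r in zip(plain, residue))


def as_float32(buf: bytes) -> list[float]:
    """Reinterpret a byte buffer as little-endian float32 weights, as the datapath would."""
    return list(struct.unpack(f"<{len(buf) // 4}f", buf))


x = [1.0] * N                                                # toy input activations
good = sum(w * v for w, v in zip(as_float32(plain), x))      # step 1001: no checksum test
bad_w = as_float32(noise)
bad = sum(w * v for w, v in zip(bad_w, x))
largest = max(abs(w) for w in bad_w if math.isfinite(w))

print(f"correct dot product: {good:.3f}")
print(f"noisy dot product:   {bad}, largest finite weight ~{largest:.1e}")
```

Random bit patterns put roughly 40% of float32 exponents at or above 2^20. Some weights are therefore enormous, and some may be NaN or infinity. This is the numerical overflow that step 1002 relies on.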
Description
Anti-fault injection AI chip model encryption loading and executing method
Technical Field
The invention relates to a fault-injection-resistant method for encrypted loading and execution of AI chip models and belongs to the technical field of electric digital data processing.
Background
In current high-performance computing and AI accelerator chip architectures, system reliability depends on the security of the process that loads and executes neural network model parameters. The mainstream protection scheme in the industry is based on a cryptographic chain-of-trust mechanism. Model parameters are decrypted with an encryption algorithm inside a trusted execution environment or secure storage area, and data integrity is verified with a hash algorithm. This mechanism depends on a strict program execution sequence: the processor first executes the decryption and verification instructions. Only after confirming data integrity does it jump to the data loading instruction and allow the parameters to flow into the computing pipeline. In effect, the protection logic builds into the data path a logic gate controlled by software or firmware instructions.
Against fault injection attacks on the physical layer of the chip, this logic-gate protection scheme exposes an architectural weakness. In the general computing model, control flow and data flow are logically orthogonal. The instruction execution path is determined by the program counter and the status registers, while data parsing and arithmetic are performed by the arithmetic logic unit, and the two are coupled only through a limited number of conditional jump instructions. An attacker can use laser injection, voltage glitches or electromagnetic pulses to disturb the chip's clock or power network, making the program counter jump unexpectedly or flipping a status register flag. Such a fault can make the processor skip the critical decryption and verification instructions and proceed directly to the subsequent data loading operation. The data path itself has no historical awareness of the executed control flow, so once the control logic is bypassed, the computing unit still mechanically receives and processes tampered or unauthorized data. This over-reliance on upper-layer instruction logic, which ignores binding constraints from the underlying physical state, has not been fundamentally addressed in chip applications that pursue high concurrent throughput.
For example, Chinese patent CN112783650B discloses a multi-model parallel inference method based on an AI chip. It establishes a master-slave thread cooperation mechanism and multi-queue data buffering to schedule Context and Stream resources in parallel, improving the compute utilization of the Ascend 310 chip. From a security perspective, however, this architecture performs fine-grained task partitioning only at the software level. Control over model loading and inference execution is entrusted entirely to the linear progression of the instruction stream, and the data path depends on software-level resource requests and ID calls, with no deep entanglement with the hardware micro-state. Suppose an attacker uses physical fault injection to disturb thread scheduling timing or to skip the ACL permission check. The system then cannot perceive the anomaly in the instruction execution history, and sensitive data are still processed according to the tampered program counter. Such schemes lack a physical security anchor in their pursuit of compute scheduling, and there is an urgent need to break the orthogonality of control flow and data flow.
The technical problem to be solved by the invention is therefore how to break the logical orthogonality of control flow and data flow and build a defense mechanism that makes the instruction execution history a necessary condition for data parsing. Physical interference with control timing should then cause mathematical self-destruction at the data operation layer, achieving immunity to fault injection attacks without relying on external detection circuits.
Disclosure of Invention
To solve the problems identified in the background, the invention adopts the following technical solution. A fault-injection-resistant method for encrypted loading and execution of an AI chip model is applied to a data processing system in which a security module and a computing unit are connected through a point-to-point data path that shields external interrupts. The method comprises the following steps: Step 101, at each active clock edge of the instruction pipeline executed by the computing unit, hardware state sampling logic latches in real time a signal set containing the physical address in the program counter, the status flags of the arithmetic logic unit and the exclusive-OR value of the control signals, and concatenates the signal set to generate a micro-architecture state vector for objectively recording the curren