CN-121996567-A - Large language model safety test data processing method and system

Abstract

The invention discloses a large language model security test data processing method and system, relating to the technical field of memory management. The method comprises: obtaining test data in a large language model security test platform, wherein the test data comprises multimodal data, structured attack payload data, long-lived session state data and short-lived temporary data; dividing the main memory space into areas according to the test data to obtain a plurality of initial functional areas; constructing a corresponding area memory management unit in each initial functional area, wherein the area memory management unit performs memory allocation according to a memory allocation strategy; and, after memory allocation, distributing the test data to the corresponding initial functional areas for test data processing. The invention allocates memory by combining the functional areas with the memory allocation strategy, so that the test data has sufficient memory space, thereby realizing test data processing and improving memory management reliability and data processing efficiency.

Inventors

BAI YINGDONG
WANG ZIHAO
LI WEIZHU
XU MENG
LIU WEI

Assignees

北京灵云数科信息技术有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-12

Claims (10)

1. A large language model security test data processing method, characterized by comprising the following steps: obtaining test data in a large language model security test platform, wherein the test data comprises multimodal data, structured attack payload data, long-lived session state data and short-lived temporary data; dividing the main memory space into areas according to the test data to obtain a plurality of initial functional areas, wherein the initial functional areas comprise a multimodal area, a structured attack payload area, a long-lived session state area or a short-lived temporary area; constructing a corresponding area memory management unit in each initial functional area, wherein the area memory management unit is used for performing memory allocation according to a memory allocation strategy; and, after memory allocation, distributing the test data to the corresponding initial functional areas for test data processing.
2. The method of claim 1, wherein performing memory allocation according to the memory allocation strategy comprises: selecting one area from the plurality of initial functional areas as a target functional area; monitoring the memory usage rate of the target functional area and the number of available free memory blocks; if the memory usage rate of the target functional area is greater than a preset usage rate threshold and the number of available free memory blocks is less than a preset availability threshold, identifying an idle functional area from the plurality of initial functional areas, wherein the memory usage rate of the idle functional area is less than the preset usage rate threshold; and adjusting the memory capacity of the target functional area according to the memory allocation strategy and the memory capacity of the idle functional area.
3. The method of claim 2, wherein monitoring the memory usage rate of the target functional area and the number of available free memory blocks comprises: acquiring a memory allocation event and a memory release event of the target functional area; updating the memory usage state of the target functional area according to the memory allocation event and the memory release event; acquiring a memory allocation event and a memory reclamation event of the host by calling an interface of the host virtualization layer; and correlating the memory usage state with the memory allocation event and the memory reclamation event of the host to obtain the memory usage rate and the number of available free memory blocks of the target functional area.
4. The method of claim 2, wherein identifying an idle functional area from the plurality of initial functional areas comprises: establishing a corresponding memory demand prediction unit in each initial functional area, wherein the memory demand prediction unit is used for predicting a memory demand trend according to a historical memory usage pattern of the initial functional area; establishing a corresponding memory borrowing capability evaluation unit in each initial functional area, wherein the memory borrowing capability evaluation unit is used for calculating a borrowable memory size according to the current total free memory, the maximum contiguous free block size and the memory demand trend of the initial functional area; sorting the plurality of initial functional areas according to the memory demand trend and the borrowable memory size to obtain a functional area ranking result; and taking the top-ranked functional area in the functional area ranking result as the idle functional area.
5. The method of claim 2, wherein adjusting the memory capacity of the target functional area according to the memory allocation strategy and the memory capacity of the idle functional area comprises: querying the memory block lock state of the idle functional area from the host virtualization layer; if the memory block lock state is locked, sending an unlock request to the host virtualization layer, wherein the host virtualization layer is used for unlocking the memory of the idle functional area according to the unlock request; and, after the memory is unlocked, allocating memory blocks in the idle functional area to the target functional area according to the memory allocation strategy and the memory capacity of the idle functional area.
6. The method of claim 1, wherein performing memory allocation according to the memory allocation strategy comprises: acquiring a current memory access pattern; constructing a corresponding memory access pattern monitor in each initial functional area, wherein the memory access pattern monitor is used for recording memory access parameters in the initial functional area, and the memory access parameters comprise a memory allocation request size, a memory allocation request frequency and a memory release pattern; evaluating a composite matching score between the memory allocation strategy and the current memory access pattern; if the composite matching score is less than a preset matching threshold, updating the memory allocation strategy according to the memory access parameters and the memory access strategy set; and performing memory allocation according to the updated memory allocation strategy.
7. The method of claim 6, wherein evaluating the composite matching score between the memory allocation strategy and the current memory access pattern comprises: obtaining a differentiated requirement corresponding to each type of test data, wherein the differentiated requirement comprises memory contiguity, data lifecycle and access frequency; setting a memory allocation strategy optimization target corresponding to each type of test data according to the current memory access pattern and the differentiated requirements; evaluating, according to the memory access parameters, an optimization score of the memory allocation strategy with respect to the memory allocation strategy optimization target; and calculating the composite matching score according to the optimization score and the test load proportion corresponding to each type of test data.
8. The method of claim 7, wherein calculating the composite matching score according to the optimization score and the test load proportion corresponding to each type of test data comprises: acquiring a real-time memory request volume and a processing duration corresponding to each type of test data; evaluating a memory load fluctuation range according to the real-time memory request volume and the processing duration; determining a memory weight factor according to the test load proportion, the memory load fluctuation range, the real-time memory request volume and the processing duration; and performing weighted fusion of the optimization scores corresponding to each type of test data according to the memory weight factor corresponding to each type of test data to obtain the composite matching score.
9. The method of claim 1, wherein after dividing the main memory space into areas according to the test data to obtain the plurality of initial functional areas, the method further comprises: acquiring a plurality of physically contiguous memory page groups according to the multimodal data, the degree of fragmentation of the current system physical memory and the multimodal area; configuring an input/output memory management unit and a direct memory access control unit according to a target hardware acceleration unit; creating a single, logically contiguous direct memory access virtual address space from the plurality of physically contiguous memory page groups; when storing data, storing the multimodal data into the direct memory access virtual address space through the input/output memory management unit; and, when accessing data, reading the multimodal data from the direct memory access virtual address space through the direct memory access control unit.
10. A large language model security test data processing system, comprising: a data acquisition module, used for obtaining test data in a large language model security test platform, wherein the test data comprises multimodal data, structured attack payload data, long-lived session state data and short-lived temporary data; an area division module, used for dividing the main memory space into areas according to the test data to obtain a plurality of initial functional areas, wherein the initial functional areas comprise a multimodal area, a structured attack payload area, a long-lived session state area or a short-lived temporary area; a memory allocation module, used for constructing a corresponding area memory management unit in each initial functional area, wherein the area memory management unit is used for performing memory allocation according to a memory allocation strategy; and a data distribution module, used for distributing the test data to the corresponding initial functional areas after memory allocation for test data processing.
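
The capacity adjustment of claims 2 and 4 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Area` record, the block-based accounting, the borrowable-size rule, the ranking order and the default thresholds are all assumptions introduced here for illustration, since the claims name the inputs but do not fix the formulas.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical bookkeeping record for one functional area (sizes in blocks)."""
    name: str
    capacity: int          # total blocks owned by the area
    used: int              # blocks currently allocated
    largest_free_run: int  # largest contiguous run of free blocks
    demand_trend: float    # predicted extra blocks needed soon (predictor output)

    @property
    def usage(self) -> float:
        return self.used / self.capacity if self.capacity else 1.0

    @property
    def free(self) -> int:
        return self.capacity - self.used

    def borrowable(self) -> int:
        # Assumed rule: lend no more than the largest contiguous run, and
        # keep enough free blocks in reserve to cover the predicted demand.
        reserve = max(0, round(self.demand_trend))
        return max(0, min(self.largest_free_run, self.free - reserve))


def pick_idle_area(areas, target, usage_threshold):
    """Rank candidate donor areas and return the top-ranked one, or None."""
    candidates = [
        a for a in areas
        if a is not target and a.usage < usage_threshold and a.borrowable() > 0
    ]
    # Assumed ordering: lowest predicted demand first, then largest borrowable size.
    candidates.sort(key=lambda a: (a.demand_trend, -a.borrowable()))
    return candidates[0] if candidates else None


def rebalance(areas, target, usage_threshold=0.8, free_threshold=4, want=8):
    """Move up to `want` blocks into `target` when both trigger conditions hold."""
    if not (target.usage > usage_threshold and target.free < free_threshold):
        return 0
    donor = pick_idle_area(areas, target, usage_threshold)
    if donor is None:
        return 0
    moved = min(want, donor.borrowable())
    donor.capacity -= moved
    donor.largest_free_run -= moved
    target.capacity += moved
    return moved
```

Calling `rebalance(areas, multimodal_area)` returns the number of blocks transferred, or 0 when the target is not under pressure or no area qualifies as a donor. A real system would also have to perform the lock query and unlock request of claim 5 before moving any block.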

Description

Large language model safety test data processing method and system

Technical Field

The invention relates to the technical field of memory management, and in particular to a large language model security test data processing method and system.

Background

In the security assessment and deployment of large language models (LLMs), a security test platform needs to process massive amounts of complex test data to probe model vulnerabilities. Conventional methods allocate memory simply by identifying free memory space. However, test data has evolved from plain text into complex forms containing multimodal information, structured instruction sequences and adversarial samples, which place high demands on memory contiguity. At the same time, multiple test sessions need to run in parallel and maintain dynamically growing context information that must reside in memory for long periods in order to simulate real attack scenarios. This highly heterogeneous memory request pattern (large size differences, non-uniform lifecycles, high contiguity requirements) is difficult for conventional memory management mechanisms to handle and is prone to severe external fragmentation. Even when the total memory of the system is sufficient, the lack of a sufficiently large contiguous memory block may cause the allocation of key data to fail and interrupt the test flow, which seriously affects evaluation efficiency and results in low memory management reliability and low data processing efficiency.

In summary, the technical problems in the related art remain to be solved.

Disclosure of Invention

The main purpose of the embodiments of the invention is to provide a large language model security test data processing method and system that allocate memory by combining functional areas with a memory allocation strategy, so that the test data has sufficient memory space, thereby realizing test data processing and improving memory management reliability and data processing efficiency.
In one aspect, an embodiment of the invention provides a large language model security test data processing method, comprising the following steps: obtaining test data in a large language model security test platform, wherein the test data comprises multimodal data, structured attack payload data, long-lived session state data and short-lived temporary data; dividing the main memory space into areas according to the test data to obtain a plurality of initial functional areas, wherein the initial functional areas comprise a multimodal area, a structured attack payload area, a long-lived session state area or a short-lived temporary area; constructing a corresponding area memory management unit in each initial functional area, wherein the area memory management unit is used for performing memory allocation according to a memory allocation strategy; and, after memory allocation, distributing the test data to the corresponding initial functional areas for test data processing.
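
The four steps of the method can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Kind` enumeration, the `AreaManager` class, the proportional split of main memory and the bump-pointer allocation strategy are hypothetical choices made here, because the source names the data types and the per-area management unit but leaves the division rule and the allocation strategy abstract.

```python
from enum import Enum


class Kind(Enum):
    """The four test data types named in the method."""
    MULTIMODAL = "multimodal"
    ATTACK_PAYLOAD = "attack_payload"
    SESSION_STATE = "session_state"
    TEMPORARY = "temporary"


class AreaManager:
    """Hypothetical per-area manager: a bump allocator over one slice of memory."""

    def __init__(self, base: int, size: int):
        self.base, self.size, self.offset = base, size, 0

    def allocate(self, nbytes: int) -> int:
        # Placeholder strategy (assumption): hand out addresses sequentially.
        if self.offset + nbytes > self.size:
            raise MemoryError("functional area exhausted")
        addr = self.base + self.offset
        self.offset += nbytes
        return addr


def divide_main_memory(total: int, shares: dict) -> dict:
    """Split [0, total) into one area per data kind, proportional to `shares`."""
    managers, base = {}, 0
    weight_sum = sum(shares.values())
    for kind, weight in shares.items():
        size = total * weight // weight_sum
        managers[kind] = AreaManager(base, size)
        base += size
    return managers


def place(managers: dict, kind: Kind, nbytes: int) -> int:
    """Allocate in the area matching the data kind and return the address."""
    return managers[kind].allocate(nbytes)
```

With `shares = {Kind.MULTIMODAL: 4, Kind.ATTACK_PAYLOAD: 1, Kind.SESSION_STATE: 2, Kind.TEMPORARY: 1}` and `total = 8000`, session state data is placed in the slice that starts at offset 5000, while an oversized temporary request raises `MemoryError` instead of spilling into a neighbouring area.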
In another aspect, an embodiment of the invention provides a large language model security test data processing system, comprising: a data acquisition module, used for obtaining test data in a large language model security test platform, wherein the test data comprises multimodal data, structured attack payload data, long-lived session state data and short-lived temporary data; an area division module, used for dividing the main memory space into areas according to the test data to obtain a plurality of initial functional areas, wherein the initial functional areas comprise a multimodal area, a structured attack payload area, a long-lived session state area or a short-lived temporary area; a memory allocation module, used for constructing a corresponding area memory management unit in each initial functional area, wherein the area memory management unit is used for performing memory allocation according to a memory allocation strategy; and a data distribution module, used for distributing the test data to the corresponding initial functional areas after memory allocation for test data processing.
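
Both aspects leave the memory allocation strategy itself abstract. The claims describe checking its fit to the workload through a composite matching score obtained by weighted fusion of per-data-type optimization scores. The Python sketch below shows one possible reading. The `weight_factor` formula, the dictionary keys and the default threshold are assumptions made for illustration, since the source lists the inputs of the memory weight factor but not how they are combined.

```python
def weight_factor(load_share, request_volume, processing_time, fluctuation):
    """Assumed heuristic: heavier, busier and burstier data types weigh more."""
    return load_share * (1.0 + fluctuation) * request_volume * processing_time


def composite_score(per_type):
    """Weighted average of the per-data-type optimization scores.

    `per_type` maps a data type name to a dict with the hypothetical keys
    'score', 'share', 'volume', 'time' and 'fluct'.
    """
    weights = {
        name: weight_factor(v["share"], v["volume"], v["time"], v["fluct"])
        for name, v in per_type.items()
    }
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[n] * per_type[n]["score"] for n in per_type) / total


def needs_strategy_update(per_type, match_threshold=0.6):
    """True when the strategy no longer matches the observed access pattern."""
    return composite_score(per_type) < match_threshold
```

Normalizing by the sum of the weights keeps the composite score on the same scale as the individual optimization scores, so a single preset matching threshold can be compared against it regardless of how many data types are active.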
The embodiments of the application have the following advantages. First, test data in the large language model security test platform is obtained. The main memory space is then divided into a plurality of initial functional areas according to the test data, a corresponding area memory management unit is constructed in each initial functional area, and memory is allocated by the area memory management unit according to a memory allocation strategy. Finally, the test data is distributed to the corresponding initial functional areas for test data processing. Memory allocation thus combines the functional areas with the memory allocation strategy, so that the test data has sufficient memory space, test data processing is realized, and the memory management reliability and the data processing