CN-121980465-A - AI chip abnormal behavior recognition system based on convolutional neural network
Abstract
The invention discloses an AI chip abnormal behavior recognition system based on a convolutional neural network, comprising a data acquisition module, a space construction module, a spatial feature module, a time sequence processing module, an anomaly analysis module and an online monitoring module. The data acquisition module acquires on-chip multi-source behavior data and generates a behavior sequence. The space construction module maps the behavior sequence into a spatial matrix and constructs a topology graph. The spatial feature module inputs the spatial matrix and the topology graph into an improved RepLKNet to generate multi-scale spatial features. The time sequence processing module obtains spatio-temporal features using dilated temporal convolution. The anomaly analysis module calculates an anomaly score, identifies the anomaly category and grade, and generates an anomaly region activation map and a semantic vector. The online monitoring module performs real-time inference, adjusts the inference frequency according to the anomaly result, and deploys a lightweight version of the model. The system can realize high-precision identification and interpretable analysis of abnormal AI chip behavior, and is suitable for continuous monitoring and operational safety assurance under complex load conditions.
Inventors
- XU JUAN
Assignees
- 深圳市盛瑞隆科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260225
Claims (8)
- 1. An AI chip abnormal behavior recognition system based on a convolutional neural network, characterized by comprising the following modules: a data acquisition module, used for acquiring on-chip behavior data during operation of the AI chip and generating a behavior sequence; a space construction module, used for mapping the behavior sequence into an on-chip behavior spatial matrix according to the module layout of the AI chip and constructing a topology graph describing the connection relations of the modules; a spatial feature module, used for inputting the on-chip behavior spatial matrix and the topology graph into an improved RepLKNet model to generate a spatial feature map; a time sequence processing module, used for arranging the spatial feature maps in time order into a time window sequence and processing the time window sequence with a dilated temporal convolution structure to generate spatio-temporal features; an anomaly analysis module, used for calculating an anomaly score based on the spatio-temporal features, determining anomaly category information and anomaly grade information, and generating an anomaly region activation map and a semantic vector representing the anomaly pattern; and an online monitoring module, used for performing online monitoring according to the anomaly recognition result, performing real-time inference on the spatio-temporal features, adjusting the inference frequency according to the anomaly grade information, and performing inference-stage lightweight deployment processing on the improved RepLKNet model.
- 2. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 1, wherein the modules are realized by the following method: acquiring on-chip behavior data during operation of the AI chip, and performing time alignment and fusion on the on-chip behavior data to generate a behavior sequence; mapping the behavior sequence into an on-chip behavior spatial matrix according to the module layout of the AI chip, and constructing a topology graph describing the connection relations of the modules; inputting the on-chip behavior spatial matrix and the topology graph into an improved RepLKNet model to generate a spatial feature map, wherein the improved RepLKNet model comprises a multi-scale large convolution kernel structure module, a topology weighting module, a multi-branch convolution structure and a structural re-parameterization module; arranging the spatial feature maps in time order into a time window sequence, and processing the time window sequence with a dilated temporal convolution structure to generate spatio-temporal features; calculating an anomaly score based on the spatio-temporal features, determining anomaly category information and anomaly grade information, and generating an anomaly region activation map and a semantic vector representing the anomaly pattern; and performing online monitoring according to the anomaly recognition result, performing real-time inference on the spatio-temporal features, adjusting the inference frequency according to the anomaly grade information, and performing inference-stage lightweight deployment processing on the improved RepLKNet model.
- 3. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 2, wherein acquiring the on-chip behavior data during operation of the AI chip and performing time alignment and fusion on the on-chip behavior data to generate the behavior sequence specifically comprises: the on-chip behavior data comprise instruction execution count data, cache access data, on-chip link transmission data, compute array utilization data, and power consumption and temperature data, each recorded with its acquisition time; and, taking the recorded acquisition times as a reference, time-aligning the on-chip behavior data to a preset time step, and concatenating or superimposing along the feature dimension the various on-chip behavior data of the same time step, to form a behavior sequence arranged in time order.
- 4. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 2, wherein mapping the behavior sequence into the on-chip behavior spatial matrix according to the module layout of the AI chip and constructing the topology graph describing the module connection relations specifically comprises: dividing the behavior sequence into a plurality of behavior segments according to the correspondence between the behavior data and the modules in the module layout of the AI chip; mapping the behavior segments to two-dimensional coordinates according to the module layout of the AI chip to generate the on-chip behavior spatial matrix; constructing the topology graph according to the connection relations among the modules of the AI chip, wherein the topology graph takes the modules of the AI chip as nodes and takes the data transmission relations, access relations or communication paths existing among the modules as edges, forming a graph structure that represents the module connection relations of the AI chip; and establishing a one-to-one correspondence between the matrix positions in the on-chip behavior spatial matrix and the nodes in the topology graph, so that each position in the matrix corresponds to the node representing the same module in the topology graph, ensuring that subsequent processing based on the on-chip behavior spatial matrix remains consistent with the structural relations in the topology graph.
- 5. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 2, wherein the improved RepLKNet model comprises a multi-scale large convolution kernel structure module, a topology weighting module, a multi-branch convolution structure and a structural re-parameterization module: the multi-scale large convolution kernel structure module performs two-dimensional convolution operations with different convolution kernel sizes on the on-chip behavior spatial matrix, and concatenates the convolution results of the different kernel sizes along the channel dimension as the multi-scale convolution output; the topology weighting module takes the multi-scale convolution output and the topology graph as inputs, generates a topology weighting matrix according to the module connection relations recorded in the topology graph, multiplies each spatial position of the multi-scale convolution output point by point with the corresponding element of the topology weighting matrix to obtain weighted convolution values, and performs an exponential normalization operation on the weighted convolution values of the same spatial position to generate the topology-weighted output; the multi-branch convolution structure feeds the topology-weighted output into a plurality of convolution branches, performs two-dimensional convolution, normalization and nonlinear activation operations in each convolution branch, and either concatenates the convolution results of all branches along the channel dimension or adds them position by position to form the multi-branch convolution output; the structural re-parameterization module takes the multi-branch convolution output and the convolution kernel parameters of each convolution branch as inputs, performs an equivalent fusion operation on the convolution kernels of the branches by adding the plurality of convolution kernels element by element at corresponding positions and summing the corresponding bias parameters, and expands and folds the convolution kernel structure in a preset manner to form an equivalent large convolution kernel; final convolution processing is performed with the equivalent large convolution kernel, taking the on-chip behavior spatial matrix as input, wherein the final convolution processing comprises sliding the convolution kernel window over the on-chip behavior spatial matrix, performing point-by-point multiply-and-add operations between the elements in the window and the equivalent large convolution kernel, and performing normalization and nonlinear activation operations on the convolution result; and the multi-scale convolution output, the topology-weighted output, the multi-branch convolution output and the output of the equivalent large convolution kernel are recorded in sequence as intermediate results, which are processed successively to obtain the spatial feature map.
- 6. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 2, wherein processing the time window sequence with the dilated temporal convolution structure to generate the spatio-temporal features specifically comprises: taking the time window sequence as input, performing one-dimensional dilated convolution along the time dimension to form a first time sequence feature; taking the first time sequence feature as input, normalizing the features of each time step along the channel dimension to form a normalized time sequence feature; and taking the normalized time sequence feature as input, performing a nonlinear activation operation on the feature values of each time step to generate an activated time sequence feature, and organizing the activated time sequence feature in time order into the spatio-temporal features.
- 7. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 2, wherein calculating the anomaly score based on the spatio-temporal features, determining the anomaly category information and anomaly grade information, and generating the anomaly region activation map and the semantic vector representing the anomaly pattern specifically comprises: multiplying the feature vector at each time step and spatial position in the spatio-temporal features element by element with a preset weight vector, summing the products along the channel dimension to obtain local scores, adding all the local scores in sequence to obtain a global score, and performing linear transformation and normalization on the global score to form an anomaly score sequence; performing a classification operation with the anomaly score sequence and the spatio-temporal features as inputs, performing matrix multiplication on the spatio-temporal features along the channel dimension to generate category activation values, applying the Softmax function to the category activation values to generate a probability distribution, determining the anomaly category information from the category with the maximum probability value, and determining the anomaly grade information from the relation between the probability distribution and a preset threshold; performing weighted summation of the spatio-temporal features over the spatial dimensions to form a spatial activation matrix, normalizing the spatial activation matrix, and generating the anomaly region activation map according to a preset value range mapping; and applying a linear projection to the spatio-temporal features along the channel dimension to form the semantic vector representing the anomaly pattern.
- 8. The AI chip abnormal behavior recognition system based on the convolutional neural network of claim 2, wherein performing online monitoring according to the anomaly recognition result, performing real-time inference on the spatio-temporal features, adjusting the inference frequency according to the anomaly grade information, and performing inference-stage lightweight deployment processing on the improved RepLKNet model specifically comprises: the anomaly recognition result comprises the anomaly score, the anomaly category information, the anomaly grade information, the anomaly region activation map and the semantic vector, and real-time inference is performed on the spatio-temporal features according to the anomaly recognition result; comparing the anomaly grade information item by item with preset grade thresholds, mapping the comparison result to a corresponding risk level label, and adjusting the inference frequency according to the risk level label by applying a linear scaling operation to a preset inference period; taking the spatio-temporal features as input, performing real-time inference at the updated inference frequency, performing element-by-element multiply-and-add operations, normalization and nonlinear activation on each spatio-temporal feature input in time order to generate a new anomaly recognition result, and recording the inference outputs in time order as an online monitoring sequence; and performing the inference-stage lightweight deployment processing on the improved RepLKNet model by pruning the parameters of the improved RepLKNet model and quantizing its weights, the lightweight improved RepLKNet model then being used as the model for the inference stage.
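The kernel fusion step of the structural re-parameterization module in claim 5 can be sketched as follows. This is a minimal single-channel NumPy illustration, not the patented implementation: it assumes the branches are linear convolutions whose outputs are added position by position (normalization already folded into kernel and bias, no activation before the sum), and it reads "expands" as zero-padding each smaller kernel to the largest kernel size. The helper names `conv2d_same` and `fuse_branches` are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel, bias=0.0):
    """Naive single-channel 2-D cross-correlation with zero 'same' padding."""
    k = kernel.shape[0]              # square, odd-sized kernel assumed
    pad = k // 2
    xp = np.pad(x, pad)              # zero-pad all four borders
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * kernel) + bias
    return out

def fuse_branches(kernels, biases):
    """Fold parallel linear branches into one kernel of the largest size."""
    big = max(k.shape[0] for k in kernels)
    fused = np.zeros((big, big))
    for k in kernels:
        pad = (big - k.shape[0]) // 2
        fused += np.pad(k, pad)      # centre the small kernel, add element-wise
    return fused, float(sum(biases)) # biases are simply summed

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))          # stand-in for an on-chip behavior spatial matrix
kernels = [rng.normal(size=(7, 7)), rng.normal(size=(3, 3)), rng.normal(size=(1, 1))]
biases = [0.1, -0.2, 0.05]

# Training-time view: run every branch, then add the outputs.
branch_sum = sum(conv2d_same(x, k, b) for k, b in zip(kernels, biases))

# Inference-time view: one equivalent large kernel.
fused_kernel, fused_bias = fuse_branches(kernels, biases)
fused_out = conv2d_same(x, fused_kernel, fused_bias)

max_diff = float(np.max(np.abs(branch_sum - fused_out)))
```

Under these assumptions the two paths agree up to floating-point rounding, which is why the fused kernel can replace the branches at inference time. If a nonlinear activation sits inside each branch, the fusion is no longer exact.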
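The three steps of claim 6 (one-dimensional dilated convolution along time, per-time-step normalization over channels, nonlinear activation) can be sketched as below. The claims do not fix the padding scheme, the activation function or the tensor layout, so this sketch assumes causal zero padding, ReLU, and spatial feature maps already flattened to one vector per time step. The function names are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution along time.

    x: (T, C_in) time window sequence, w: (K, C_in, C_out) -> (T, C_out).
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    xp = np.vstack([np.zeros((pad, c_in)), x])    # zero history before t = 0
    out = np.zeros((T, c_out))
    for t in range(T):
        for k in range(K):
            # tap k looks k * dilation steps into the past
            out[t] += xp[t + pad - k * dilation] @ w[k]
    return out

def temporal_block(x, w, dilation, eps=1e-5):
    """Dilated convolution, then channel-wise normalization, then ReLU."""
    h = dilated_conv1d(x, w, dilation)
    mean = h.mean(axis=1, keepdims=True)           # statistics per time step
    var = h.var(axis=1, keepdims=True)
    h = (h - mean) / np.sqrt(var + eps)
    return np.maximum(h, 0.0)

rng = np.random.default_rng(0)
window_seq = rng.normal(size=(16, 4))              # 16 time steps, 4 channels
weights = rng.normal(size=(3, 4, 8))               # kernel size 3, 8 output channels
st_features = temporal_block(window_seq, weights, dilation=2)

# An impulse at t = 0 shows which time steps a size-3, dilation-2 kernel reaches.
impulse = np.zeros((6, 1))
impulse[0, 0] = 1.0
taps = dilated_conv1d(impulse, np.ones((3, 1, 1)), dilation=2)[:, 0]
# taps == [1, 0, 1, 0, 1, 0]: the kernel spans 5 steps with only 3 weights
```

The impulse response illustrates the design choice: dilation widens the temporal receptive field without adding parameters, which is the usual reason for choosing it for long-range dependencies.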
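The frequency adjustment in claim 8 (compare the anomaly grade with preset thresholds, map the result to a risk level label, then linearly scale a preset inference period) might look like the sketch below. The threshold values, label names and scale factors are illustrative assumptions and do not come from the patent.

```python
from bisect import bisect_right

# Hypothetical preset values, chosen only for illustration.
LEVEL_THRESHOLDS = [0.3, 0.6, 0.85]
RISK_LABELS = ["normal", "low", "medium", "high"]
PERIOD_SCALE = {"normal": 2.0, "low": 1.0, "medium": 0.5, "high": 0.25}

def risk_label(grade_value):
    """Map an anomaly grade value to a risk label via the preset thresholds."""
    return RISK_LABELS[bisect_right(LEVEL_THRESHOLDS, grade_value)]

def next_period(base_period_ms, grade_value):
    """Linearly scale the preset inference period according to the risk label."""
    return base_period_ms * PERIOD_SCALE[risk_label(grade_value)]

# Higher risk gives a shorter period, i.e. a higher inference frequency.
period_high = next_period(100.0, 0.9)     # 25.0 ms
period_normal = next_period(100.0, 0.1)   # 200.0 ms
```

With these assumed factors, a quiet chip is checked half as often as the baseline while a high-risk grade quadruples the inference rate, trading inference overhead against detection latency.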
Description
AI chip abnormal behavior recognition system based on convolutional neural network
Technical Field
The invention relates to the technical field of artificial intelligence chip safety monitoring and intelligent diagnosis, and in particular to an AI chip abnormal behavior recognition system based on a convolutional neural network.
Background
With the rapid adoption of artificial intelligence chips in data centers, intelligent driving, cloud inference and edge devices, the demand for fine-grained monitoring of the chip's internal running state and for abnormal behavior recognition keeps growing. Existing AI chip anomaly detection schemes mainly rely on statistical thresholds of on-chip performance counters, link bandwidth utilization trends or power consumption and temperature curves to make simple judgments, and some studies attempt to model local behavior segments with a convolutional neural network. In practical applications, however, these approaches have several shortcomings.
First, multi-source on-chip behavior data comprise multiple types of signals, such as instruction execution, cache access, on-chip link transmission and compute array occupancy; sampling frequencies differ between signals, and delays are uncontrollable. The prior art generally aligns the signals with a fixed time window or linear interpolation, which easily distorts the behavior sequence and fails to reflect the dynamic changes during chip operation.
Second, the module layout inside an AI chip is complex and there are definite structural connections between modules, but existing two-dimensional convolution or one-dimensional sequence models cannot fuse this topology information. The model therefore lacks awareness of inter-module associations, and recognition accuracy is insufficient in structural anomaly scenarios such as link bottleneck propagation and access conflicts.
Meanwhile, existing time sequence models have limited ability to model long-range dependencies, and are prone to feature loss or temporal drift when facing non-uniform time series data under high load fluctuation.
In addition, existing anomaly detection methods mostly depend on a single classification output and lack the ability to localize the anomaly region or to express the anomaly pattern semantically, so they cannot meet the diagnostic needs of a complex chip system. Their online monitoring uses a fixed inference frequency, which makes it difficult to adjust inference overhead dynamically according to anomaly risk and is unsuitable for resource-constrained real-time deployment.
Therefore, how to provide an AI chip abnormal behavior recognition system based on a convolutional neural network is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide an AI chip abnormal behavior recognition system based on a convolutional neural network. It combines on-chip behavior data modeling, spatial topology mapping, improved RepLKNet spatial feature extraction, dilated temporal convolution time sequence analysis and an online lightweight inference method to build a complete intelligent pipeline from on-chip behavior data acquisition, spatial representation and spatio-temporal feature modeling through to anomaly recognition and online monitoring, and realizes accurate recognition and interpretable analysis of the anomaly type, grade and region of the AI chip. The invention can maintain high recognition accuracy under complex load and real-time operation conditions, and has the advantages of strong real-time performance, good interpretability, high adaptability and low deployment cost.
According to an embodiment of the invention, the AI chip abnormal behavior recognition system based on the convolutional neural network comprises the following modules: a data acquisition module, used for acquiring on-chip behavior data during operation of the AI chip and generating a behavior sequence; a space construction module, used for mapping the behavior sequence into an on-chip behavior spatial matrix according to the module layout of the AI chip and constructing a topology graph describing the connection relations of the modules; a spatial feature module, used for inputting the on-chip behavior spatial matrix and the topology graph into the improved RepLKNet model to generate a spatial feature map; a time sequence processing module, used for arranging the spatial feature maps in time order into a time window sequence and processing the time window sequence with a dilated temporal convolution structure to generate spatio-temporal features; an anomaly analysis module, used for calculating an anomaly score based on the spatio-temporal features, determining anomaly category information