
CN-122019791-A - Deep neural network internal neuron semantic interpretation and verification method

CN 122019791 A

Abstract

The invention relates to a method for semantic interpretation and verification of neurons inside a deep neural network, and belongs to the technical fields of artificial intelligence and computer vision. The method comprises: given a target deep neural network to be interpreted and a probe data set, analysing the activation distributions of neurons in the network; screening out highly discriminative target neurons and their corresponding strongly activating samples based on statistical characteristics of the activation values; analysing the strongly activating samples with a clustering algorithm and a pre-trained vision-language model to generate semantic concept hypotheses for the neurons' functions; using a text-to-image generation model to produce a verification image set from the generated semantic concept hypotheses; inputting the verification image set into the target deep neural network; and verifying the correctness of the semantic concept hypotheses by computing the neurons' activation rates on the verification image set. Through a closed-loop framework of screening, hypothesis and verification, the invention improves the accuracy and credibility of deep-learning model interpretability.
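The screening step described above (detailed in claims 2 and 3) reduces to a ratio statistic plus a threshold. A minimal sketch in Python; the function names, the use of quantiles to stand in for the "high-order" and "middle-order" medians, and the threshold value are illustrative assumptions, not specified by the patent:

```python
import numpy as np

def discrimination_index(activations, hi_q=0.99, mid_q=0.50, eps=1e-8):
    """Ratio of a high-order statistic to the median of one neuron's
    activations over the probe set (the patent uses the ratio of a
    high-order median to a middle-order median; quantiles stand in here)."""
    hi = np.quantile(activations, hi_q)
    mid = np.quantile(activations, mid_q)
    return hi / (mid + eps)  # eps guards against an all-zero neuron

def screen_neurons(act_matrix, beta=4.0):
    """act_matrix: (num_probe_samples, num_neurons) activations.
    Keep neurons whose discrimination index exceeds the threshold beta;
    the rest are treated as low-discrimination or redundant and dropped."""
    idx = np.array([discrimination_index(act_matrix[:, j])
                    for j in range(act_matrix.shape[1])])
    return np.where(idx > beta)[0], idx
```

A neuron that is silent on most probe images but fires strongly on a few gets a large index; a neuron that responds uniformly gets an index near 1 and is filtered out.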

Inventors

  • XIAO BIN
  • JI ZEBIN
  • BI XIULI

Assignees

  • Chongqing University of Posts and Telecommunications (重庆邮电大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-05

Claims (10)

  1. A method for semantic interpretation and verification of neurons inside a deep neural network, characterised by comprising the following steps: S1, given a target deep neural network to be interpreted and a probe data set, obtaining the activation distribution of target-layer neurons in the network on the probe data set, calculating a discrimination index based on statistical characteristics of the activation distribution, screening out target neurons with salient functional characteristics, and extracting the strongly activating sample set corresponding to each target neuron; S2, performing feature extraction and cluster analysis on the image regions in the strongly activating sample set to obtain distinct functional-mode clusters, calculating the cross-modal matching degree between each functional-mode cluster and each concept in a predefined concept set using a pre-trained vision-language model, and generating a semantic concept hypothesis for the target neuron; S3, introducing a generative verification mechanism: using a pre-trained text-to-image generation model, with the semantic concept hypothesis as the input prompt, generating a verification image set independent of the probe data set; S4, inputting the verification image set into the target deep neural network, monitoring the target neuron's response to the verification image set, and calculating an activation-rate index to quantify the causal consistency between the semantic concept hypothesis and the neuron's actual function, thereby realising interpretation and verification of the neuron's function.
  2. The method according to claim 1, characterised in that in step S1 the discrimination index is obtained by calculating the ratio of the high-order median to the middle-order median of the neuron's activation values on the probe data set.
  3. The method according to claim 2, characterised in that in step S1, screening out target neurons with salient functional characteristics specifically comprises: setting a screening threshold β; if a neuron's discrimination index is greater than β, determining it to be a neuron with a well-defined function and including it in the interpretation scope; and if its discrimination index is less than β, regarding it as a low-discrimination or redundant neuron and filtering it out.
  4. The method according to claim 1, characterised in that in step S2, performing feature extraction and cluster analysis on the image regions in the strongly activating sample set specifically comprises: cropping each strongly activating sample according to the neuron's activation feature map, retaining the local region with the highest activation value as an image patch; extracting a feature vector for each image patch with an image encoder; and clustering the feature vectors with an agglomerative hierarchical clustering algorithm, automatically determining the number of clusters, to obtain a plurality of clusters representing different response modes.
  5. The method according to claim 1, characterised in that in step S2, generating the semantic concept hypothesis of the target neuron specifically comprises: calculating the cosine similarity between the image patches and the text concepts in a shared embedding space, and selecting the text concept with the highest average similarity to the current functional-mode cluster as the semantic concept hypothesis of that cluster.
  6. The method according to claim 1, characterised in that in step S3, the text-to-image generation model is a pre-trained diffusion model or generative adversarial network, and a plurality of synthetic images containing only the features of the semantic concept hypothesis are generated by inputting the semantic concept hypothesis to the model and sampling different random noises, satisfying the following formula: V_i = { G(c_i, z_k) | z_k ~ N(0, I), k = 1, …, K }, where V_i denotes the verification image set generated for the i-th neuron based on its semantic concept hypothesis; G denotes the pre-trained text-to-image generation model; c_i denotes the semantic concept hypothesis serving as the input prompt; z_k denotes a random noise vector sampled from the standard normal distribution N(0, I); and by sampling K different random noises, K mutually independent verification images are generated.
  7. The method according to claim 6, characterised in that in step S4 the activation-rate index is calculated as: ρ(n_i, c_i) = (1/N) Σ_{x ∈ V_i} 1{ a_i^l(x) > τ }, where the summation Σ runs over all images x in the verification image set V_i; ρ(n_i, c_i) denotes the activation rate of neuron n_i for semantic concept hypothesis c_i; N denotes the total number of images in the verification image set; x denotes a single image in the verification image set; a_i^l(x) denotes the activation value produced by the i-th neuron in layer l on input image x; τ is an activation threshold; and 1{·} is an indicator function taking the value 1 when the bracketed condition is satisfied and 0 otherwise.
  8. A system for implementing the method for semantic interpretation and verification of neurons inside a deep neural network according to any one of claims 1 to 7, characterised in that the system comprises: a screening module for analysing the activation distributions of neurons, screening out neurons with specific functions according to the ratio of the high activation value to the median activation value, and extracting strongly activating samples; a hypothesis module for clustering the strongly activating samples and, in combination with a vision-language model, inferring the natural-language semantic concept hypothesis corresponding to each cluster; and a verification module for generating synthetic images according to the semantic concept hypotheses, verifying the correctness of the hypotheses by detecting the degree to which the synthetic images activate the neurons, and outputting the final interpretation result.
  9. An electronic device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, characterised in that the computer program comprises instructions for performing the method of any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, characterised in that the computer program, when executed by a computer, implements the method of any one of claims 1 to 7.
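The activation-rate index of claim 7 is a thresholded mean over the synthetic verification set. A minimal sketch; the function name, the threshold value and the toy activations below are illustrative assumptions:

```python
import numpy as np

def activation_rate(acts, tau):
    """rho(n_i, c_i) = (1/N) * sum over x in V_i of 1{ a_i^l(x) > tau }:
    the fraction of verification images whose activation of the target
    neuron exceeds the threshold tau (claim 7)."""
    return float(np.mean(np.asarray(acts) > tau))

# Toy check: 5 of 8 synthetic images exceed the threshold.
rate = activation_rate([0.9, 0.1, 0.8, 0.7, 0.05, 0.6, 0.02, 0.95], tau=0.5)
# rate == 0.625
```

A rate close to 1 indicates that images synthesised from the concept hypothesis reliably drive the neuron, supporting the hypothesis; a rate close to 0 refutes it.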

Description

Deep neural network internal neuron semantic interpretation and verification method

Technical Field

The invention belongs to the technical fields of artificial intelligence and computer vision, relates to explainable artificial intelligence (Explainable AI, XAI) technology, and in particular relates to a method for semantic interpretation and verification of neurons inside a deep neural network based on active generative verification.

Background

In recent years, deep neural networks (Deep Neural Networks, DNNs), in particular convolutional neural networks (CNNs) and vision Transformers (ViT), have made breakthrough progress in fields such as computer vision, natural language processing, autonomous driving and medical image analysis. By virtue of their deep network structures and massive parameter counts, these models can automatically learn feature representations from large-scale data, achieving performance that on specific tasks even exceeds that of human experts. However, this improvement in performance comes at the expense of model transparency. A deep neural network is often regarded as a "black box": it contains hundreds of millions of neurons and complex nonlinear connections, making it difficult for a human user to understand on the basis of which features the model makes its decisions. This end-to-end decision mode, while efficient, also renders the model's internal operating mechanism extremely opaque. Such opacity poses serious trust and safety hazards in practical applications, particularly in safety-critical domains.
For example, in an autonomous-driving scenario, if the vehicle recognition system mistakes a roadside billboard for a speed-limit sign, the developer needs to know exactly which part of the network's feature extraction deviated; in medical diagnosis, a doctor must confirm that the AI system bases its judgement on the pathological features of the lesion, not on spurious correlations (Spurious Correlation) such as a hospital watermark or equipment noise at the image edge. How to open the "black box" of a deep neural network and present its internal decision logic in a human-understandable manner has therefore become a key technical problem to be solved in the field of artificial intelligence. To improve the interpretability of deep-learning models, the prior art has developed mainly along two directions. The first is visualisation based on saliency maps (Saliency Maps), such as Grad-CAM, which computes gradients or class activation maps to highlight the pixel regions of the input image that contribute most to the prediction result. However, this approach can only answer "where the model is looking (Where)", not "what the model sees (What)". Faced with a highlighted region, the user often has to make subjective guesses based on personal experience (for example, when the dog's head is highlighted: is attention being paid to the ears, the eyes, or the fur texture?). The second direction is neuron interpretation based on semantic concepts, which aims to map individual neurons in the intermediate layers of a neural network to human-understandable natural-language concepts (e.g. "stripe detector", "dog-head detector").
Early techniques such as Network Dissection rely on pixel-level annotated datasets, which are costly to build and limited in vocabulary; more recent automatic interpretation methods (such as CLIP-Dissect) based on large-scale vision-language pre-trained models (such as CLIP) remove the dependence on labels and can generate richer semantic descriptions. While these approaches have to some extent advanced the understanding of models' internal mechanisms, the current state of the art still suffers from serious methodological flaws, principally in three respects. First, the prior art generally rests on the misconception of a "full-function hypothesis", i.e. the default assumption that every neuron in a neural network encodes some meaningful, independent semantic concept. However, theoretical studies of deep learning have shown that polysemantic neurons (Polysemantic Neurons), i.e. single neurons that respond simultaneously to multiple mutually unrelated concepts such as "cat" and "car", as well as redundant neurons (Redundant Neurons) that contribute insubstantially to decisions, are widespread in networks. Existing methods lack an effective screening mechanism and force a single-label explanation onto such noisy or polysemantic neurons, so the generated explanations are seriously hallucinated and misleading and cannot truly reflect the internal state of the network. Second, existing interpretation methods rely primarily on "passive observation" of correlations. The prior art infers a neuron's function by counting which input samples activate it, but this is essentially a correlation analysis.
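Concept assignment in CLIP-style methods of this kind — and in claim 5 of the present method — amounts to picking the concept whose text embedding has the highest average cosine similarity to a cluster's image-patch embeddings. A minimal sketch with illustrative names; real patch and concept embeddings would come from a shared-space encoder pair such as CLIP's:

```python
import numpy as np

def best_concept(patch_embs, concept_embs, concepts):
    """patch_embs: (num_patches, d) embeddings of one functional-mode
    cluster's image patches; concept_embs: (num_concepts, d) text
    embeddings of the predefined concept set. Returns the concept with
    the highest average cosine similarity to the cluster."""
    p = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    c = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    sims = p @ c.T                      # (num_patches, num_concepts)
    return concepts[int(np.argmax(sims.mean(axis=0)))]
```

Because both modalities are embedded in the same space, the dot product of the normalised vectors is exactly the cosine similarity the matching step relies on.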