CN-116975253-B - Visual analysis method and device based on Transformer self-attention
Abstract
The invention discloses a visual analysis method and device based on Transformer self-attention. Through visual analysis charts, users can learn the overall distribution and statistical rules of the self-attention across the training layers and attention heads of a deep learning model, and can inspect the self-attention connections within specific training samples through a data link chart and a matrix chart. For the computer vision field, attention visualization reveals the mutual attention among pixel blocks in the training task, and global and local normalization modes allow the self-attention distribution across different layers and heads to be inspected, showing how the downstream task arrives at its result. Using the statistical analysis charts, researchers can intuitively observe the value distribution of the attention heads in a Transformer model and select attention heads of interest; by visualizing a specific single attention head, they can analyze the role that head plays in a given task, which helps them improve and optimize the model.
Inventors
- Yu Zailiang
- Qiu Yunlei
- Pan Shu
- Wu Xiangyang
- Liu Zhen
- Xu Gang
- Sun Haibo
- Lin Yuhao
- Gao Fei
Assignees
- Zhejiang Lab (之江实验室)
- Hangzhou Dianzi University (杭州电子科技大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2023-06-25
Claims (7)
- 1. A visual analysis method based on Transformer self-attention, the method comprising the steps of: (1) Model training: a user trains a model built by the user; (2) Data acquisition: the user stores the self-attention data and the original input data generated during training through the provided API interface; (3) Log writing: the model training data stored by the user is rewritten into a log data format that the system can parse and is fed into a visual analysis system to obtain visual analysis results, which are obtained in the visual analysis system specifically as follows: the visual analysis system analyzes the data in two parts, attention visual analysis for the natural language processing field and attention visual analysis for the computer vision field; the attention visual analysis for the natural language processing field comprises three components, a statistical information table, a statistical information graph and an attention visualization, through which the overall distribution and statistical rules of the self-attention of the training layers and attention heads of the deep learning model are learned, while the self-attention connections within specific training samples are inspected through a data link graph and a matrix graph; the attention visual analysis for the computer vision field represents the mutual attention among pixel blocks in a training task, and the attention distribution across different layers and heads is inspected through a global normalization mode and a local normalization mode to reveal how the downstream task obtains its result; the attention visual analysis for the natural language processing field specifically comprises the following steps: (3.1) a statistical information table graphically displays the overall self-attention situation of all training layers and attention heads of the model, and the user can sort and inspect it by different statistical indexes: maximum value max, minimum value min, interquartile range quar and variance vari; (3.2) a statistical information graph simultaneously displays the self-attention information, arranging the self-attention information of each training layer and each attention head in the form of a sunburst chart, where the starting position of each bar encodes the maximum and minimum self-attention values of the attention head, the color encodes the variance, and hovering the mouse over an attention head displays its specific information; (3.3) the sunburst chart supports multiple filtering operations with freely selectable filtering criteria, so the user can find attention heads of interest or with abnormal self-attention and explore the reasons in the single-attention-head visual analysis chart; for the attention map data attn_map, the dimensions are (L, num_heads, h, w, h, w), where L is the number of layers of the model, num_heads is the number of attention heads, h is the height of the attention map, and w is the width of the attention map; the user selects a global normalization mode or a local normalization mode as required: the user first selects an image to analyze, then selects the normalization mode and clicks an image block in the image; the system requests the attention map data corresponding to the coordinates (x, y) of the image block and normalizes it, and the normalized attention map is converted into a heat map using the JET color mapping, where larger values appear redder; when global normalization is selected, the user may adjust a scaling ratio r with a slider to see the attention values more clearly; after the heat map is multiplied by the ratio r, any value greater than the preset threshold of 255 is clamped to 255, and finally the heat map is superimposed on the original image to show the role the attention map plays in it; (4) Result analysis: the user inspects the visual analysis results produced by the visual analysis system, progressively analyzes the model results through the multi-dimensional visual analysis charts and the interactive linkage among them, and explores the self-attention mechanism in the model.
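The scaling-and-clamping behaviour described in claim 1 can be sketched as follows. This is not part of the patent; the function name, the `alpha` blend and the grayscale stand-in for the JET colormap are illustrative assumptions (in practice one would colorize with e.g. `cv2.applyColorMap(..., cv2.COLORMAP_JET)`).

```python
import numpy as np

def scale_and_overlay(heatmap, image, r, alpha=0.5):
    """Scale a normalized (0-255) heat map by ratio r, clamp at the
    preset threshold 255, and blend it onto the original image.

    heatmap, image: 2-D float arrays of the same shape (hypothetical layout).
    """
    # Multiply by the user-chosen ratio r; values above 255 are clamped to 255.
    scaled = np.clip(heatmap * r, 0, 255)
    # Simple alpha blend standing in for "superimpose the heat map on the
    # original image"; a real implementation would colorize with JET first.
    return (alpha * scaled + (1 - alpha) * image).astype(np.uint8)
```

With `r = 2`, a heat-map value of 200 would exceed 255 after scaling and is clamped, while 100 simply doubles before blending.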
- 2. The visual analysis method based on Transformer self-attention according to claim 1, wherein the self-attention calculation process of the single attention head in step (3.3) is as follows: (3.3.1) encode the input text into vector form X ∈ R^(N×D), where N is the number of input characters and D is the vector dimension; (3.3.2) compute the position encoding t ∈ R^(N×D), add X and t, and feed the encoding vector X carrying position information into the model encoding layer; (3.3.3) in the model encoding layer, obtain the query vector sequence Q, the key vector sequence K and the value vector sequence V by linear transformation: Q = XW_Q, K = XW_K, V = XW_V, where W_Q, W_K, W_V ∈ R^(D×D) are the corresponding linear transformation matrices; (3.3.4) in the self-attention model, using the scaled dot product as the attention scoring function, the self-attention output is Z = softmax(QK^T / √D)V; from the self-attention output vector Z, the i-th character's self-attention to the other characters is input to the color mapping function colorProject to obtain the color mapping score matrix C for drawing the data link diagram and the matrix diagram.
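Steps (3.3.3)-(3.3.4) correspond to standard single-head scaled dot-product attention. A minimal NumPy sketch (not part of the patent; function names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention: Q = XW_q, K = XW_k, V = XW_v,
    Z = softmax(QK^T / sqrt(D)) V, as in steps (3.3.3)-(3.3.4).

    Returns the output Z and the attention weight matrix A, whose i-th
    row is the i-th character's attention over all characters.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    D = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(D))
    return A @ V, A
```

The rows of `A` sum to 1; it is this row-normalized matrix that would be passed to a color-mapping function to produce the score matrix C.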
- 3. The visual analysis method based on Transformer self-attention according to claim 2, wherein the data link diagram is drawn as follows: input the attention data Z and calculate its length Len; calculate the size of the data link diagram, where the width is TextBoxWidth × 2 + AttentionWidth and the height is TextBoxHeight × Len; determine the correspondence among characters using the attention matrix Z and calculate the connecting line offsets; determine the color of each connecting line using the color mapping score matrix C together with the corresponding positional relation; and draw the data link graph.
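The size computation in claim 3 can be expressed directly (a sketch, not from the patent; the function name is illustrative):

```python
def link_diagram_size(Len, TextBoxWidth, TextBoxHeight, AttentionWidth):
    """Canvas size for the data link diagram: two text columns flank the
    central area for attention connecting lines; one text-box row per
    character gives the height."""
    width = TextBoxWidth * 2 + AttentionWidth
    height = TextBoxHeight * Len
    return width, height
```

For example, 5 characters with 60 px text boxes of height 20 px and a 300 px link area yield a 420 × 100 canvas.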
- 4. The visual analysis method based on Transformer self-attention according to claim 2, wherein the matrix chart is drawn as follows: input the attention data Z and calculate its length Len; calculate the size of the matrix diagram, where the width is MatrixBox × Len + TextBoxWidth and the height is MatrixBox × Len + TextBoxHeight; generate the matrix grid; and color the matrix grid with the color mapping score matrix C.
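Likewise, the matrix chart size in claim 4 is a Len × Len grid of cells plus room for the row and column labels (a sketch, not from the patent; the function name is illustrative):

```python
def matrix_diagram_size(Len, MatrixBox, TextBoxWidth, TextBoxHeight):
    """Canvas size for the matrix chart: a Len x Len grid of
    MatrixBox-sized cells, widened/heightened by the label boxes."""
    width = MatrixBox * Len + TextBoxWidth
    height = MatrixBox * Len + TextBoxHeight
    return width, height
```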
- 5. The visual analysis method based on Transformer self-attention according to claim 1, wherein the user selects a global normalization mode or a local normalization mode as required, specifically: when the user selects the local normalization mode, the algorithm performs the normalization operation within each attention map, i.e., min-max normalization is performed within each of the L × num_heads attention maps so that the image values lie in 0-255; first a temporary matrix temp = A − min(A) is obtained from the attention map A, and then the normalized image A' is obtained by A' = temp / max(temp) × 255; when the user selects the global normalization mode, the algorithm normalizes over all attention maps corresponding to the image block clicked in the visual analysis picture, so that the attention values of particular layers and heads can be compared; in global normalization, the temporary matrix temp = A − min(A) is obtained from the L × num_heads attention map matrix A corresponding to the clicked image block, and then the normalized L × num_heads images A' are obtained by A' = temp / max(temp) × 255.
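The two modes of claim 5 are the same min-max normalization applied at different scopes. A minimal sketch (not from the patent), assuming the maps selected for one clicked image block are stored as an array of shape (L, num_heads, h, w):

```python
import numpy as np

def normalize_attention(attn, mode="local"):
    """Min-max normalize attention maps to the range 0-255.

    mode="local":  temp = A - min(A), A' = temp / max(temp) * 255 is
                   applied within each of the L*num_heads maps separately.
    mode="global": the same formula is applied once across all maps, so
                   values of different layers and heads stay comparable.
    """
    attn = attn.astype(np.float64)
    if mode == "local":
        temp = attn - attn.min(axis=(-2, -1), keepdims=True)
        maxs = temp.max(axis=(-2, -1), keepdims=True)
        return temp / np.where(maxs == 0, 1, maxs) * 255
    temp = attn - attn.min()
    m = temp.max()
    return temp / (m if m else 1) * 255
```

Under local normalization every map spans the full 0-255 range; under global normalization only the overall maximum reaches 255, so weakly-attending heads stay visibly dimmer.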
- 6. An apparatus implementing the method of any one of claims 1-5, characterized in that it comprises the following modules: a model training module, used by the user to train a model built by the user; a data acquisition module, used by the user to store the self-attention data and the original input data generated during training through the provided API interface; a log writing module, which rewrites the model training data stored by the user into a log data format that the system can parse and feeds the log data into the visual analysis system to obtain visual analysis results; and a result analysis module, used by the user to inspect the visual analysis results produced by the visual analysis system, progressively analyze the model results through the multi-dimensional visual analysis charts and the interactive linkage among them, and explore the self-attention mechanism in the model.
- 7. A computer readable storage medium storing one or more programs executable by one or more processors to perform the steps in the Transformer self-attention based visual analysis method of any one of claims 1-5.
Description
Visual analysis method and device based on Transformer self-attention
Technical Field
The invention relates to the field of computer data visualization, in particular to a visual analysis method and device based on Transformer self-attention.
Background
In recent years, the rise of Transformer-based models has brought remarkable performance improvements to many natural language processing and computer vision tasks, in particular the BERT model in the natural language processing field and the ViT model in the computer vision field, which have achieved state-of-the-art results on numerous tasks. In natural language processing, a Transformer-based model pre-trained on a large-scale corpus can be fine-tuned for various downstream tasks such as sentiment analysis, question answering and text summarization; in computer vision, Transformers are used for tasks such as image classification, object detection, semantic segmentation and video understanding, and Vision Transformer based models have become a mainstream research direction for visual tasks owing to their excellent performance. However, understanding what these models learn and why they succeed or fail is critical for researchers developing better models, and equally critical for decision makers to trust these models, yet it remains a serious challenge. The development of interactive visualization and visual analysis technology offers researchers new methods for studying the working mechanism of a model: by analyzing the data generated during model training through various visual analysis charts, users can discover patterns in the data, explore and analyze them, and, through linked analysis among the charts, gain a deeper understanding of the internal principles of complex deep learning models.
In summary, to help researchers better understand the working principle of the internal self-attention mechanism of Transformer-based models and why successful and failed predictions are produced, visual analysis is currently among the more viable approaches.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a visual analysis method and device based on Transformer self-attention. To help researchers better understand the internal self-attention mechanism of a Transformer model, the invention designs several visual analysis methods for text Transformer and image Transformer models, displays the internal connections of self-attention, statistically summarizes their variation rules, and deepens researchers' understanding of model interpretability. In a first aspect, the invention provides a visual analysis method based on Transformer self-attention, comprising the steps of: (1) Model training: a user trains a model built by the user; (2) Data acquisition: the user stores the self-attention data and the original input data generated during training through the provided API interface; (3) Log writing: the model training data stored by the user is rewritten into a log data format that the system can parse and is fed into a visual analysis system to obtain visual analysis results; (4) Result analysis: the user inspects the visual analysis results produced by the visual analysis system, progressively analyzes the model results through the multi-dimensional visual analysis charts and the interactive linkage among them, and explores the self-attention mechanism in the model.
Further, obtaining the visual analysis results in the visual analysis system in step (3) specifically includes: the visual analysis system analyzes the data in two parts, attention visual analysis for the natural language processing field and attention visual analysis for the computer vision field; the attention visual analysis for the natural language processing field comprises three components, a statistical information table, a statistical information graph and an attention visualization, through which the overall distribution and statistical rules of the self-attention of the training layers and attention heads of the deep learning model are learned, while the self-attention connections within specific training samples are inspected through a data link graph and a matrix graph; the attention visual analysis for the computer vision field represents the mutual attention among pixel blocks in a training task, and the attention distribution between different layers and heads is inspected through the global and local normalization modes to reveal how the downstream task obtains its result. Further, the attention visual analysis for the natural language processing field is specifically as follows: (2.1) adopting a statistical informatio