CN-121640471-B - Interactive question-answering method and system for B-scan image
Abstract
The invention discloses an interactive question-answering method for B-scan images. The method comprises: acquiring a ground penetrating radar (GPR) B-scan image containing at least two categories of disease targets; analyzing the B-scan image to obtain the disease information it contains and generating question-answer pairs from that information; constructing an initial B-scan image interpretation question-answering model and training it to obtain the final B-scan image interpretation question-answering model; and having the user input an actually collected GPR B-scan image together with a question into the model to obtain a question auxiliary image and the corresponding natural language answer. By analyzing B-scan images, generating question-answer pairs as a training set, and constructing and optimizing an initial B-scan image interpretation question-answering model, the invention realizes end-to-end B-scan image inversion and natural-language interactive question answering through a single model.
Inventors
- Lei Wentai
- Wang Yiming
- Xu Qiguo
- Li Mingzhu
- Chen Tianyu
- Pu Meiqin
- Zhang Tao
Assignees
- Central South University (中南大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-05
Claims (8)
- 1. An interactive question-answering method for a B-scan image, comprising the steps of: S1, acquiring a ground penetrating radar B-scan image containing at least two categories of disease targets; S2, analyzing the B-scan image obtained in step S1 to obtain the disease information in the B-scan image, and generating question-answer pairs comprising existence features, quantity features, attribute features and relation features from the disease information to construct a training set; S3, building an initial B-scan image interpretation question-answering model comprising a front-end processing module, a B-scan image interpretation question-answering core sub-network and a back-end output module; the front-end processing module is constructed based on a multichannel image enhancement strategy and performs front-end processing on the ground penetrating radar B-scan image and the question-answer pairs to construct the input features for training the B-scan image interpretation question-answering core sub-network; the B-scan image interpretation question-answering core sub-network is constructed based on a deep feature modeling and fusion mechanism and extracts and fuses features from the input features to obtain analysis image data and answer text data as intermediate representations; the back-end output module is constructed based on element-wise addition of multi-channel features and a dictionary decoding mechanism and respectively reconstructs the analysis image data and the answer text data to generate a predicted question auxiliary image and a predicted natural language answer as the final output of the initial B-scan image interpretation question-answering model; S4, training the initial B-scan image interpretation question-answering model with the training set to obtain the B-scan image interpretation question-answering model; S5, inputting an actually acquired ground penetrating radar B-scan image and an actually acquired question into the B-scan image interpretation question-answering model to obtain a question auxiliary image and the corresponding natural language answer, completing the interactive question-answering for the B-scan image; the front-end processing module in step S3 sequentially executes the following processing steps on the B-scan image: subtracting background data from the B-scan image and resizing it to obtain an adjusted B-scan image; sequentially performing normalization and histogram equalization on the adjusted B-scan image to obtain a normalized B-scan image; then splicing the adjusted B-scan image with the normalized B-scan image to generate a two-channel B-scan image; mapping the B-scan image into a dielectric constant image, performing background clutter suppression on the dielectric constant image, and separating the clutter-suppressed dielectric constant image by disease target category to obtain one separated dielectric constant image containing only cavity disease targets and another separated dielectric constant image containing only crack disease targets; the front-end processing module sequentially executes the following processing steps on the question-answer set: constructing a dictionary of common words and phrases in the ground penetrating radar field, coding the questions in the question-answer set according to the dictionary to obtain question codes, and coding the answers in the question-answer set according to the dictionary to obtain answer label codes; the two-channel B-scan image, the two-channel dielectric constant image, the question codes and the answer label codes are the input features for training the B-scan image interpretation question-answering core sub-network; the B-scan image interpretation question-answering core sub-network in step S3 comprises an image coding layer, an image splicing regularization layer, a text coding layer, a text regularization layer, a cross-modal fusion layer and an answer decoding layer; the output of the image coding layer serves as the input of the image splicing regularization layer, the output of the text coding layer serves as the input of the text regularization layer, the outputs of the image splicing regularization layer and the text regularization layer together serve as the input of the cross-modal fusion layer, and the output of the cross-modal fusion layer serves as the input of the answer decoding layer; the processing procedure of the B-scan image interpretation question-answering core sub-network is as follows: inputting the two-channel B-scan image into the image coding layer and processing it to obtain an intermediate image feature, a final image feature and a two-channel inversion image; inputting the question code into the text coding layer and processing it to obtain a sentence vector; inputting the second image splicing feature and the regularized sentence vector into the cross-modal fusion layer and processing them to obtain a fusion feature vector; the B-scan image interpretation question-answering core sub-network outputs the two-channel inversion image and the answer code.
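The front-end image pipeline of claim 1 (background subtraction, size adjustment, normalization, histogram equalization, two-channel splicing) can be sketched as follows. This is a minimal illustration: the background estimator (per-row mean trace), the target size, and the bin count are assumptions not fixed by the claim.

```python
import numpy as np

def front_end_two_channel(bscan: np.ndarray) -> np.ndarray:
    """Sketch of the claim-1 front-end image steps: background subtraction,
    size adjustment, normalization, histogram equalization, splicing."""
    # Background subtraction: remove the per-row mean trace (a common GPR
    # background estimate; the claim does not specify the estimator).
    adjusted = bscan - bscan.mean(axis=1, keepdims=True)

    # Size adjustment: crop to a fixed shape (a real resize would also do).
    H, W = 64, 64
    adjusted = adjusted[:H, :W]

    # Normalization to [0, 1].
    lo, hi = adjusted.min(), adjusted.max()
    norm = (adjusted - lo) / (hi - lo + 1e-8)

    # Histogram equalization on the normalized image (256 bins).
    bins = np.clip((norm * 255).astype(int), 0, 255)
    hist = np.bincount(bins.ravel(), minlength=256)
    cdf = hist.cumsum() / bins.size
    equalized = cdf[bins]

    # Splice adjusted + equalized images into a two-channel tensor.
    return np.stack([adjusted, equalized], axis=0)

two_ch = front_end_two_channel(np.random.rand(64, 64))
print(two_ch.shape)  # (2, 64, 64)
```

The second channel is confined to [0, 1] by construction, while the first keeps the raw amplitude scale of the background-subtracted data, matching the claim's pairing of an adjusted image with its normalized, equalized counterpart.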
- 2. The interactive question-answering method for B-scan images according to claim 1, wherein step S2 specifically comprises the steps of: A1. analyzing the B-scan image to obtain information on the disease targets in the image, the information comprising the relative dielectric constant, burial depth, position, scale, shape and quantity of the disease targets; A2. for any one of the disease target categories, generating a first-level question-answer pair asking and answering whether a disease target of the current category exists; if not, setting the answers to all questions about the non-existent disease targets of the current category to default answers, then checking whether all categories have completed steps A2 to A5; if so, entering step A6, otherwise continuing to cyclically execute steps A2 to A5; A3. for the existing disease targets of the current category, generating second-level question-answer pairs asking about the physical attributes of the disease targets, executing the corresponding examination according to the question asked and generating the corresponding answer according to the examination result; judging whether all samples in the current category have completed the traversal of step A3; if so, entering step A4, otherwise continuing to cyclically execute step A3; A4. after step A3 has been performed on all samples in the current category, counting the number of samples in the current category; A5. if the number of samples in the current category is greater than one, checking the geometric relationships among the disease targets in the samples of the same category and generating fourth-level question-answer pairs according to the checking results, the fourth-level question-answer pairs asking and answering the geometric relationships among disease targets within the same category; if the number of samples in the current category is less than or equal to one, setting default answers for the geometric relationship questions among disease targets within the same category; then checking whether all categories have completed the traversal of steps A2 to A5; if so, entering step A6, otherwise continuing to cyclically execute steps A2 to A5; A6. once all categories have completed the traversal of steps A2 to A5, counting the number of categories; if the number of categories is greater than one, generating fifth-level question-answer pairs asking about the geometric relationships among disease targets of different categories, executing the corresponding examination according to the questions and generating the corresponding answers according to the examination results; if the number of disease target categories is less than or equal to one, setting default answers for the geometric relationship questions among disease targets of different categories.
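The hierarchical generation loop of claim 2 can be sketched in plain Python. The target records, question templates and attribute names (`depth`, `x`) are hypothetical; the claim fixes only the level structure (existence, attributes, quantity, within-category relations, cross-category relations) and the default-answer rule for absent targets.

```python
# Sketch of the hierarchical question-answer pair generation of claim 2.
def generate_qa_pairs(targets):
    qa = []
    categories = ["cavity", "crack"]
    for cat in categories:
        members = [t for t in targets if t["category"] == cat]
        # Level 1: existence of the current category.
        qa.append((f"Is there a {cat} target?", "yes" if members else "no"))
        if not members:
            # Default answers for all further questions of a missing category.
            qa.append((f"What is the depth of the {cat} target?", "not applicable"))
            continue
        # Level 2: physical attributes of each existing target.
        for t in members:
            qa.append((f"What is the depth of the {cat} target?", f"{t['depth']} m"))
        # Level 3: quantity within the category.
        qa.append((f"How many {cat} targets are there?", str(len(members))))
        # Level 4: geometric relation within the category (needs >= 2 targets).
        if len(members) > 1:
            left = min(members, key=lambda t: t["x"])
            qa.append((f"Which {cat} target is leftmost?", f"the one at x={left['x']}"))
        else:
            qa.append((f"Which {cat} target is leftmost?", "not applicable"))
    # Level 5: geometric relation across categories (needs >= 2 categories).
    present = {t["category"] for t in targets}
    if len(present) > 1:
        cav = min(t["depth"] for t in targets if t["category"] == "cavity")
        crk = min(t["depth"] for t in targets if t["category"] == "crack")
        qa.append(("Is the cavity above the crack?", "yes" if cav < crk else "no"))
    else:
        qa.append(("Is the cavity above the crack?", "not applicable"))
    return qa

pairs = generate_qa_pairs([
    {"category": "cavity", "depth": 0.8, "x": 1.2},
    {"category": "crack", "depth": 1.5, "x": 2.0},
])
```

Every question template always receives some answer (real or default), which matches the claim's requirement that absent categories still contribute question-answer pairs with default answers.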
- 3. The interactive question-answering method for B-scan images according to claim 2, wherein the image coding layer comprises an image encoder, an image decoder and an image projection module; the image encoder comprises first to fifteenth image coding modules; the first and second image coding modules have the same structure, each sequentially comprising a 3x3 convolution layer, a batch normalization layer and a ReLU activation function layer; the third, sixth, ninth and twelfth image coding modules have the same structure, each comprising a residual module and an atrous spatial pyramid pooling layer connected in series; the fourth, seventh, tenth and thirteenth image coding modules have the same structure, each comprising a maximum pooling layer; the fifteenth image coding module comprises an atrous spatial pyramid pooling layer; the fifth and eighth image coding modules have the same structure, each comprising, sequentially connected in series, a two-branch pyramid convolution module, a batch normalization layer, a ReLU activation function layer, a multi-scale depthwise convolution with spatially selective feature calibration, a batch normalization layer and a ReLU activation function layer; the eleventh and fourteenth image coding modules have the same structure, each comprising, sequentially connected in series, a four-branch pyramid convolution module, a batch normalization layer, a ReLU activation function layer, a multi-scale depthwise convolution with spatially selective feature calibration, a batch normalization layer and a ReLU activation function layer; the image decoder comprises first to seventh image decoding modules; the first, third, fifth and seventh image decoding modules have the same structure, each comprising a two-dimensional transposed convolution layer; the second, fourth and sixth image decoding modules have the same structure, each comprising, sequentially connected in series, a four-branch pyramid convolution module, a batch normalization layer, a ReLU activation function layer, a multi-scale depthwise convolution with spatially selective feature calibration, a batch normalization layer and a ReLU activation function layer; the first to fourth image coding modules are sequentially connected in series, the output of each preceding image coding module serving as the input of the following one; the output of the third image coding module also serves as the input of the fourth image coding module; the fourth to sixth image coding modules are sequentially connected in series, the output of each preceding image coding module serving as the input of the following one; the output of the sixth image coding module is connected with the output of the fifth image decoding module and then serves as the input of the sixth image decoding module, and the output of the sixth image decoding module serves as the input of the seventh image decoding module; meanwhile, the output of the sixth image coding module also serves as the input of the seventh image coding module; the seventh to ninth image coding modules are sequentially connected in series, the output of each preceding image coding module serving as the input of the following one; the output of the ninth image coding module is connected with the output of the third image decoding module and then serves as the input of the fourth image decoding module; the output of the ninth image coding module also serves as the input of the tenth image coding module; the tenth to twelfth image coding modules are sequentially connected in series, the output of each preceding image coding module serving as the input of the following one; the output of the twelfth image coding module is connected with the output of the first image decoding module and then serves as the input of the second image decoding module, and the output of the second image decoding module serves as the input of the third image decoding module; meanwhile, the output of the twelfth image coding module also serves as the input of the thirteenth image coding module; the thirteenth to fifteenth image coding modules are sequentially connected in series, the output of each preceding image coding module serving as the input of the following one, and the output of the fifteenth image coding module serves as the input of the first image decoding module; the image projection module comprises a first projection sub-module and a second projection sub-module connected in series, the first projection sub-module having the same structure as the fifth image coding module, and the second projection sub-module comprising a 1x1 convolution layer and a Sigmoid activation function layer connected in series; the processing procedure of the image coding layer is as follows: acquiring the two-channel B-scan image, performing feature extraction and step-by-step downsampling through the image coding modules, and outputting the intermediate image feature from the fifteenth image coding module; inputting the intermediate image feature into the first image decoding module, upsampling the decoded features step by step through the first to seventh image decoding modules while fusing the feature maps of the corresponding encoder stages with the decoded features through skip connections, and outputting the final image feature from the seventh image decoding module; inputting the final image feature into the image projection module and processing it to obtain a two-channel inversion image matching the size of the two-channel B-scan image; the image coding layer outputs the two-channel inversion image; the image splicing regularization layer comprises a size conversion layer, a splicing layer, a convolution layer and a dropout layer; the final image feature is input into the size conversion layer to obtain a size-adjusted image feature, which is spliced with the intermediate image feature through the splicing layer to obtain a first image splicing feature; features are extracted from the first image splicing feature through the convolution layer, and the result is regularized through the dropout layer to obtain a second image splicing feature.
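Claim 3 repeatedly uses what the machine translation renders as a "cavity/void space pyramid pooling layer", i.e. atrous spatial pyramid pooling (ASPP): parallel dilated convolutions at several rates whose outputs are merged. A toy single-channel sketch of the idea, with identical averaging kernels standing in for the learned branch convolutions and a simple mean standing in for the learned merge:

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    """Single-channel 2-D convolution with dilation `rate`, zero-padded so
    the output matches the input size."""
    kh, kw = k.shape
    pad = (kh // 2) * rate
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i * rate, j * rate
            out += k[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def aspp(x, rates=(1, 6, 12, 18)):
    """Toy atrous spatial pyramid pooling: parallel dilated 3x3 branches
    at increasing rates, averaged into one output map."""
    k = np.full((3, 3), 1.0 / 9.0)
    branches = [dilated_conv2d(x, k, r) for r in rates]
    return np.mean(branches, axis=0)

y = aspp(np.random.rand(40, 40))
print(y.shape)  # (40, 40)
```

The increasing dilation rates let a fixed 3x3 kernel aggregate context at several receptive-field scales without downsampling, which is why the claim pairs ASPP with the residual modules in the deeper encoder stages.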
- 4. The interactive question-answering method for B-scan images according to claim 3, wherein the text coding layer is constructed based on the recurrent neural network architecture Skip-Thoughts, and the text regularization layer is constructed based on a dropout layer.
- 5. The interactive question-answering method for B-scan images according to claim 4, wherein the cross-modal fusion layer comprises a first linear layer, a second linear layer, a first normalization layer, a second normalization layer, a multi-head attention module, a splicing layer, a dynamic gating module, a gating weighting layer and a feedforward network, the dynamic gating module sequentially comprising, in processing order, a linear layer, a GELU activation function, a linear layer and a Sigmoid activation function; the processing procedure of the cross-modal fusion layer is as follows: acquiring the second image splicing feature and processing it sequentially through the first linear layer and the first normalization layer to obtain a normalized image mapping feature; acquiring the regularized sentence vector and processing it sequentially through the second linear layer and the second normalization layer to obtain a normalized text mapping feature with the same dimension as the normalized image mapping feature; inputting the normalized image mapping feature and the normalized text mapping feature into the multi-head attention module to obtain an image-text interaction feature; splicing the image-text interaction feature and the normalized text mapping feature to obtain an image-text splicing feature; processing the image-text splicing feature through the dynamic gating module to obtain a gating weight; inputting the image-text interaction feature, the normalized text mapping feature and the gating weight together into the gating weighting layer to obtain an initial fusion feature; combining the initial fusion feature with the normalized text mapping feature through a first residual connection with normalization to obtain an intermediate fusion feature; processing the intermediate fusion feature through the feedforward network to obtain a feedforward enhancement feature; combining the feedforward enhancement feature with the intermediate fusion feature through a second residual connection with normalization to obtain the fusion feature vector, which is output by the cross-modal fusion layer; the answer decoding layer is constructed as a multi-layer perceptron network.
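The gated fusion of claim 5 can be sketched numerically. This is a simplified illustration: single-head attention stands in for the multi-head module, ReLU approximates GELU, random matrices stand in for the learned linear layers, and the embedding dimension is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding dimension (assumed)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random projections stand in for the learned linear layers of the claim.
W_img, W_txt = rng.normal(size=(32, d)), rng.normal(size=(24, d))
W_gate1, W_gate2 = rng.normal(size=(2 * d, d)), rng.normal(size=(d, d))
W_ff1, W_ff2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))

def fuse(img_feat, txt_vec):
    """Sketch of the claim-5 gated cross-modal fusion."""
    v = layer_norm(img_feat @ W_img)     # normalized image mapping, (n, d)
    t = layer_norm(txt_vec @ W_txt)      # normalized text mapping, (d,)
    attn = softmax(v @ t / np.sqrt(d))   # text attends over image tokens
    inter = attn @ v                     # image-text interaction feature
    g_in = np.concatenate([inter, t])    # spliced image-text feature
    gate = sigmoid(np.maximum(g_in @ W_gate1, 0.0) @ W_gate2)  # dynamic gate
    fused0 = gate * inter + (1.0 - gate) * t   # gating weighting layer
    mid = layer_norm(fused0 + t)               # first residual + normalization
    ff = np.maximum(mid @ W_ff1, 0.0) @ W_ff2  # feedforward network
    return layer_norm(ff + mid)                # second residual + normalization

out = fuse(rng.normal(size=(10, 32)), rng.normal(size=24))
print(out.shape)  # (16,)
```

The sigmoid gate interpolates per-dimension between the attended image evidence and the text feature, so the model can fall back on the question representation when the visual evidence is weak.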
- 6. The interactive question-answering method for B-scan images according to claim 5, wherein the training of step S4 comprises the steps of: during training, only the B-scan image interpretation question-answering core sub-network is trained, using the total loss function L_total = w_ans * L_ans + w_inv * L_inv + w_dice * L_dice, where L_total is the total loss function, L_ans is the answer loss function with weight w_ans, L_inv is the inversion loss function with weight w_inv, and L_dice is the Dice loss function with weight w_dice; the answer loss is L_ans = -(1/B) * sum_{i=1..B} log( exp(s_{i,c_i}) / sum_{j=1..C} exp(s_{i,j}) ), where B is the batch size, i is the index of the sample question-answer pair, s_{i,c_i} is the score of the correct class, C is the number of candidate answers, s_{i,j} is the original score of the j-th candidate answer of the i-th question-answer pair, and j is the index of the candidate answer; the Dice loss is L_dice = 1 - (2 * sum_{n=1..N} p_n * y_n + eps) / (sum_{n=1..N} p_n + sum_{n=1..N} y_n + eps), where N is the number of samples, p_n is the predictive probability vector of the n-th sample, y_n is the true label vector of the n-th sample, and eps is a smoothing factor; the inversion loss is L_inv = (1/N) * sum_{n=1..N} (x_hat_n - x_n)^2, where N is the number of samples, x_hat_n is the prediction result of the n-th sample, and x_n is the true label of the n-th sample.
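Only the variable descriptions of the claim-6 loss terms survive in this text, so the following sketch assumes their standard forms: softmax cross-entropy for the answer loss, mean squared error for the inversion loss, and the smoothed Dice loss. The weights and the smoothing factor are free hyperparameters.

```python
import numpy as np

def answer_loss(scores, correct):
    """Softmax cross-entropy over the C candidate answers, averaged over
    the batch of B question-answer pairs (assumed form)."""
    B = len(scores)
    logp = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -logp[np.arange(B), correct].mean()

def dice_loss(p, y, eps=1.0):
    """Dice loss with smoothing factor eps."""
    return 1.0 - (2.0 * (p * y).sum() + eps) / (p.sum() + y.sum() + eps)

def inversion_loss(pred, true):
    """Mean squared error between predicted and true inversion images
    (assumed form of the inversion loss)."""
    return ((pred - true) ** 2).mean()

def total_loss(scores, correct, inv_pred, inv_true, p, y,
               w_ans=1.0, w_inv=1.0, w_dice=1.0):
    # Weighted sum of the three terms, as in claim 6.
    return (w_ans * answer_loss(scores, correct)
            + w_inv * inversion_loss(inv_pred, inv_true)
            + w_dice * dice_loss(p, y))
```

With near-perfect predictions all three terms approach zero, so a near-zero total loss is a quick sanity check on an implementation.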
- 7. The interactive question-answering method according to claim 6, wherein step S5 comprises the steps of: acquiring an actually acquired B-scan image and a question, and processing them through the front-end processing module to obtain the corresponding two-channel B-scan image and question code respectively; inputting the two-channel B-scan image and the question code into the B-scan image interpretation question-answering core sub-network and processing them to obtain a two-channel inversion image and an answer code, wherein the question may belong to any of the five levels of questions; and inputting the two-channel inversion image and the answer code into the back-end output module and processing them to obtain a B-scan inversion image and an answer, wherein the B-scan inversion image is the question auxiliary image and the answer is the natural language answer.
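The back-end output module described in claims 1 and 7 (element-wise addition of multi-channel features plus dictionary decoding) can be sketched as follows. The dictionary contents and the channel merge by summation are illustrative assumptions.

```python
import numpy as np

# Hypothetical GPR-domain answer dictionary; the patent only requires a
# dictionary of common words and phrases in the field.
DICTIONARY = ["no", "yes", "one", "two", "cavity", "crack", "0.5 m", "1.0 m"]

def back_end(inversion_2ch, answer_code):
    """Sketch of the back-end output module: reconstruct the question
    auxiliary image and decode the answer code against the dictionary."""
    # Element-wise addition of the multi-channel features yields a single
    # B-scan inversion image (the question auxiliary image).
    auxiliary_image = inversion_2ch.sum(axis=0)
    # Dictionary decoding: pick the highest-scoring candidate answer.
    answer = DICTIONARY[int(np.argmax(answer_code))]
    return auxiliary_image, answer

img, ans = back_end(np.random.rand(2, 64, 64),
                    np.array([0.1, 0.2, 0.05, 0.1, 0.9, 0.3, 0.2, 0.1]))
print(img.shape, ans)  # (64, 64) cavity
```

In a full system the answer code would be the core sub-network's output scores over the candidate answers, and the two inversion channels would correspond to the two separated disease-target categories.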
- 8. An interactive question-answering system for B-scan images, comprising a B-scan image acquisition module, a disease information question-answer pair generation module, an initial B-scan image interpretation question-answering model construction module, an initial B-scan image interpretation question-answering model training module and a B-scan image interpretation question-answering model application module, which are sequentially connected in series; the B-scan image acquisition module acquires a ground penetrating radar B-scan image containing at least two categories of disease targets and uploads the data to the disease information question-answer pair generation module; the disease information question-answer pair generation module analyzes the acquired B-scan image according to the received data to obtain the disease information in the B-scan image, generates question-answer pairs comprising existence features, quantity features, attribute features and relation features from the disease information to construct a training set, and uploads the data to the initial B-scan image interpretation question-answering model construction module; the initial B-scan image interpretation question-answering model construction module constructs, according to the received data, an initial B-scan image interpretation question-answering model comprising a front-end processing module, a B-scan image interpretation question-answering core sub-network and a back-end output module; the front-end processing module is constructed based on a multichannel image enhancement strategy and performs front-end processing on the ground penetrating radar B-scan image and the question-answer pairs to construct the input features for training the B-scan image interpretation question-answering core sub-network; the B-scan image interpretation question-answering core sub-network is constructed based on a deep feature modeling and fusion mechanism and extracts and fuses features from the input features to obtain analysis image data and answer text data as intermediate representations; the back-end output module is constructed based on element-wise addition of multi-channel features and a dictionary decoding mechanism and respectively reconstructs the analysis image data and the answer text data to generate a predicted question auxiliary image and a predicted natural language answer as the final output of the initial B-scan image interpretation question-answering model; the front-end processing module sequentially executes the following processing steps on the B-scan image: subtracting background data from the B-scan image and resizing it to obtain an adjusted B-scan image; sequentially performing normalization and histogram equalization on the adjusted B-scan image to obtain a normalized B-scan image; then splicing the adjusted B-scan image with the normalized B-scan image to generate a two-channel B-scan image; mapping the B-scan image into a dielectric constant image, performing background clutter suppression on the dielectric constant image, and separating the clutter-suppressed dielectric constant image by disease target category to obtain one separated dielectric constant image containing only cavity disease targets and another separated dielectric constant image containing only crack disease targets; the front-end processing module sequentially executes the following processing steps on the question-answer set: constructing a dictionary of common words and phrases in the ground penetrating radar field, coding the questions in the question-answer set according to the dictionary to obtain question codes, and coding the answers in the question-answer set according to the dictionary to obtain answer label codes; the two-channel B-scan image, the two-channel dielectric constant image, the question codes and the answer label codes are the input features for training the B-scan image interpretation question-answering core sub-network; the B-scan image interpretation question-answering core sub-network comprises an image coding layer, an image splicing regularization layer, a text coding layer, a text regularization layer, a cross-modal fusion layer and an answer decoding layer; the output of the image coding layer serves as the input of the image splicing regularization layer, the output of the text coding layer serves as the input of the text regularization layer, the outputs of the image splicing regularization layer and the text regularization layer together serve as the input of the cross-modal fusion layer, and the output of the cross-modal fusion layer serves as the input of the answer decoding layer; the processing procedure of the B-scan image interpretation question-answering core sub-network is as follows: inputting the two-channel B-scan image into the image coding layer and processing it to obtain an intermediate image feature, a final image feature and a two-channel inversion image; inputting the question code into the text coding layer and processing it to obtain a sentence vector; inputting the second image splicing feature and the regularized sentence vector into the cross-modal fusion layer and processing them to obtain a fusion feature vector; the B-scan image interpretation question-answering core sub-network outputs the two-channel inversion image and the answer code; the construction module uploads the data to the initial B-scan image interpretation question-answering model training module; the initial B-scan image interpretation question-answering model training module trains the initial B-scan image interpretation question-answering model with the training set according to the received data to obtain the B-scan image interpretation question-answering model, and uploads the data to the B-scan image interpretation question-answering model application module; the B-scan image interpretation question-answering model application module inputs the actually acquired ground penetrating radar B-scan image and questions into the B-scan image interpretation question-answering model according to the received data to obtain a question auxiliary image and the corresponding natural language answers, completing the interactive question-answering for the B-scan image.
Description
Interactive question-answering method and system for B-scan image

Technical Field

The invention relates to the field of ground penetrating radar detection, in particular to an interactive question-answering method and system for B-scan images.

Background

Ground Penetrating Radar (GPR) is a nondestructive testing technology that detects underground targets with high-frequency electromagnetic waves, offering non-destructiveness, high resolution and high efficiency. However, GPR data must be interpreted and converted into intuitive geological structure information before it can be applied. Traditional interpretation relies mainly on manual experience: targets are recognized by observing hyperbolic scattering signatures, reflection intensity and phase changes in the B-scan image. This approach is inefficient and subjective, and struggles with multi-target interference and signal attenuation in complex geological environments. In recent years, deep learning techniques represented by YOLOvX and U-Net-style encoder-decoder architectures have improved the efficiency of GPR data interpretation and can automatically output the positions and classification results of all diseases in one pass. However, the output information rarely matches actual engineering requirements, and such models cannot be queried over multiple rounds like a human expert, so the interpretation results cannot be interactively refined. Visual question answering (VQA), a leading direction of multi-modal artificial intelligence, enables a model to reason directly over an image and give accurate answers, in natural language, to image-related questions posed by the user.
However, existing visual question-answering techniques still struggle to adapt to and process GPR B-scan images, which exhibit high noise, low texture and hyperbolic signatures, and they are also limited when facing professional geological questions involving quantitative properties such as "quantity", "depth" and "size".

Disclosure of Invention

The invention aims to construct an interactive question-answering method for B-scan images, so that a model can automatically generate and output the natural language answer corresponding to a B-scan image based on the B-scan image and the questions input by a user. The interactive question-answering method for B-scan images provided by the invention comprises the following steps: S1, acquiring a ground penetrating radar B-scan image containing at least two categories of disease targets; S2, analyzing the B-scan image obtained in step S1 to obtain the disease information in the B-scan image, and generating question-answer pairs comprising existence features, quantity features, attribute features and relation features from the disease information to construct a training set; S3, building an initial B-scan image interpretation question-answering model comprising a front-end processing module, a B-scan image interpretation question-answering core sub-network and a back-end output module; the front-end processing module is constructed based on a multichannel image enhancement strategy and performs front-end processing on the ground penetrating radar B-scan image and the question-answer pairs to construct the input features for training the B-scan image interpretation question-answering core sub-network; the B-scan image interpretation question-answering core sub-network is constructed based on a deep feature modeling and fusion mechanism and extracts and fuses features from the input features to obtain analysis image data and answer text data as intermediate representations; the back-end output module is constructed based on element-wise addition of multi-channel features and a dictionary decoding mechanism and respectively reconstructs the analysis image data and the answer text data to generate a predicted question auxiliary image and a predicted natural language answer as the final output of the initial B-scan image interpretation question-answering model; S4, training the initial B-scan image interpretation question-answering model with the training set to obtain the B-scan image interpretation question-answering model; S5, inputting an actually acquired ground penetrating radar B-scan image and an actually acquired question into the B-scan image interpretation question-answering model to obtain a question auxiliary image and the corresponding natural language answer, completing the interactive question-answering for the B-scan image. Step S2 specifically comprises the following steps: A1. analyzing the B-scan image to obtain