CN-121724994-B - Image recognition model generation device and method
Abstract
The invention provides a device and a method for generating an image recognition model, relating to the technical field of image recognition models. The invention first constructs an initial model comprising an anatomical perception module, a sign discovery module and an evidence reasoning module connected in sequence; integrates a credibility prediction sub-network into the evidence reasoning module to calculate a credibility score vector for each sign feature vector; associates three classes of labels (pixel-level, region-level and image-level) with each sample image to construct a training data set; and adopts a multi-task joint training scheme whose total loss function comprises an anatomical segmentation loss, a sign recognition loss, a disease classification loss and a credibility prediction auxiliary loss. After training is completed, the model parameters are saved to obtain a medical image recognition model. The invention can output accurate disease classification results, provide a clear diagnostic basis, improve the robustness and interpretability of model decisions through credibility evaluation, and is suitable for scenarios such as computer-aided diagnosis of medical images.
Inventors
- ZHAO ZHENFENG
- WANG SHENWEN
- ZHANG WEIFENG
- WANG CAILAN
Assignees
- 江苏爱影医疗科技有限公司
- 河北汇汲科技发展有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260226
Claims (8)
- 1. An image recognition model generation apparatus, comprising: an initial model construction module for constructing an initial model, the initial model comprising three modules connected in sequence, namely an anatomical perception module, a sign discovery module and an evidence reasoning module; a credibility integration module for integrating a credibility prediction sub-network into the evidence reasoning module to calculate a credibility score vector for the sign feature vector of each target anatomical region, so as to represent the reliability of that sign feature vector in the sample image; a multi-class label association module for associating each sample image with three classes of labels, namely a pixel-level label, a region-level label and an image-level label, so as to construct a training data set; a multi-task joint training module for training the initial model with the training data set, wherein the total loss function during training comprises an anatomical segmentation loss, a sign recognition loss, a disease classification loss and a credibility prediction auxiliary loss; and a model application and diagnosis module for saving the parameters of the initial model after training is completed to obtain a medical image recognition model, inputting an image to be recognized into the medical image recognition model, diagnosing the disease category of the image to be recognized according to the probability distribution over target disease classes, and synchronously outputting the related diagnostic basis; wherein the anatomical perception module is configured as a medical image semantic segmentation network whose function is to classify each pixel of an input sample image into a predefined anatomical structure category, so as to output an anatomical structure segmentation probability map of the same size as the input image; the sign discovery module is configured to process one or more specific target anatomical regions and, for each target anatomical region, performs the following operation: extracting from the anatomical structure segmentation probability map the single-channel probability map corresponding to that region as its probability channel, wherein each probability value in the probability channel represents the predicted probability that the corresponding pixel of the input image belongs to that target anatomical region; and the evidence reasoning module comprises a feature fusion network which takes as input the sign feature vectors of all target anatomical regions output by the sign discovery module, integrates these feature vectors, and outputs a global probability distribution over the target disease classes.
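The per-region probability channel described in the claim is simply one slice of the segmentation output; a binarized mask (used for attention weighting in claim 2) can then be obtained by thresholding. The following is a minimal illustrative sketch; the shapes, class indices and the 0.5 threshold are assumptions for illustration, not values fixed by the patent.

```python
import numpy as np

def extract_probability_channel(seg_prob_map, region_class):
    """Return the (H, W) single-channel probability map of one target
    anatomical region from a (C, H, W) segmentation probability map."""
    return seg_prob_map[region_class]

def binarize_mask(prob_channel, threshold=0.5):
    """Threshold the probability channel into a binarized mask map."""
    return (prob_channel >= threshold).astype(np.float32)

# Toy example: C=2 anatomical classes over a 2x2 image.
seg = np.array([[[0.7, 0.1],
                 [0.2, 0.3]],
                [[0.2, 0.8],
                 [0.5, 0.4]]])
channel = extract_probability_channel(seg, region_class=1)
mask = binarize_mask(channel)
```

Each value in `channel` is the predicted probability that the corresponding pixel belongs to the chosen region, exactly as the claim describes.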
- 2. The apparatus according to claim 1, wherein each pixel of the anatomical structure segmentation probability map corresponds to a vector representing the probability that the pixel belongs to each anatomical structure category; the medical image semantic segmentation network adopts an encoder-decoder structure, wherein the encoder consists of a plurality of downsampling and convolution layers and progressively extracts a multi-scale feature map of the input sample image while reducing its spatial size; the sign discovery module performs spatial attention weighting on the multi-scale feature map using a binarized mask map to obtain a weighted multi-scale feature map, performs cross-scale fusion and spatial pooling on the weighted multi-scale feature map to generate a deep feature vector of the target anatomical region, and inputs the deep feature vector into a fully connected network to obtain a sign feature vector, each dimension of which corresponds to a predefined sign related to the target anatomical region, its value representing the presence intensity of that sign; and when computing the probability distribution over target disease classes, the feature fusion network receives the credibility score vector of each sign feature vector output by the credibility prediction sub-network and converts the credibility scores into fusion weights, so as to perform a weighted fusion of the corresponding sign feature vectors and obtain the final probability distribution.
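The claim leaves open how credibility scores become fusion weights. One plausible minimal sketch, assuming a softmax over each region's mean per-sign credibility (an illustrative choice, not specified by the patent), is:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def credibility_weighted_fusion(sign_vectors, credibility_scores):
    """Weight each region's sign feature vector by a fusion weight derived
    from its mean per-sign credibility, then sum the weighted vectors.
    Both arguments are lists of (D,) arrays, one entry per target region."""
    region_scores = np.array([s.mean() for s in credibility_scores])
    weights = softmax(region_scores)
    return sum(w * v for w, v in zip(weights, sign_vectors))

# Two regions with identical credibility get equal fusion weight.
vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
scores = [np.array([0.9, 0.9]), np.array([0.9, 0.9])]
fused = credibility_weighted_fusion(vecs, scores)
```

The fused vector would then be passed to the feature fusion network's classification head to produce the final probability distribution.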
- 3. The apparatus according to claim 2, wherein the credibility prediction sub-network is independent of the feature fusion network; each sub-network takes as input the sign feature vector of a single target anatomical region and, by processing it through a fully connected network with a Sigmoid function as the activation layer, outputs a credibility score vector of the same dimension as the input sign feature vector, wherein the score in each dimension of the credibility score vector corresponds to the reliability of one sign in the target anatomical region.
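A minimal sketch of such a sub-network, assuming a single fully connected layer (the patent only requires a fully connected network with Sigmoid activation; depth and initialization here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CredibilitySubNetwork:
    """Maps a D-dimensional sign feature vector to a D-dimensional
    credibility score vector with every score in (0, 1)."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(dim, dim))
        self.b = np.zeros(dim)

    def __call__(self, sign_vector):
        # Fully connected layer followed by the Sigmoid activation,
        # preserving the input dimensionality as the claim requires.
        return sigmoid(self.W @ sign_vector + self.b)

net = CredibilitySubNetwork(dim=4)
scores = net(np.ones(4))
```

The Sigmoid guarantees each per-sign score lies strictly between 0 and 1, matching its interpretation as a reliability.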
- 4. The device for generating an image recognition model according to claim 3, wherein the specific labeling process for the three classes of labels is as follows: each sample image in the training data set is labeled by annotators; pixel-level labeling assigns to each pixel of the sample image a label of an anatomical structure category, the per-pixel category labels dividing the image into a plurality of anatomical regions and generating a segmentation annotation map of the same spatial size as the sample image; based on the anatomical regions defined by the segmentation annotation map, the current target anatomical region to be analyzed is identified, and its region-level labeling proceeds as follows: for each predefined sign related to the target anatomical region, it is determined whether the sign is present within the region and a binary presence label is generated, 1 for presence and 0 for absence; for each sign judged to be present, a numerical value or a graded score is generated according to the sign type, the sign types being divided into objective-measurement signs and subjective-evaluation signs, where for objective-measurement signs numerical parameters are computed at least from image gray level, morphology or texture characteristics, and for subjective-evaluation signs the annotator gives at least one ordered classification grade score according to judgment; for the target anatomical region, a sign annotation vector is finally generated, composed of all signs present in the region together with their metric values; image-level labeling then associates the known disease category of the sample image with the sign annotation vector of the target anatomical region, the known disease category serving as the disease diagnosis label for the sign annotation vector.
- 5. The device for generating an image recognition model according to claim 4, wherein during training the credibility prediction sub-network is constrained by an auxiliary supervision signal derived from the self-quality scores of the signs whose binary presence label in the region-level annotation is 1, and the learning goal of the credibility prediction sub-network is to keep the difference between its output credibility score vector and the self-quality score of the corresponding sign within a deviation range; for each sign with a binary presence label of 1, the self-quality score reflecting labeling reliability is calculated as follows: the labeling process is completed independently by a plurality of annotators, and the quality score of a sign equals the proportion of annotators who marked it as present out of the total number of annotators; and all sample images undergo a standardized preprocessing operation that at least comprises normalizing pixel values to a fixed range and resampling or cropping images to a uniform size.
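The self-quality score defined above is a simple agreement ratio over annotators. A sketch, assuming presence votes are encoded as 0/1:

```python
def self_quality_score(presence_votes):
    """presence_votes: independent annotators' 0/1 presence judgments
    for one sign. The score is the fraction of annotators who marked
    the sign as present, used as the target for the credibility
    prediction auxiliary loss."""
    return sum(presence_votes) / len(presence_votes)

# Three of four annotators marked the sign present.
score = self_quality_score([1, 1, 0, 1])
```

A score near 1 indicates high inter-annotator agreement on the sign's presence, hence a reliable supervision target.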
- 6. The apparatus for generating an image recognition model according to claim 3, wherein during training each sample image in the training set is input into the initial model to obtain an anatomical structure segmentation probability map, the sign feature vector of each target anatomical region, the probability distribution over target disease classes, and the credibility score vector of each sign; the anatomical segmentation loss is calculated by taking the anatomical structure segmentation probability map output by the anatomical perception module as the predicted value and the segmentation annotation map generated by pixel-level labeling as the target value; the anatomical segmentation loss consists of two parts, expressed by the formula:

L_seg = λ₁·L_Dice + λ₂·L_WCE

where L_seg denotes the anatomical segmentation loss; L_Dice denotes the Dice similarity coefficient loss, which optimizes the overall segmentation overlap of each anatomical region; L_WCE denotes the weighted cross-entropy loss; and λ₁, λ₂ are the weight coefficients of the corresponding terms; the Dice similarity coefficient loss is calculated as:

L_Dice = 1 − (1/C) Σ_{c=1}^{C} (2·Σ_{i=1}^{N} p_{i,c}·g_{i,c} + ε) / (Σ_{i=1}^{N} p_{i,c} + Σ_{i=1}^{N} g_{i,c} + ε)

where p_{i,c} denotes the probability that the i-th pixel of the anatomical structure segmentation probability map belongs to anatomical structure category c; g_{i,c} is the binarized presence label indicating that the i-th pixel of the segmentation annotation map belongs to category c; ε is a preset small value preventing the denominator from being zero; i is the pixel index, N the number of pixels, c the anatomical structure category index, and C the number of anatomical structure categories; the weighted cross-entropy loss optimizes the classification of each pixel and mitigates the imbalance in pixel counts between different anatomical structures in the image, and is calculated as:

L_WCE = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} w_c·g_{i,c}·log p_{i,c}

where w_c is the positive weight coefficient assigned to anatomical structure category c; for each target anatomical region, the sign recognition loss is calculated by taking the sign feature vector output by the sign discovery module as the predicted value and the sign annotation vector of the target anatomical region as the target value, with the presence and metric-value contributions computed separately according to the formula:

L_sign = α·L_exist + β·L_metric

where L_sign denotes the sign recognition loss; L_exist denotes the sign presence loss, computed with a binary cross-entropy loss over the binarized presence labels of all signs in the sign annotation vector; L_metric denotes the sign metric loss, computed only for signs whose presence label is 1, using a smooth L1 loss for objective-measurement signs and a weighted ordinal regression loss for subjective-evaluation signs whose metric is an ordered classification grade; and α, β are the weight coefficients of the corresponding terms; the disease classification loss L_cls is calculated with a standard multi-class cross-entropy loss function, taking as the predicted value the probability distribution over target disease classes output by the evidence reasoning module, expressed as a probability distribution vector whose k-th element represents the probability of belonging to the k-th target disease class, k being the index of the target disease class; the credibility prediction auxiliary loss L_conf is calculated by taking the credibility score vector of each target anatomical region's sign feature vector output by the credibility prediction sub-network as the predicted value and the self-quality score of the corresponding sign as the target value; the total loss function during training is obtained by the weighted summation of the above losses:

L_total = λ_seg·L_seg + λ_sign·L_sign + λ_cls·L_cls + λ_conf·L_conf

where L_total denotes the total loss value, L_cls the disease classification loss value, L_conf the credibility prediction auxiliary loss value, and λ_seg, λ_sign, λ_cls, λ_conf the weight coefficients of the respective terms in the total loss function.
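The two segmentation loss terms and the weighted total can be sketched as follows; the λ values and the toy inputs are illustrative assumptions, not values fixed by the claim.

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """Dice similarity coefficient loss averaged over classes.
    p, g: (N, C) predicted probabilities and one-hot pixel labels."""
    num = 2.0 * (p * g).sum(axis=0) + eps
    den = p.sum(axis=0) + g.sum(axis=0) + eps
    return 1.0 - (num / den).mean()

def weighted_cross_entropy(p, g, w):
    """Per-pixel cross entropy with positive class weights w of shape (C,)."""
    return -np.mean((w * g * np.log(p + 1e-12)).sum(axis=1))

def total_loss(l_seg, l_sign, l_cls, l_conf,
               lam=(1.0, 1.0, 1.0, 0.5)):
    """Weighted sum of the four task losses (weights are illustrative)."""
    return (lam[0] * l_seg + lam[1] * l_sign
            + lam[2] * l_cls + lam[3] * l_conf)

# Toy example: N=2 pixels, C=2 anatomical classes.
p = np.array([[0.9, 0.1], [0.2, 0.8]])
g = np.array([[1.0, 0.0], [0.0, 1.0]])
l_d = dice_loss(p, g)
l_w = weighted_cross_entropy(p, g, w=np.array([1.0, 1.0]))
```

With a perfect prediction (`p == g`) the Dice loss goes to zero, and the ε term keeps the denominator safe when a class is absent from both maps.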
- 7. The device for generating an image recognition model according to claim 1, wherein after the initial model meets a preset convergence condition, its training is considered complete, and all parameters of the anatomical perception module, the sign discovery module, the evidence reasoning module and the credibility prediction sub-network are frozen and stored as the final medical image recognition model; the medical image recognition model defines a standardized structured output format and, after analyzing any newly input image to be recognized, outputs a diagnostic basis comprising the following four core parts of data: the probability distribution over target disease classes output by the feature fusion network; the disease category corresponding to the highest probability value, taken as the diagnosed disease category of the current image to be recognized; a list of target anatomical regions, at least comprising the names of the anatomical structure categories identified by the anatomical perception module, the boundary of each target anatomical region in the image to be recognized, all signs in the region whose binarized presence label is 1, and the metric value corresponding to each such sign; and the credibility score vector of each sign feature vector predicted by the credibility prediction sub-network.
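An illustrative sketch of such a structured diagnostic report; all field names, region names and numbers below are hypothetical placeholders, since the claim specifies the content but not a concrete schema.

```python
# Hypothetical structured output covering the four core parts:
# probability distribution, diagnosed category, region list with signs,
# and per-region credibility scores.
report = {
    "class_probabilities": [0.05, 0.85, 0.10],
    "disease_category": None,  # filled in below from the argmax
    "target_regions": [{
        "anatomy_name": "example_region",
        "boundary": [[10, 10], [10, 50], [50, 50], [50, 10]],
        "signs": [{"name": "example_sign", "present": 1, "metric": 3.2}],
    }],
    "credibility_scores": {"example_region": [0.91]},
}

# Diagnose: the class with the highest probability.
probs = report["class_probabilities"]
best = max(range(len(probs)), key=probs.__getitem__)
report["disease_category"] = best
```

Pairing each sign with its metric value and a credibility score is what gives the report its interpretability: a reader can see both the evidence and how much the model trusts it.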
- 8. A method for generating an image recognition model, characterized in that the method is performed by an image recognition model generation apparatus according to any one of claims 1 to 7, and comprises the following steps: step 1, constructing an initial model comprising three modules connected in sequence, namely an anatomical perception module, a sign discovery module and an evidence reasoning module, wherein the anatomical perception module outputs an anatomical structure segmentation probability map of a sample image, the sign discovery module generates sign feature vectors for predefined target anatomical regions based on the probability map, and the evidence reasoning module integrates all sign feature vectors to output a probability distribution over target disease classes; step 2, integrating a credibility prediction sub-network into the evidence reasoning module to calculate a credibility score vector for the sign feature vector of each target anatomical region, so as to represent the reliability of that sign feature vector in its sample image; step 3, associating each sample image with three classes of labels, namely a pixel-level label, a region-level label and an image-level label, the sample images serving as inputs and the three classes of labels serving as supervision labels for the outputs of the three modules, so as to form a training data set; step 4, training the initial model with the training data set, wherein the total loss function during training comprises an anatomical segmentation loss, a sign recognition loss, a disease classification loss and a credibility prediction auxiliary loss; and step 5, after training, saving the parameters of the initial model to obtain a medical image recognition model, inputting an image to be recognized into the medical image recognition model, diagnosing the disease category of the image to be recognized according to the probability distribution over target disease classes, and synchronously outputting the related diagnostic basis.
Description
Image recognition model generation device and method

Technical Field
The invention relates to the technical field of image recognition models, and in particular to a device and a method for generating an image recognition model.

Background
In the field of medical image analysis, and in particular in computer-aided diagnosis systems, how to construct an intelligent model that not only provides accurate disease diagnosis but also gives a clear and reliable diagnostic basis has always been a focus of clinical practice and scientific research. With the rapid development of deep learning, end-to-end classification models based on convolutional neural networks have achieved remarkable results in disease screening and classification tasks on various images such as chest X-rays, color fundus photographs and brain MRI. These models typically take the entire image as input and directly output disease probabilities. However, such black-box models have significant limitations. First, the decision process lacks interpretability: a doctor cannot know from which specific regions or pathological signs in the image the model makes its judgment, which is unacceptable in serious medical decisions. Second, the model lacks the ability to assess its robustness to noise, artifacts or labeling uncertainty in the image, and cannot tell the doctor which pieces of evidence behind its diagnostic conclusion are reliable and which may be problematic, which undermines the credibility of clinical application. To enhance model interpretability, the prior art has developed a class of methods based on a discover-and-reason paradigm. Such methods typically first locate anatomical regions or visual concepts (i.e., signs) that may be associated with the disease through a segmentation or detection network, and then classify the disease based on the features of these discovered regions. For example, some methods segment lung nodules, retinal hemorrhage areas, etc., and then extract features of these areas to judge benignity or malignancy. However, these methods still face several technical problems. First, most methods focus only on image-level classification accuracy and treat sign discovery as an intermediate step, lacking fine-grained modeling and supervision of the presence of the signs themselves and of their quantitative properties (e.g., size, density, morphology scores), resulting in a semantic gap between the evidence found and the clinical signs that radiologists attend to. Second, existing methods generally assume that all sign features found by the model are equally reliable, whereas in practice the model's confidence in features extracted from different regions varies due to differences in image quality, anatomical variation or segmentation errors, and neglecting these differences affects the robustness of the final decision. Finally, model training is highly dependent on high-quality annotation data, while medical images are very costly to label and often exhibit inter-observer variability. How to perform efficient joint training with multi-level labels (pixel-level segmentation, region-level sign description, image-level diagnosis) that may contain uncertainty, and how to let the model learn to evaluate the credibility of the evidence it extracts, are problems that the prior art has not systematically solved. Therefore, a new model generation method is needed to realize robust reasoning based on evidence credibility within an integrated framework and to output a diagnosis report that is both interpretable and self-aware of its reliability. The information disclosed in this background section is only for enhancement of understanding of the background of the disclosure, and may therefore include information that does not form the prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a device and a method for generating an image recognition model, which solve the problems described in the background above. To achieve this purpose, the present invention provides the following technical solution: an image recognition model generation device, comprising: an initial model construction module for constructing an initial model, the initial model comprising three modules connected in sequence, namely an anatomical perception module, a sign discovery module and an evidence reasoning module; and a credibility integration module for integrating a credibility prediction sub-network into the evidence reasoning module to calculate a credibility score vector for the sign feature vector of each target anatomical region, so as to represent the reliability of the sign feature vector in the sample image