CN-116685980-B - Training interpretable deep learning models using disentanglement learning
Abstract
A method and system for training an interpretable deep learning model includes receiving an input dataset, which may be complex. The input dataset is provided to the deep learning model for feature extraction. In an exemplary embodiment, the deep learning model generates a disentangled latent space for the features from the feature extraction. The features may include semantically meaningful data that is then provided to a low complexity learning model. The low complexity learning model generates output based on a specified task (e.g., classification or regression). Because a low complexity learning model is used, the output it produces from the deep learning model's features is believed to be inherently interpretable.
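A minimal sketch of this pipeline, assuming a PyTorch encoder and a scikit-learn decision tree as the low complexity task model; all class names, shapes, and data below are illustrative, not taken from the patent:

```python
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeClassifier

class Encoder(nn.Module):
    """Maps an image to the parameters of a (disentangled) latent vector."""
    def __init__(self, latent_dim: int = 10):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),   # mean and log-variance
        )

    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar

encoder = Encoder()
images = torch.rand(64, 1, 28, 28)       # placeholder input dataset
labels = torch.randint(0, 2, (64,))      # placeholder task labels
with torch.no_grad():
    mu, _ = encoder(images)              # one latent feature vector per image
# The low complexity, inherently interpretable task model is fitted
# directly on the latent features rather than on the raw pixels.
task_model = DecisionTreeClassifier(max_depth=3)
task_model.fit(mu.numpy(), labels.numpy())
print(task_model.predict(mu.numpy())[:5])
```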
Inventors
- S. Chakraborty
- Seraphin B. Calo
- Jiawei Wen
Assignees
- International Business Machines Corporation
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-10-15
- Priority Date: 2020-12-23
Claims (20)
- 1. A method of training an interpretable deep learning model for a machine learning system, comprising: receiving an input dataset, the input dataset comprising an image; providing the input dataset to a deep neural network model; extracting features from the deep neural network model; generating a latent space comprising vectors of the extracted features; feeding the latent space of vectors to a task-specific model, the task-specific model being a low complexity learning model; and generating an interpretable prediction of feature dimensions from the task-specific model.
- 2. The method of claim 1, wherein the features are extracted using an encoder module.
- 3. The method of claim 1, wherein the latent space of vectors is a disentangled representation of the input dataset.
- 4. The method of claim 1, further comprising extracting the features from the deep neural network using a beta variational autoencoder.
- 5. The method of claim 1, further comprising: associating feature dimensions of each vector in the latent space with semantically meaningful characteristics; and generating the interpretable prediction of feature dimensions based on the semantically meaningful characteristics of each vector.
- 6. A computer program product for training an interpretable deep learning model for a machine learning system, the computer program product comprising: a computer-readable storage medium having program instructions embodied therewith, the program instructions executable to perform operations comprising: receiving an input dataset, the input dataset comprising an image; providing the input dataset to a deep neural network model; extracting features from the deep neural network model; generating a latent space comprising vectors of the extracted features; feeding the latent space of vectors to a task-specific model, the task-specific model being a low complexity learning model; and generating an interpretable prediction of feature dimensions from the task-specific model.
- 7. The computer program product of claim 6, wherein the features are extracted using an encoder module.
- 8. The computer program product of claim 6, wherein the latent space of vectors is a disentangled representation of the input dataset.
- 9. The computer program product of claim 6, wherein the program instructions are further for extracting the features from the deep neural network using a beta variational autoencoder.
- 10. The computer program product of claim 6, wherein the program instructions are further for: associating feature dimensions of each vector in the latent space with semantically meaningful characteristics; and generating the interpretable prediction of feature dimensions based on the semantically meaningful characteristics of each vector.
- 11. A computer server, comprising: a network connection; one or more computer-readable storage media; a processor coupled to the network connection and to the one or more computer-readable storage media; and a computer program product comprising program instructions collectively stored on the one or more computer-readable storage media, the program instructions for: receiving an input dataset, the input dataset comprising an image; providing the input dataset to a deep neural network model; extracting features from the deep neural network model; generating a latent space comprising vectors of the extracted features; feeding the latent space of vectors to a task-specific model, the task-specific model being a low complexity learning model; and generating an interpretable prediction of feature dimensions from the task-specific model.
- 12. The computer server of claim 11, wherein the features are extracted using an encoder module.
- 13. The computer server of claim 11, wherein the latent space of vectors is a disentangled representation of the input dataset.
- 14. The computer server of claim 11, wherein the program instructions are further for extracting the features from the deep neural network using a beta variational autoencoder.
- 15. The computer server of claim 11, wherein the program instructions are further for: associating feature dimensions of each vector in the latent space with semantically meaningful characteristics; and generating the interpretable prediction of feature dimensions based on the semantically meaningful characteristics of each vector.
- 16. A method of training an interpretable deep learning model for a machine learning system, comprising: receiving an input dataset, the input dataset comprising an image; providing the input dataset to a beta variational autoencoder; generating, by the beta variational autoencoder, an output representation of the input dataset; processing the output representation using a low complexity learning model; determining a task-specific output dataset from the low complexity learning model; and providing an interpretation of the input dataset based on the task-specific output dataset.
- 17. The method of claim 16, wherein the output representation of the input dataset generated by the beta variational autoencoder is a latent space of dimensional vectors organized by features having semantic relationships.
- 18. The method of claim 16, further comprising: reconstructing the input dataset using a decoder module; determining a reconstruction error loss from reconstructing the input dataset; determining a classification loss or a regression loss from the task-specific output dataset; and training the beta variational autoencoder, the decoder module, and the low complexity learning model using a combination of the reconstruction error loss and the classification loss or the regression loss (a sketch of this combined objective appears after the claims).
- 19. The method of claim 16, wherein the low complexity learning model is one of: a parametric model, a non-parametric model, a decision tree, a regression tree, or an ensemble model.
- 20. A computer program product for training an interpretable deep learning model for an artificial intelligence computing system, the computer program product comprising: a computer-readable storage medium having program instructions embodied therewith, the program instructions executable to perform operations comprising: receiving an input dataset, the input dataset comprising an image; providing the input dataset to a beta variational autoencoder; generating, by the beta variational autoencoder, an output representation of the input dataset; processing the output representation using a low complexity learning model; determining a task-specific output dataset from the low complexity learning model; and providing an interpretation of the input dataset based on the task-specific output dataset.
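Claim 18's training objective reads as a standard beta-VAE loss (reconstruction error plus a beta-weighted KL term) combined with a task loss. Below is a minimal single-step sketch under that reading, in PyTorch; the beta and lambda weights, the network shapes, and the placeholder data are assumptions for illustration, not values from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim = 10
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                        nn.Linear(256, 2 * latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                        nn.Linear(256, 784), nn.Sigmoid())
task_head = nn.Linear(latent_dim, 2)   # low complexity (linear) task model
params = [*encoder.parameters(), *decoder.parameters(), *task_head.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
beta, lam = 4.0, 1.0                   # assumed loss weights

x = torch.rand(64, 1, 28, 28)          # placeholder images
y = torch.randint(0, 2, (64,))         # placeholder labels

mu, logvar = encoder(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterization
recon_loss = F.binary_cross_entropy(decoder(z), x.flatten(1))  # reconstruction error loss
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()  # beta-VAE KL term
task_loss = F.cross_entropy(task_head(z), y)                   # classification loss
loss = recon_loss + beta * kl + lam * task_loss                # combined objective
opt.zero_grad(); loss.backward(); opt.step()                   # joint training step
```

Training all three components against a single combined loss is what allows the latent dimensions to remain both disentangled and useful for the downstream task.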
Description
Training interpretable deep learning models using disentanglement learning

Technical Field

The present disclosure relates generally to data processing, and more particularly to a system and method for training an interpretable deep learning model using disentanglement learning.

Background

Neural networks are generally considered techniques that mimic the operation of a living brain. An artificial network simulates decision layers to perform specified tasks. Tasks include, for example, the identification and classification of features. The layers may include an input layer, an output layer, and at least one hidden layer in between. Each layer performs a particular type of sorting and ordering in the process, sometimes referred to as a "feature hierarchy."

For a better understanding of the features of the present disclosure, it may be helpful to discuss what is known about deep neural networks. Deep neural networks may be used to process unlabeled or unstructured data. Deep learning is a form of machine learning in which techniques that use aspects of artificial intelligence seek to classify and rank information in a manner that goes beyond simple input/output protocols. Deep neural networks extract data representations that are often difficult or overly time consuming for humans to interpret. Meaningful representations of complex datasets can be produced with minimal user intervention. Much of how a deep neural network operates remains unknown and unexplained. In general, a deep neural network may not be given rules or conditions to follow when performing tasks. Deep learning is valued for the performance it provides, with minimal user intervention, when processing large batches of data.

The industry is currently striving to understand and explain how deep neural networks behave so that modeling can be improved. Interpretability (or explainability) relates to the task being performed. This means, for example, that for an input image classified as a "dog," the explanation from the model indicates why, or which features of the input image are most responsible for, the classification. The goal, therefore, is to explain a classification (or regression) model.

Traditionally, models were trained for specific tasks. The model extracts the desired features from the input and predicts the output. If the model is simple, its performance is impacted on difficult datasets. Alternatively, if a complex deep architecture is used, the model may learn difficult decision boundaries and perform well. However, simple models are interpretable, while complex deep models are not. Selecting one type of model over the other requires an undesirable trade-off: simple models are interpretable but perform poorly, while deep models are not interpretable but provide very good performance.

Some current methods use, for example, an interpreter module to provide interpretability for complex deep models. The interpreter module is typically separate from the deep learning model. For example, the explainer module inspects the model and the image and generates an explanation from outside the learning model. The explanation may highlight input features whose presence (or absence) is most important to the model's decision. However, the explanation is a guess by the interpreter and not necessarily a true description of how the learning model reaches its output.
Other methods may include using a proxy model that provides localized explanations around data points. However, the output from the proxy model may also be based on inference and not necessarily an accurate depiction of the learning model's decisions. The proxy model uses different features than the original neural network, and it explains only specific examples. In addition, the proxy model itself may not help explain the global model; it is typically limited to a small region, explaining decision boundaries near a given test data point. There thus remains a challenge in finding ways to better explain how deep learning models operate, so that their training can be refined and improved.

Disclosure of Invention

In accordance with an embodiment of the present disclosure, a method of training an interpretable deep learning model for a machine learning system is provided. The method includes receiving an input dataset. The input dataset is provided to a deep neural network model. Features are extracted from the deep neural network model. A latent space comprising vectors of the extracted features is generated. The latent space of vectors is fed to a task-specific model. In addition, interpretable predictions of feature dimensions are generated from the task-specific model. In one embodiment of the method, an encoder module is used to extract the features.

According to another embodiment of the present disclosure, a computer program product for training an interpretable deep learning model is provided.
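As a sketch of the interpretation step, once each latent dimension has been associated with a semantically meaningful characteristic (as in claim 5), a low complexity model over those dimensions explains itself directly. The dimension names and data below are hypothetical, using scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical semantic labels for the disentangled latent dimensions.
dim_names = ["size", "rotation", "color", "shape"]
Z = np.random.randn(200, len(dim_names))   # placeholder latent vectors
y = (Z[:, 3] > 0).astype(int)              # placeholder task labels

clf = LogisticRegression().fit(Z, y)       # low complexity task model
# Because the model is linear over named dimensions, its coefficients
# directly state which semantic factors drive each prediction.
for name, w in zip(dim_names, clf.coef_[0]):
    print(f"{name}: weight {w:+.2f}")
```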