EP-4740221-A1 - METHOD AND DEVICE FOR STRATIFYING PATIENTS FOR DIAGNOSIS, PROGNOSIS AND RESPONSE TO TREATMENT
Abstract
The invention relates to a method and a device (200) for grouping a group of patients into at least one subgroup of patients on the basis of medical data relating to the patients of the group and of a predetermined predictive task, the predetermined predictive task being associated with at least one disease which affects at least one of the patients of the group. The method and the device make it possible to obtain a model specific to the at least one disease by means of a specialisation method; to create a vector representation of the medical data; and to group patients of the group into at least one subgroup of patients, so as to obtain, for each patient of the group, information on state of health and/or a health result, relating to the at least one disease, based on the grouping of each patient with respect to the at least one subgroup of patients.
Inventors
- DUQUESNE, Julien
- BOUGET, Vincent
Assignees
- Scienta Lab
Dates
- Publication Date
- 20260513
- Application Date
- 20240705
Claims (15)
- 1. A device (200) for grouping a group of patients into at least one patient subgroup (70) based on medical data relating to said patients of the group (10) and a predetermined predictive task, said predetermined predictive task being associated with at least one pathology with which at least one of the patients of the group is affected, the device comprising: at least one processor (202) configured to: a) receive a pre-trained foundation model (S 100), said pre-trained foundation model having been previously obtained by training, by self-supervised learning, a deep learning model, on a set of unlabeled generic data, the pre-trained foundation model being configured to, from the set of unlabeled generic data, create a generic vector representation (30); b) receiving a labeled specific data set comprising medical data of a plurality of patients, wherein said labeled specific data set is labeled according to said predetermined predictive task (S200); c) obtaining a specific model (20) for said at least one pathology by a specialization method via transfer learning (S300), said specialization method consisting of: ■ training said pre-trained foundation model on said specific dataset; or ■ adapting said pre-trained foundation model by adding an adapter to said pre-trained foundation model, said adapter consisting of a first neural network trained on said specific data set; d) receiving medical data (10) relating to the group of patients to be grouped into at least one subgroup (S400); e) creating a vector representation (60) of the medical data (10) relating to the group of patients received (S500) using an algorithm (40) derived from the specific model (20) to said at least one pathology; and f) grouping patients of the group into at least one patient subgroup (70) based on said vector representation (60), so as to obtain, for each patient of the group, information on a health state and/or a health result, relating to said at least one pathology, based on the 
grouping of each patient with respect to said at least one patient subgroup (S600); and at least one output configured to provide at least one of: the specific model (20), the vector representation (60) of the medical data (10) relating to the patient group, the at least one patient subgroup (70), said information on the health state and/or the health result relating to said at least one pathology.
- 2. Device (200) according to claim 1, wherein, when the medical data (10) comprises at least two different predetermined modalities, for each predetermined modality, the medical data of the predetermined modality considered are encoded separately from the medical data having another modality, the encoding being carried out by an encoding model specific to the predetermined modality considered, said encoding model specific to the predetermined modality considered having been previously trained in a self-supervised manner (S 130).
- 3. The device (200) of claim 2, wherein said encoded medical data corresponding to said at least two different predetermined modalities are merged using a modality fusion model (S 140).
- 4. Device (200) according to claim 3, wherein said modality fusion model is a second neural network with attention mechanism.
- 5. Device (200) according to any one of the preceding claims, wherein the vector representation (60) of the medical data (10) is representative of contribution information included in the medical data (10) used to group the patients into said at least one subgroup.
- 6. Device (200) according to claim 5, in which the algorithm derived from the specific model (20) for said at least one pathology is an algorithm of explainability by artificial intelligence concepts (40).
- 7. Device (200) according to any one of the preceding claims, in which the at least one processor is also configured to, after grouping the patients into at least one subgroup (S600), identify at least one piece of information representative of the interpretation of the grouping by creating a subgroup membership prediction model making it possible to predict the membership, for each of the patients in the group, of said at least one subgroup, said membership prediction model using at least one of: a statistical test, a rule model or a decision tree model, previously trained in an unsupervised manner (S700).
- 8. Device (200) according to claim 7 wherein said at least one piece of information representative of the interpretation of the grouping is at least one biomarker associated with said subgroup of patients (70).
- 9. Device (200) according to one of the preceding claims, in which each predetermined modality is one of the following: electronic health records, clinical notes, laboratory results, medical images, omics data, histopathological data.
- 10. Device (200) according to one of the preceding claims, in which the predetermined predictive task is one of the following: prediction of diagnosis of a disease, prediction of the prognosis of a patient and/or prediction of response to a treatment.
- 11. Device (200) according to one of the preceding claims, in which the at least one processor is also configured to validate the specific model (20) for said at least one pathology.
- 12. Device (200) according to one of the preceding claims, in which the specific data set comprises the medical data of patients having been diagnosed with or suspected of having an immune-mediated inflammatory disease, and wherein said at least one pathology with which the predetermined predictive task is associated is an immune-mediated inflammatory disease.
- 13. Device (200) according to one of the preceding claims, in which when the at least one pathology is an immune-mediated inflammatory disease, the predetermined predictive task comprises the prediction of an activity score of said disease, which is representative of the prediction of the prognosis of a patient.
- 14. A computer-implemented method for grouping a group of patients into at least one patient subgroup (70) based on medical data (10) relating to said patients in the group and a predetermined predictive task, said predetermined predictive task being associated with at least one pathology with which at least one of the patients in the group is affected, said method comprising: a) receiving a pre-trained foundation model (S 100), said pre-trained foundation model having been previously obtained by training, by self-supervised learning, a deep learning model, on a set of unlabeled generic data, the pre-trained foundation model being configured to, from the set of unlabeled generic data, create a generic vector representation (30); b) receiving a labeled specific data set comprising medical data of a plurality of patients, wherein said labeled specific data set is labeled according to said predetermined predictive task (S200); c) obtaining a specific model (20) for said at least one pathology by a specialization method via transfer learning (S300), said specialization method comprising: ■ training said pre-trained foundation model on said specific labeled dataset; or ■ adapting said pre-trained foundation model by adding an adapter to said pre-trained foundation model, said adapter consisting of a first neural network trained on said specific data set; d) receiving the medical data (10) relating to the group of patients to be grouped into at least one subgroup (S400); e) creating a vector representation (60) of the medical data (10) relating to the group of patients received using an algorithm derived from the specific model (20) for said at least one pathology (S500); and f) grouping the patients of the group into at least one subgroup of patients (70) according to said vector representation (60), so as to obtain, for each patient of the group, information on a state of health and/or a health result, relating to said at least one pathology, according to the 
grouping of each patient with respect to said at least one subgroup of patients (S600); g) providing at least one of: the specific model (20), the vector representation (60) of the medical data (10) relating to the group of patients, the at least one subgroup of patients (70), said information on the state of health and/or the health result relating to said at least one pathology.
- 15. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to implement the method for grouping a group of patients according to claim 14.
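For illustration only, steps c) to f) of claims 1 and 14 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: `foundation_encode` is a hypothetical fixed function standing in for the pre-trained foundation model, the adapter is reduced to a one-layer logistic network, the vector representation is taken to be the per-feature contribution to the task prediction (one possible reading of claims 5 and 6), and k-means with two subgroups is an arbitrary choice, since the claims do not name a grouping algorithm. The cohort is synthetic.

```python
import math
import random

def foundation_encode(x):
    """Hypothetical frozen foundation model: a fixed map to a generic vector."""
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def log_loss(w, b, Z, y):
    """Mean binary cross-entropy of the adapter on frozen features Z."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(Z, y):
        p = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(Z)

def train_adapter(Z, y, lr=0.5, steps=200):
    """Step c), adapter variant: only the small head is trained."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wj - lr * g / len(Z) for wj, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return w, b

def contribution_vector(w, z):
    """Step e): per-feature contribution to the task logit (assumed reading)."""
    return [wj * zj for wj, zj in zip(w, z)]

def kmeans(points, k, iters=20):
    """Step f): plain k-means, started deterministically from the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the previous centre if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Synthetic toy data, fabricated for illustration (not real patient data).
rng = random.Random(0)
labeled_X = [[rng.gauss(s, 0.5), rng.gauss(0, 0.5)]
             for s in (-1, 1) for _ in range(20)]
labeled_y = [0] * 20 + [1] * 20          # step b): labels for the predictive task
cohort_X = [[rng.gauss(s, 0.5), rng.gauss(0, 0.5)]
            for s in (-1, 1) for _ in range(10)]  # step d): patients to group

Z_train = [foundation_encode(x) for x in labeled_X]
w, b = train_adapter(Z_train, labeled_y)                       # step c)
representation = [contribution_vector(w, foundation_encode(x))
                  for x in cohort_X]                           # step e)
subgroups = kmeans(representation, k=2)                        # step f)
print(subgroups)
```

Because the encoder is never updated, only the few adapter parameters are fitted to the labeled set, which is the property that makes the adapter variant attractive for small cohorts. Grouping on contribution vectors rather than on raw embeddings ties each subgroup to the features that drive the task prediction.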
Description
METHOD AND DEVICE FOR STRATIFYING SUBJECTS FOR DIAGNOSIS, PROGNOSIS AND RESPONSE TO TREATMENT
FIELD OF THE INVENTION
[0001] The present invention relates to the field of artificial intelligence used in precision medicine, for the analysis of medical data. In particular, the invention relates to the development and training of deep learning systems to make predictions of medical results such as a diagnosis or a prognosis, to group or stratify subjects, or to discover biomarkers.
STATE OF THE ART
[0002] Stratification or grouping of subjects/patients in the medical field consists of identifying groups of subjects with similar characteristics to improve the management and effectiveness of treatments. This grouping also makes it possible to optimize resource management and reduce costs associated with health care.
[0003] Defining relevant subject groups is a recurring challenge for medical research. Artificial intelligence has provided new tools to try to solve this problem, which remains a key one, particularly in the field of immunology and autoimmune diseases. Indeed, these diseases are not yet well understood because they have very different mechanisms of action and modes of expression from one subject to another. A relevant grouping of subjects suffering from such diseases would allow a better understanding of these diseases and therefore better management.
[0004] For this, some studies use unsupervised models to group subjects into subgroups, but this type of approach does not guarantee the relevance of the subgroups obtained for clinical or pharmaceutical trials, since these subgroups are not linked to clinical outcomes.
[0005] Other studies start from clinical results to build a prediction system and then find the predictors which made it possible to better divide the subjects into subgroups by interpreting the mechanisms of action of the prediction system.
[0006] Using a predictive system to extract subgroups ensures that the subgroups are clinically relevant, but requires the use of simple algorithms with limited predictive power for two reasons. The first is that the algorithm used must be explainable to easily extract subgroups. The second is that patient/subject cohorts or clinical trials are often relatively small, especially in autoimmunity, making it impractical to use a complex predictive system that requires a lot of data to train. Thus, both of these obstacles limit the predictive power of the algorithms and, therefore, the relevance and accuracy of the derived subgroups.
[0007] One of the aims of the present invention is to provide a method and a device for grouping subjects, also called stratification, into homogeneous groups, which overcome the problems previously mentioned while facilitating the diagnosis of certain pathologies, the prognosis (namely the evolution or outcome of diseases), and the identification of personalized treatments and new therapeutic targets by predicting responses to these treatments.
SUMMARY
[0008] The invention relates to a device for grouping a group of subjects or patients into at least one subgroup of subjects from medical data relating to said subjects of the group and a predetermined predictive task, said predetermined predictive task being associated with at least one pathology with which at least one of the subjects of the group is affected.
[0009] It should be noted that, within the scope of this description, the terms “patient” and “subject” are used interchangeably. Therefore, any reference to a “patient” should be understood as also including a “subject” and vice versa.
[0010] According to a first embodiment, the device comprises at least one processor and at least one output.
[0011] The at least one processor is configured to:
- receive a pre-trained foundation model, said pre-trained foundation model having been previously obtained by training, by self-supervised learning, a deep learning model, on a set of unlabeled generic data, the pre-trained foundation model being configured to, from the set of unlabeled generic data, create a generic vector representation;
- receive a set of labeled specific data comprising the medical data of a plurality of patients, where said set of labeled specific data is labeled according to said predetermined predictive task;
- obtain a model specific to said at least one pathology by a specialization method via transfer learning, said specialization method consisting of:
  • training said pre-trained foundation model on said specific dataset; or
  • adapting said pre-trained foundation model by adding an adapter to said pre-trained foundation model, said adapter consisting of a first neural network trained on said specific data set;
- receive medical data relating to the group of patients to be grouped into at least one subgroup;
- create a vector representation of the medical data relating to the group of patients received using an algorithm derived from the model specific to said at least one pathology