RU-2861523-C1 - METHOD FOR DETERMINING PROGNOSTIC FEATURES OF DISEASES IN ELECTRONIC MEDICAL RECORDS
Abstract
FIELD: medical informatics; analysis of medical data. SUBSTANCE: a method is proposed for automatically determining prognostically significant features by analysing symptoms and laboratory abnormalities extracted from patients' electronic medical records. The method can be used in the development of clinical decision support systems and in tasks of diagnosing and predicting diseases from depersonalised medical data, including information on symptoms and laboratory test results. It includes the following steps: obtaining a plurality of depersonalised electronic medical records of patients with the same diagnosis; converting each medical record into a binary feature vector reflecting the presence or absence of symptoms and laboratory abnormalities; training an indirectly parameterised Concrete Autoencoder, whose parameters are determined through an embedding space and a linear transformation, using the Gumbel-Softmax mechanism, which provides stochastic but differentiable feature selection during training; performing a predetermined number of training runs with different initial weight initialisations (multi-run), recording the features selected by the model in each run; calculating the frequency of occurrence of each feature across the feature sets obtained from the runs; and selecting, as stable prognostic features, those whose frequency of occurrence exceeds a set threshold value. EFFECT: increased reliability of automated processing and analysis of medical data and, as a result, increased accuracy of diagnosis. 3 cl, 2 dwg, 1 ex
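The Gumbel-Softmax mechanism named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `gumbel_softmax`, the example logits, and the two temperature values are assumptions chosen for illustration, and the sketch covers only the sampling step, not the embedding-based parameterisation or the training loop. Each logit is perturbed with Gumbel noise, divided by a temperature, and passed through a softmax, which yields a differentiable weight vector that approaches a one-hot feature choice as the temperature decreases.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed one-hot sample over the input features."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform; u is kept away from 0 and 1.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [0.1, 2.0, -1.0, 0.5]  # hypothetical scores for four input features
soft = gumbel_softmax(logits, temperature=5.0, rng=rng)    # high temperature: smoother weights
sharp = gumbel_softmax(logits, temperature=0.05, rng=rng)  # low temperature: near one-hot
selected_index = max(range(len(sharp)), key=sharp.__getitem__)
```

In a full model, one such weight vector would be learned per selected feature, and the index with the largest weight would be read off as the chosen input feature once training ends.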
Inventors
- Ivanisenko Vladimir Aleksandrovich
- Demenkov Pavel Sergeevich
- Ivanisenko Timofei Vladimirovich
- Gaisler Evgenii Vladimirovich
Dates
- Publication Date: 20260505
- Application Date: 20250930
Claims (3)
- 1. A method for determining prognostic features of diseases based on the analysis of medical data, including the following steps:
- - obtaining depersonalized electronic medical records of patients with the same diagnosis;
- - converting each medical record into a binary feature vector reflecting the presence or absence of symptoms and laboratory abnormalities;
- - training an indirectly parameterized Concrete Autoencoder using the Gumbel-Softmax mechanism, the parameters of which are determined through an embedding space and a linear transformation;
- - performing a given number of model training runs with different initial weight initializations, with the features selected by the model in each run being recorded;
- - calculating the frequency of occurrence of each feature across the feature sets obtained from the runs;
- - selecting features whose frequency of occurrence exceeds an established threshold value as stable prognostic features;
- - wherein the number of runs is selected within the range from 50 to 200.
- 2. A method for determining prognostic features of diseases based on the analysis of medical data according to claim 1, characterized in that a fixed number of features is selected in each run, this number being determined at the model tuning stage as the one that minimizes the reconstruction error of the original feature vector.
- 3. A method for determining prognostic features of diseases based on the analysis of medical data according to claim 1, characterized in that the threshold value for the frequency of occurrence of features is set at a level of at least 50%.
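The encoding and frequency-aggregation steps of the claims can be sketched as follows. This is a hedged illustration rather than the claimed implementation: the helper names `encode_record` and `stable_features`, the four-item feature vocabulary, and the per-run selections are all hypothetical, and only four runs are shown for brevity, whereas claim 1 specifies 50 to 200 runs. The per-run selections are assumed to come from separately trained models.

```python
from collections import Counter


def encode_record(present_findings, vocabulary):
    """Binary vector: 1 if the symptom or laboratory abnormality is recorded."""
    present = set(present_findings)
    return [1 if name in present else 0 for name in vocabulary]


def stable_features(selected_per_run, threshold=0.5):
    """Keep features whose share of runs exceeds the threshold."""
    n_runs = len(selected_per_run)
    counts = Counter()
    for selected in selected_per_run:
        counts.update(set(selected))  # count each feature at most once per run
    frequency = {name: counts[name] / n_runs for name in counts}
    stable = sorted(name for name, f in frequency.items() if f > threshold)
    return stable, frequency


# Hypothetical vocabulary of symptoms and laboratory abnormalities.
vocabulary = ["fever", "cough", "elevated_crp", "low_hemoglobin"]
vector = encode_record(["cough", "elevated_crp"], vocabulary)

# Hypothetical features selected in four independent training runs.
runs = [
    ["fever", "elevated_crp"],
    ["fever", "cough"],
    ["fever", "elevated_crp"],
    ["elevated_crp", "low_hemoglobin"],
]
stable, frequency = stable_features(runs, threshold=0.5)
```

With these toy inputs, `fever` and `elevated_crp` each appear in three of four runs and are retained, while `cough` and `low_hemoglobin` appear in one run each and are discarded.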
Description
The invention relates to medical informatics and medical data analysis. The claimed method enables the automatic identification of prognostically significant features based on the analysis of symptoms and laboratory abnormalities extracted from patient electronic medical records. It can be used in the development of medical decision support systems for diagnosing and predicting diseases based on anonymized medical data, including information on symptoms and laboratory test results.
In the modern healthcare system, one of the key tasks is the early detection of disease signs based on available medical data. Electronic medical records contain a significant amount of information, including clinical signs and test results, but the high heterogeneity and noise of such data hinder their effective processing and interpretation using traditional methods. Existing machine learning algorithms can identify hidden patterns, but they typically generate latent representations that lack sufficient interpretability, which reduces their practical value for physicians. International practice confirms the relevance of developing interpretable models that enable direct analysis of the original features. The proposed method aims to address this issue by improving forecasting accuracy, reducing the number of diagnostic procedures, and increasing medical professionals' confidence in artificial intelligence systems.
A number of solutions are known that use artificial intelligence and machine learning technologies to analyze electronic medical records (EMRs), build mathematical models of patients, and support medical decisions. For example, a method for supporting medical decision-making using mathematical models of patient representation is known (RU Patent No. 2703679, published October 21, 2019). It converts EMR data into medical characteristics and predicts the patient's condition.
However, such characteristics are predefined through ontologies, which precludes the identification of new and secondary characteristics.
A known method for generating mathematical patient models using artificial intelligence technologies (RU Patent No. 2720363, published April 29, 2020) uses ontologies to define vector representations of patients. A drawback is the lack of a mechanism for dynamically selecting prognostically significant features.
A known method for early diagnosis of chronic diseases in patients is based on cluster analysis of big data (RU Patent No. 2800315, published July 20, 2023). It uses cluster analysis and ensemble methods to identify statistical relationships. However, this method lacks an interpretable and robust selection of specific symptoms and laboratory parameters.
A system and method for automated clinical decision support are known ("System and Method for an Automated Clinical Decision Support System," EP Patent No. 3573068, published November 27, 2019), in which a denoising autoencoder is used to analyze medical data and handle incomplete data. A drawback is that the identified features are latent, lack clinical interpretability, and are not accompanied by a robustness check.
A known method for predicting treatment outcomes and recommending interventions using deep learning ("Prediction of healthcare outcomes and recommendation of interventions using deep learning," US Patent No. 11915127, published February 27, 2024) uses autoencoders for automated feature extraction during patient data analysis. A drawback is that latent representations are formed, and interpretable selection of the original symptoms and laboratory parameters is not implemented. A mechanism for multiple runs with frequency aggregation is also lacking.
A method for supporting medical decision-making based on a hybrid diagnostic model, which predicts the risk of complications, establishes a clinical diagnosis, conducts differential diagnostics, and determines patient management tactics (RU Patent No. 2828464, published October 14, 2024), has been adopted as the closest analogue. Its implementation uses a hybrid architecture (knowledge graphs, NLP, machine learning). However, feature detection is performed using black-box neural network models, without a mechanism for statistically validating the robustness of the detected features.
An analysis of the state of the art revealed that the known solutions do not provide interpretable and robust selection of prognostically significant features directly from electronic medical records. These solutions either fix features in advance or generate them using models that lack a mechanism for reproducible and statistically validated selection, while autoencoders are used to construct latent representations, fill in missing data, or generate data, but not to select the original clinical features. Studies using CAE, IP-CAE, and mrCAE are likewise not adapted to the tasks of medical record analysis and do not provide the required level of robustness and reproducibility.
The task that the claimed technical