US-20260128173-A1 - Service to Automate the Risk Calculation of Genetic Disorders

Abstract

The invention provides a method for the automated calculation of the reproductive risk of genetic disorders from Next Generation Sequencing (NGS) data of male and female subjects. The method includes (i) online data collection from male and female subjects; (ii) processing raw sequencing data to detect genetic variants; (iii) assessing variant pathogenicity using a scoring metric that comprises multiple types of supporting evidence; (iv) text mining of male and female family history and phenotype description; (v) association of genetic disorders to the identified pathogenic variants based on a comprehensive database with automated updating functionalities; (vi) calculation of the reproductive risks using data from male and female subjects; (vii) digital report generation. The method enables efficient and accurate identification of pathogenic variants in a wide range of gene-disease associations, which can be scaled in a computer system.
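Step (vi) of the abstract can be illustrated with a minimal sketch. It is not the applicants' implementation. It assumes full penetrance, no de novo variants, at most one relevant pathogenic variant per gene per chromosome, and a per-pregnancy risk averaged over sons and daughters for the X-linked case. The function name `offspring_risk` and the mode labels `"AR"`, `"AD"` and `"XLR"` are hypothetical.

```python
from fractions import Fraction


def offspring_risk(mode: str, female_alleles: int, male_alleles: int) -> Fraction:
    """Illustrative per-pregnancy probability of an affected child for one gene.

    mode           -- "AR", "AD" or "XLR" (hypothetical labels, not from the source)
    female_alleles -- number of pathogenic alleles carried by the female (0-2)
    male_alleles   -- number carried by the male (0-2 autosomal, 0-1 X-linked)
    Assumes full penetrance and no de novo variants.
    """
    if not 0 <= female_alleles <= 2:
        raise ValueError("female_alleles must be 0, 1 or 2")
    # Probability that the female transmits a pathogenic allele.
    p_f = Fraction(female_alleles, 2)

    if mode == "XLR":
        if male_alleles not in (0, 1):
            raise ValueError("a male carries a single X chromosome")
        p_m = Fraction(male_alleles, 1)
        son = p_f             # sons receive their only X from the mother
        daughter = p_f * p_m  # daughters need a pathogenic X from both parents
        return (son + daughter) / 2

    if not 0 <= male_alleles <= 2:
        raise ValueError("male_alleles must be 0, 1 or 2")
    p_m = Fraction(male_alleles, 2)

    if mode == "AR":
        # Affected only if both parents transmit a pathogenic allele.
        return p_f * p_m
    if mode == "AD":
        # Affected if at least one parent transmits a pathogenic allele.
        return 1 - (1 - p_f) * (1 - p_m)
    raise ValueError(f"unsupported inheritance mode: {mode}")


# Two heterozygous carriers of an autosomal recessive variant.
print(offspring_risk("AR", 1, 1))
```

Exact fractions are used so that the textbook Mendelian ratios are reproduced without floating-point rounding.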

Inventors

Ricardo Jorge Fonseca Tavares Godinho Pais
Markella Andrea Mikkelsen

Assignees

Ricardo Jorge Fonseca Tavares Godinho Pais
Markella Andrea Mikkelsen

Dates

Publication Date: 20260507
Application Date: 20241105

Claims (7)

1) A novel method for automating genetic disease risk assessment, comprising: i) systematic identification of pathogenic genetic variants from genomic data of a male and a female subject; ii) based on (i), systematic pairing of pathogenic variants from a male and a female subject to calculate reproductive risk in offspring for each gene, where there is a gene-disease association; iii) for each male and female subject analyzed in (i) and (ii), matching of user-provided text, relating to the subjects' phenotype and family history, with genetic disorder names to identify relevant genes; iv) based on (i), (ii) and (iii), calculating reproductive risk for genes associated with Autosomal Recessive Mendelian inheritance; Autosomal Dominant Mendelian inheritance; X-linked Mendelian inheritance; Y-linked Mendelian inheritance, wherein the method in (1) is configured as a fully automated end-to-end process (a system as a whole), enabling the scaling of calculating reproductive genetic risk by solving the limitations of current methodologies; and wherein the method in (1) enables the application of the method to non-symptomatic individuals as a preventative genetic screening tool and deployment as a population screening tool; and wherein the method in (1) is applicable to humans and other mammalian species.
2) A novel method for a continuous scoring system of pathogenicity for genetic variant classification, described as MolMart Integrative Ranking Scoring (MIRS), comprising a metric system that uses numerical weights in different scales to combine evidence of pathogenicity from: existing variant classifications in databases; inferred variant classification based on well-established published criteria; frequency-based observations compatible with pathogenicity; predicted impact on gene functionality; predicted loss of gene functionality from published in silico studies using diverse tools; pathogenic phenotype observations in family history. The method in (2) provides the means to discriminate between benign and pathogenic variants on a continuous scale, describing intermediate classifications such as likely benign, Variant of Uncertain Significance (VUS) and likely pathogenic. The methodology enables accurate pathogenic classification of genetic variants and is used in claim 1), wherein the process augments the performance of pathogenic variant identification in the field by integrating multiple sources of pathogenic supporting evidence and is applicable to pathogenic classification of genetic variants in humans and other mammalian species.
3) A novel method for deploying and updating a proprietary database (MolMart Gene Disease Association-MgenDA) containing gene-disease associations, comprising: i) compact design for efficient variant annotation, containing information to facilitate pathogenicity scoring and gene-disease association; ii) based on the existing version of (i), systematic updating of MgenDA content by performing automated integrations of new database releases from the corresponding online repositories; iii) based on the existing version as an output of (i) and (ii), systematic identification of scientific publications relating to discovery of new genetic pathogenicity, by means of an Artificial Neural Network (ANN), with the purpose of integration into MgenDA; iv) automated training and selection of the ANN model in step (iii) for improving the accuracy of abstract identification, based on the feedback data from the curator. The method according to claim 3) is used to perform steps (i) to (v) of the method in claim 1, wherein the method in (3) improves the accuracy of database-driven genetic findings and increases the number of gene-disease or gene-trait associations over time; and wherein the method in (3) facilitates the early incorporation of new findings into MgenDA before they become incorporated into mainstream databases.
4) A process for efficient automation of NGS data processing and analysis, in the context of reporting reproductive genetic risk, according to the method in claim 1) for the system as a whole, comprising: i) automation of standard bioinformatics pipelines for NGS data processing controlled by an API as a standalone microservice; ii) dependent on (i), automated NGS quality control assessment based on multiple threshold cutoffs for sample acceptance/rejection; iii) based on the output of (i) and (ii), an optimized system for FASTQ data upload, file storage in BAM format and memory-based file reading; iv) variant annotations and risk calculations performed by the method in claim 1), using a compact fast-querying database (MgenDA), described in claim 3); v) an independent microservice for the automated population of genomic digital reports, dependent on the output of step (iv); wherein the process in (4) is configured to optimize performance and scalability by reducing data retrieval and processing times; and wherein the process in (4) is applicable to the classification of genomic variants in humans and mammalian species.
5) A novel method for facilitating genomic data collection and result interpretation online, according to the method in claims 1) and 4), comprising: i) a proprietary plug-in with a trained LLM for mediating the user interaction in the context of the “Service” activation, family history data collection, genetic report generation and post-result counseling based on the input of claim 4) and the output of claim 1); ii) dependent on the input and output of (i), a bot framework with a graphical user interface in a web portal, under the proprietary name MolMart Artificial Intelligence Analyst (MAIA), that mediates the user interaction; wherein the method in claim 5) enables the scalability of the “Service” with an effective result interpretation follow-up, by reducing reliance on human-led steps; and wherein the method in claim 5) constitutes a novel application of the OpenAI framework.
6) Use of the method according to any of claims 1 to 5 for the identification and interpretation of reproductive risk in a male and a female subject or any pairwise combination of male and female subjects in humans or other mammalian species.
7) Use of the method according to any of claims 1 to 5 for the identification and interpretation of specific genetic traits in a male and a female subject or any pairwise combination of male and female subjects in humans or other mammalian species.
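
The continuous scoring idea in claim 2 can be sketched as a weighted combination of evidence categories mapped onto classification bands. This is only an illustration and is not MIRS: the claims do not disclose the actual weights, scales or cut-offs, so every number, key name and function name below (`WEIGHTS`, `BANDS`, `continuous_score`, `classify`) is a hypothetical placeholder. Each evidence value is assumed to be pre-normalized to the range 0 (benign) to 1 (pathogenic).

```python
# Hypothetical weights, one per evidence category listed in claim 2.
WEIGHTS = {
    "database_classification": 3.0,
    "criteria_based_classification": 2.0,
    "population_frequency": 1.5,
    "predicted_functional_impact": 1.5,
    "in_silico_loss_of_function": 1.0,
    "family_history_phenotype": 1.0,
}

# Hypothetical upper bounds for each band; scores at or above 0.8 are "pathogenic".
BANDS = [
    (0.2, "benign"),
    (0.4, "likely benign"),
    (0.6, "VUS"),
    (0.8, "likely pathogenic"),
]


def continuous_score(evidence: dict) -> float:
    """Weighted mean of the evidence values that are present (each in [0, 1])."""
    used = {k: v for k, v in evidence.items() if k in WEIGHTS}
    if not used:
        raise ValueError("no recognised evidence supplied")
    for key, value in used.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must lie in [0, 1]")
    total_weight = sum(WEIGHTS[k] for k in used)
    return sum(WEIGHTS[k] * v for k, v in used.items()) / total_weight


def classify(score: float) -> str:
    """Map a continuous score onto a five-tier label."""
    for upper, label in BANDS:
        if score < upper:
            return label
    return "pathogenic"


example = {"database_classification": 1.0, "in_silico_loss_of_function": 0.0}
print(classify(continuous_score(example)))
```

In this sketch, missing evidence is skipped and the remaining weights are renormalized, so a variant with sparse annotation is not pushed toward benign by default. That is a design assumption of the sketch, not something stated in the claims.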

Description

FIELD OF THE INVENTION

The present invention belongs to the field of computational methods for identifying pathogenicity in genetic variants and its utility in clinical genetics.

BACKGROUND TO THE INVENTION

The invention (“A Service to Automate the Risk Calculation of Genetic Disorders”, referred to hereafter as the “Service”) addresses a major healthcare challenge. According to the W.H.O., one in 100 children is born with a genetic disorder, which can be both life-limiting and life-threatening. There are over 7,000 known genetic disorders, and 300 million people worldwide are currently affected by them [1]. The cost of treating genetic disease reached almost $1 trillion in the US in 2019 [2]. Additionally, caring for a child with a life-limiting genetic disease places a financial and societal burden on the families involved, with care costs of up to $3 million per individual [1-4].

Screening for genetic disorders at preconception has been suggested as a viable mitigation strategy for life-limiting genetic disorders [5]. Understanding reproductive risk early, ideally at the preconception stage, empowers prospective parents to adopt a proactive approach to managing that risk, potentially leading to a long-term reduction of the healthcare burden [5]. However, genomic testing is currently focused on solving diagnostic cases rather than on prognostication, due to the technological constraints detailed below. As a consequence, the implementation of large-scale preventive screening programs, inclusive of ethnically diverse populations who could benefit from access to genetic testing, remains financially and technically unviable.

The current state of the art in genomic testing in a diagnostic setting is Next Generation Sequencing (NGS) [6,7]. The advantage of NGS as a high-throughput DNA technology is that it enables the screening of several thousand genes in a single test and therefore has the potential to be scalable [8,9].
However, interpretation of NGS data for a clinical purpose is complex, labor-intensive and relies on highly skilled individuals [7]. The present invention centers on delivering a cost-effective, end-to-end computational system that uses scalable functionalities to enable millions to receive accurate genetic testing faster. Our solution addresses the following limitations of the current workflows:

1) Diagnosis vs prevention. Current approaches focus on confirmatory diagnosis of symptomatic individuals suspected of having a genetic disease, using genome-wide studies [6]. There are few initiatives to extend this technology to a large-scale preventive strategy for determining reproductive risk [8]. These initiatives tend to focus on a limited gene panel covering the most common genetic diseases and usually test prospective parents sequentially. The approach therefore leaves a gap of several thousand untested genes that may carry reproductive risk. Furthermore, testing the male and female parents sequentially creates unnecessary delays and limits the testing of the second parent to one or a few gene targets. Currently, there is no scalable software solution designed for the systematic calculation of reproductive risk in exome- or genome-wide screening. This is a limitation in implementing a preventive strategy.

2) Reliance on highly specialized labor prevents scalability. NGS data-processing frameworks depend heavily on bioinformaticians to run multiple software tools, visually check data quality and manage files [7]. This constitutes a potential bottleneck in scaling the service. Downstream analysis of the data and selection of pathogenic genetic variants require the input of a specialized clinical scientist trained in genomics. This is a case-by-case, labor-intensive approach.
These specialists select reportable pathogenic candidate variants by using software to filter multiple data features and visualize supporting evidence [10]. The selection has to follow the complex and extensive American College of Medical Genetics and Genomics (ACMG) guidelines [5,11,12]. This constitutes another bottleneck in scaling the service. Software that aids genomic variant analysis and interpretation has improved the gathering and visualization of relevant data. However, these software tools have not resulted in massively scalable genomic solutions because they still depend on manual steps.

3) High computational requirements that limit the scalability of services. Currently available software tools for Variant Call Format (VCF) annotation, such as GATK, ANNOVAR and others, are computationally demanding [10,13]. VCF annotation can take up to several hours, depending on the number of variants and the available CPU, memory and disk resources. This is because annotation tools are designed to be generic and execute several hundred standard feature annotations per variant in text files using multiple database