AI Predicts Development of Autoimmune Diseases with New Genetic Tool
Summary: Researchers have developed a Genetic Progression Score (GPS) using artificial intelligence to predict the progression of autoimmune diseases from preclinical symptoms to full-blown disease. The GPS model combines genetic data with electronic health records to provide a personalized risk score, improving predictive accuracy by 25% to 1,000% over existing models.
This approach identifies high-risk individuals early, allowing for timely intervention and better disease control. The framework can also be adapted to study other understudied diseases and patient groups, with potential benefits for personalized medicine and health equity.
Important Facts:
- GPS integrates genetic studies and biobank data to refine disease progression predictions.
- The model outperformed 20 other methods, improving prediction accuracy significantly.
- Early identification of at-risk populations allows for effective intervention and treatment.
Source: Penn State
Autoimmune diseases, in which the immune system mistakenly attacks healthy cells and tissues, usually have a preclinical phase before diagnosis, characterized by mild symptoms or by specific antibodies in the blood.
However, in some people, these signs may resolve before the disease becomes full-blown.
Knowing who is likely to progress down the disease path is important for early diagnosis and intervention, improved treatment and better disease management, according to a team led by Penn State College of Medicine researchers who have developed a new way to predict autoimmune disease progression among people with preclinical symptoms.
The team used artificial intelligence (AI) to analyze data from electronic health records and from large-scale genetic studies of people with autoimmune disease to build a risk prediction score.
Compared to existing models, this method was between 25% and 1,000% more accurate at predicting which people with preclinical symptoms would progress to full-blown disease.
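Relative improvements such as "25% to 1,000%" depend heavily on the baseline. The following sketch assumes accuracy is summarized by a single metric such as prediction R² (the article does not say which metric underlies these figures) and shows how a modest absolute gain becomes a very large percentage when the baseline model performs poorly. The numbers are hypothetical, not results from the study.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percent improvement of `new` over `baseline`, measured on the same accuracy metric."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new - baseline) / baseline * 100.0

# Hypothetical accuracy values (e.g. prediction R^2), chosen only to show the arithmetic.
modest = relative_improvement(0.040, 0.050)  # absolute gain of 0.01 on a baseline of 0.04
large = relative_improvement(0.005, 0.055)   # absolute gain of 0.05 on a baseline of only 0.005
print(f"{modest:.0f}% and {large:.0f}%")
```

A small absolute gain over a weak baseline can therefore look larger in percentage terms than a bigger absolute gain over a strong one.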
The research team published their findings this week (Jan. 2) in the journal Nature Communications.
“By targeting the most relevant population, people with a family history or early symptoms, we can use machine learning to identify patients at high risk of developing the disease and identify appropriate treatments that may slow its progression. There is a lot of information here that is meaningful and actionable,” said Dajiang Liu, distinguished professor, vice chair for research and director of artificial intelligence and biomedical informatics at Penn State College of Medicine and co-author of the study.
About 8% of Americans live with an autoimmune disease, according to the National Institutes of Health, and most of them are women. The earlier the disease is recognized and treated, the better, Liu said, because as autoimmune diseases progress, the damage can become irreversible.
Signs of disease often appear before a person receives a diagnosis. For example, in patients with rheumatoid arthritis, antibodies can be detected in the blood five years before symptoms start, the researchers explained.
A challenge with predicting disease progression is sample size. The number of people with a particular autoimmune disease is relatively small. With little data available, it is difficult to develop an accurate model and algorithm, Liu said.
To improve predictive accuracy, the research team developed a new method, called the Genetic Progression Score (GPS), to predict progression from preclinical stages to disease. GPS builds on the concept of transfer learning, a machine learning technique in which a model is trained on one task or data set and then adapted to a different but related task or data set, explained Bibo Jiang, assistant professor of public health sciences at Penn State College of Medicine and lead author of the study.
Transfer learning allows researchers to extract more information from smaller data samples.
For example, in medical imaging, artificial intelligence models can be trained to tell whether a tumor is cancerous or not. To create a training dataset, medical professionals need to label images one by one, which can be time-consuming and limited by the number of images available.
Transfer learning, instead, can start from a much larger set of images that are easy to label, such as pictures of cats and dogs, Liu said. That labeling work can also be outsourced. The model learns to distinguish between the animals and can then be refined to distinguish between benign and malignant tumors.
“You don't need to train the model from scratch,” said Liu.
“The way the model breaks an image into parts to decide whether it shows a cat or a dog carries over. With some additional training, you can refine the model to distinguish an image of a tumor from an image of normal tissue.”
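The cat-and-dog analogy describes fine-tuning: parameters learned on a large, easy-to-label data set become the starting point for a small, related task. The following Python sketch illustrates that idea with logistic regression on simulated data rather than images. The data, the helper names (`simulate`, `train_logistic`, `accuracy`) and all settings are hypothetical, and this is not the study's model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20  # number of features (hypothetical)

def simulate(w_true, n):
    """Draw n labeled examples from a logistic model with true weights w_true."""
    X = rng.normal(size=(n, d))
    p = 1.0 / (1.0 + np.exp(-X @ w_true))
    y = (rng.random(n) < p).astype(float)
    return X, y

def train_logistic(X, y, w_init=None, lr=0.1, steps=500):
    """Gradient descent on mean log-loss, optionally warm-started from w_init."""
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(X, y, w):
    return float(np.mean((X @ w > 0) == (y == 1)))

# Source task: plentiful labels. Target task: related (similar weights) but scarce labels.
w_source = rng.normal(size=d)
w_target = w_source + 0.3 * rng.normal(size=d)
X_src, y_src = simulate(w_source, 20_000)
X_tgt, y_tgt = simulate(w_target, 40)
X_test, y_test = simulate(w_target, 5_000)

w_pre = train_logistic(X_src, y_src)                         # "pretraining" on the large set
w_ft = train_logistic(X_tgt, y_tgt, w_init=w_pre, steps=50)  # fine-tuning on the small set
w_scratch = train_logistic(X_tgt, y_tgt, steps=50)           # same small set, no pretraining

acc_ft = accuracy(X_test, y_test, w_ft)
acc_scratch = accuracy(X_test, y_test, w_scratch)
print(f"fine-tuned: {acc_ft:.3f}  from scratch: {acc_scratch:.3f}")
```

Both small-data fits get the same number of gradient steps, so any gap in test accuracy comes from the warm start rather than from extra training.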
GPS is trained on data from large genome-wide association studies (GWAS), a popular approach in genetic research that compares the genomes of people with a certain autoimmune disease to those of people without it to identify genetic differences and potential risk factors.
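At its simplest, a GWAS tests each genetic variant one at a time for association with disease status and reports an effect size and test statistic per variant; these per-variant results are the "summary statistics" named in the paper's title. The following sketch approximates this with a per-variant linear regression on simulated genotypes. Real GWAS typically use logistic regression with covariates such as ancestry principal components, and the helper `marginal_effects` and all simulation settings are hypothetical.

```python
import numpy as np

def marginal_effects(G, y):
    """Per-variant regression of phenotype on genotype, one variant at a time.

    G: (n_people, n_variants) genotypes coded 0/1/2 (copies of the effect allele)
    y: (n_people,) phenotype, 1 = case, 0 = control
    Returns (beta, se, z) arrays with one entry per variant.
    """
    n = G.shape[0]
    Gc = G - G.mean(axis=0)                 # center each variant
    yc = y - y.mean()
    ss = (Gc ** 2).sum(axis=0)              # per-variant sum of squares
    beta = Gc.T @ yc / ss                   # least-squares slope for each variant
    resid = yc[:, None] - Gc * beta         # residuals of each single-variant model
    se = np.sqrt((resid ** 2).sum(axis=0) / (n - 2) / ss)
    return beta, se, beta / se

# Simulated cohort: the first 5 of 50 variants raise disease risk (hypothetical settings).
rng = np.random.default_rng(1)
n, m = 5_000, 50
freqs = rng.uniform(0.1, 0.5, size=m)
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
logit = G[:, :5] @ np.full(5, 0.3) - 2.0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

beta, se, z = marginal_effects(G, y)
print("mean |z|, causal:", np.abs(z[:5]).mean(), " null:", np.abs(z[5:]).mean())
```

Variants with large |z| are the candidate risk factors; the per-variant effect sizes are what a downstream risk score can reuse as weights.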
It also integrates data from biobanks built on electronic health records, which contain rich information about patients, including genetic variants, laboratory tests and clinical diagnoses.
These data can help identify individuals in preclinical stages and track their progression from preclinical stages to disease. Data from both sources are then combined to refine the GPS model, incorporating factors related to the actual development of the disease.
“Combining large case-control studies and biobanks borrows strength from the large sample sizes of case-control studies and improves predictive accuracy,” said Liu, explaining that people with high GPS scores are at greater risk of progressing from preclinical to disease stages.
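A genetic risk score of this kind is typically a weighted sum of a person's effect-allele counts, with one weight per variant. The following sketch computes such a score and flags the top-scoring fraction of a cohort as "high risk". The weights, genotypes and 25% cutoff are hypothetical; in GPS the weights would come from the combined case-control and biobank model, and the article does not say how high-risk thresholds are set.

```python
import numpy as np

def polygenic_score(G, weights):
    """score_i = sum_j G[i, j] * weights[j], where G holds effect-allele counts 0/1/2."""
    return np.asarray(G, dtype=float) @ np.asarray(weights, dtype=float)

def flag_high_risk(scores, top_fraction):
    """Boolean mask marking individuals at or above the (1 - top_fraction) score quantile."""
    cutoff = np.quantile(scores, 1.0 - top_fraction)
    return scores >= cutoff

# Four hypothetical people genotyped at three variants, with hypothetical weights.
G = [[0, 1, 2],
     [2, 2, 0],
     [1, 0, 0],
     [0, 0, 1]]
weights = [0.2, -0.1, 0.5]

scores = polygenic_score(G, weights)
high_risk = flag_high_risk(scores, top_fraction=0.25)
print(scores, high_risk)
```

The first person scores highest (0 x 0.2 + 1 x -0.1 + 2 x 0.5 = 0.9) and is the only one above the 75th-percentile cutoff.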
The team used real-world data from the Vanderbilt University biobank to predict the progression of rheumatoid arthritis and lupus and then validated the GPS risk scores with data from the All of Us biobank, a National Institutes of Health data initiative.
GPS better predicted disease progression than 20 other models that relied on biobank or case-control samples only and those that combined biobank and case-control samples by other methods.
Accurate prediction of disease progression using GPS can allow for early intervention, targeted monitoring and personalized treatment decisions, leading to improved patient outcomes, Liu said. It may also improve clinical trial design and recruitment by identifying populations most likely to benefit from new treatments.
Although this study focused on autoimmune conditions, the researchers say the same framework can be used to study other types of diseases.
“When we talk about underrepresented people, it's not just about race. It may also be a group of patients that is understudied in the medical literature because they make up only a small portion of common data sets. AI and transfer learning can help us better understand these groups and help reduce health inequalities,” said Liu.
“This work demonstrates the strength of Penn State's comprehensive autoimmune disease research program.”
Liu and Jiang, along with study co-authors Laura Carrel, professor of biology and molecular biology; Galen Foulke, associate professor of dermatology; and Nancy Olsen, H. Thomas and Dorothy Willits Hallowell Chair of Rheumatology, formed the autoimmune working group and have collaborated for nearly ten years.
They lead innovative clinical trials, conduct research studies to understand the biological mechanisms of autoimmune diseases and develop AI methods to address various problems related to autoimmune diseases.
Chen Wang, a doctoral student in bioinformatics and genomics at Penn State, and Havell Markus, a joint graduate student in the MD/PhD Medical Scientist Training Program, are co-first authors of the study.
Other Penn State authors on the paper include: Avantika R. Diwadkar, graduate student; Chachrit Khunsriraksakul, a student in the MD/PhD Medical Scientist Training Program at the time of the study; and Xingyan Wang, who was a research assistant at Penn State College of Medicine at the time of the study.
Other authors include Bingshan Li, professor of molecular physiology and biophysics, and Xue Zhong, research assistant professor of genetic medicine, from Vanderbilt University School of Medicine; and Xiaowei Zhan, associate professor of public health at the University of Texas Southwestern Medical Center.
Sponsorship: Funding from the National Institutes of Health, including the National Institute of Allergy and Infectious Diseases Office of Data Science and Emerging Technologies, supported this research.
About this AI and neurology research news
Author: Christine Yu
Source: Penn State
Contact person: Christine Yu – Penn State
Image: Image posted in Neuroscience News
Original Research: Open access.
“Integrating electronic health records and GWAS summary statistics to predict autoimmune disease progression from preclinical stages” by Dajiang Liu et al. Nature Communications
Abstract
Integrating electronic health records and GWAS summary statistics to predict autoimmune disease progression from preclinical stages
Autoimmune diseases often have a preclinical stage before diagnosis. Biobanks based on electronic health records (EHR) contain genetic data and diagnostic information, which can be used to identify preclinical individuals at risk of progression.
Biobanks usually have small numbers of cases, which are insufficient for building accurate polygenic risk scores (PRS). Importantly, progression and case-control phenotypes may share a genetic basis, which can be exploited to improve predictive accuracy.
We propose a novel Genetic Progression Score (GPS) method that integrates a biobank and a case-control study to predict the risk of disease progression. Through penalized regression, GPS incorporates PRS weights from the case-control study as a prior and constrains the model parameters toward that prior when doing so improves prediction accuracy.
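One common way to "constrain parameters toward a prior" is ridge-style shrinkage toward prior weights rather than toward zero: minimize ||y - Xb||² + λ||b - b_prior||², whose solution is b = (XᵀX + λI)⁻¹(Xᵀy + λ·b_prior). With λ = 0 the fit uses the small biobank alone, while a very large λ simply reproduces the case-control weights; choosing λ on held-out data means the prior is used only as far as it improves prediction. The following sketch implements that generic idea on simulated, standardized genotypes. It is an illustration under these assumptions, not the published GPS algorithm, whose exact penalty, tuning procedure and use of GWAS summary statistics may differ; the function names and simulation settings are hypothetical.

```python
import numpy as np

def fit_toward_prior(X, y, b_prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - b_prior||^2 via its closed-form solution."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    return np.linalg.solve(A, X.T @ y + lam * b_prior)

def r2(y, yhat):
    """Out-of-sample R^2 (can be negative for a poor predictor)."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def choose_lambda(X_tr, y_tr, X_val, y_val, b_prior, grid):
    """Return the penalty with the best validation R^2; lam = 0 ignores the prior entirely."""
    scores = {lam: r2(y_val, X_val @ fit_toward_prior(X_tr, y_tr, b_prior, lam)) for lam in grid}
    return max(scores, key=scores.get), scores

# Hypothetical setting: 100 standardized variants, a small "biobank" for the progression
# outcome, and prior weights (from a large "case-control study") close to the true effects.
rng = np.random.default_rng(2)
p = 100
b_true = rng.normal(scale=0.1, size=p)
b_prior = b_true + rng.normal(scale=0.03, size=p)

def simulate(n):
    X = rng.normal(size=(n, p))
    return X, X @ b_true + rng.normal(size=n)

X_tr, y_tr = simulate(150)
X_val, y_val = simulate(500)
best_lam, val_r2 = choose_lambda(X_tr, y_tr, X_val, y_val, b_prior, [0.0, 1.0, 10.0, 100.0, 1000.0])
print(best_lam, {lam: round(s, 3) for lam, s in val_r2.items()})
```

If the prior weights were unrelated to the progression effects, validation would tend to favor a small λ, so the biobank data would dominate.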
In simulations, GPS consistently yielded better predictive accuracy than other methods that relied on biobank or case-control samples alone or that combined biobank and case-control samples in other ways.
The improvement is particularly noticeable when the biobank sample is small or the genetic correlation is low. We derive PRS for progression in preclinical rheumatoid arthritis and systemic lupus erythematosus in the BioVU biobank and validate them in All of Us.
For both diseases, GPS achieves the highest prediction R², and the resulting PRS shows the strongest correlation with the prevalence of progression.
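Two quantities mentioned in the abstract, prediction R² and the relationship between the score and the prevalence of progression, can be computed in a validation cohort (the role All of Us plays here) roughly as follows: the squared correlation between score and outcome, and the progression rate within score quintiles. The article does not define which R² variant the authors used, so this version, the helper names and the simulated cohort are assumptions for illustration.

```python
import numpy as np

def prediction_r2(score, outcome):
    """Squared Pearson correlation between a risk score and an observed 0/1 outcome."""
    return float(np.corrcoef(score, outcome)[0, 1] ** 2)

def prevalence_by_quantile(score, outcome, n_bins=5):
    """Fraction of people with the outcome (e.g. progression) in each score quantile bin."""
    edges = np.quantile(score, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, score, side="right") - 1, 0, n_bins - 1)
    return np.array([outcome[bins == k].mean() for k in range(n_bins)])

# Hypothetical validation cohort: higher scores raise the chance of progression.
rng = np.random.default_rng(3)
score = rng.normal(size=10_000)
progressed = (rng.random(10_000) < 1.0 / (1.0 + np.exp(-(1.5 * score - 1.0)))).astype(float)

print("R^2:", round(prediction_r2(score, progressed), 3))
print("progression rate by quintile:", prevalence_by_quantile(score, progressed).round(3))
```

A useful progression score should show both a higher R² and a steeper rise in progression rate from the lowest to the highest quintile.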