AI Reveals Gene Function in Human Cells
Summary: Researchers have developed an AI model that accurately predicts gene activity in any human cell, providing insights into cellular functions and disease mechanisms.
Trained on data from more than 1.3 million cells, the model can predict gene expression in previously unseen cell types with high accuracy. It has already revealed mechanisms driving an inherited form of childhood leukemia and may help explore the “dark matter” of the genome, where many cancer mutations occur.
Important Facts
- AI and Gene Function: The AI model predicts gene expression in previously unseen cell types using genomic and expression data, offering insight into cellular activity.
- Childhood Cancer Mechanism: The system identified how certain mutations disrupt transcription factors in an inherited form of childhood leukemia, a prediction confirmed by lab tests.
- Exploring the Genome's “Dark Matter”: The model provides tools to study non-coding regions of the genome, shedding light on the role of unexplored mutations in cancer and other diseases.
Source: Columbia University
Using a new artificial intelligence technique, researchers at Columbia University Vagelos College of Physicians and Surgeons can accurately predict the activity of genes in any human cell, essentially revealing the cell's internal processes.
The system, described in the current issue of Nature, could change the way scientists work to understand everything from cancer to genetic diseases.
“Predictive computer models make it possible to reveal biological processes in a fast and accurate way. These methods can effectively perform large-scale computational experiments, improving and simplifying traditional experimental methods,” said Raul Rabadan, professor of biological sciences and senior author of the new paper.
Traditional research methods in biology are good at revealing how cells do their jobs or how they respond to perturbations. But they cannot predict how cells work or how they will react to a change, such as a cancer-causing mutation.
“Having the ability to accurately predict cell functions can revolutionize our understanding of important biological processes,” Rabadan said.
“It could change biology from a science that describes seemingly random processes to one that can predict the fundamental principles that govern the behavior of cells.”
In recent years, the collection of large amounts of data from cells and powerful AI models have begun to transform biology into a predictive science.
The 2024 Nobel Prize in Chemistry was awarded to researchers for their groundbreaking work in using AI to predict the structure of proteins. But the use of AI methods to predict the functions of genes and proteins inside cells has proven more difficult.
A new AI method predicts gene expression in any cell
In the new study, Rabadan and his colleagues set out to use AI to predict which genes are active within specific cells. Such information about gene expression can tell researchers the identity of a cell and how it performs its functions.
“Previous models have been trained on data from specific cell types, usually cancer cell lines or something else that has little in common with normal cells,” Rabadan said.
Xi Fu, a graduate student in Rabadan's lab, decided to take a different approach, training a machine learning model with gene expression data from millions of cells obtained from normal human tissues.
The input consisted of genome sequences and data showing which parts of the genome are accessible and expressed.
The overall approach is similar to how ChatGPT and other popular “foundation” models work. These systems use a set of training data to identify underlying rules, the grammar of a language, and then apply those inferred rules to new situations.
“Here it's exactly the same thing: we learn the grammar in many different cellular regions, and then we go into a particular condition, either a diseased or a normal cell type, and we can try to see how well we predict patterns from this information,” Rabadan said.
Fu and Rabadan soon enlisted a team of collaborators, including co-first authors Alejandro Buendia, now a PhD student at Stanford who was formerly in Rabadan's lab, and Shentong Mo of Carnegie Mellon, to train and test the new model.
After training on data from more than 1.3 million human cells, the system was accurate enough to predict gene expression in cell types it had never seen, producing results that closely matched experimental data.
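The evaluation idea described above, training on some cell types and then checking predictions on a cell type the model never saw, can be sketched in a few lines. This is only a toy illustration under stated assumptions: the data are simulated, the names `TRUE_WEIGHTS`, `make_cell_type`, `fit_linear` and `held_out_correlation` are hypothetical, and a plain linear model stands in for the transformer-based system the researchers actually built.

```python
import random

# Hypothetical shared "regulatory rule" used only to simulate toy data.
TRUE_WEIGHTS = [1.0, -0.5, 2.0]


def make_cell_type(n_genes, rng):
    """Simulate one cell type: per-gene feature vectors and noisy expression."""
    features = [[rng.random() for _ in TRUE_WEIGHTS] for _ in range(n_genes)]
    expression = [
        sum(w * x for w, x in zip(TRUE_WEIGHTS, f)) + rng.gauss(0, 0.05)
        for f in features
    ]
    return features, expression


def fit_linear(features, targets, steps=500, lr=0.5):
    """Fit linear weights by batch gradient descent on mean squared error."""
    weights = [0.0] * len(features[0])
    n = len(features)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for f, y in zip(features, targets):
            err = sum(w * x for w, x in zip(weights, f)) - y
            for j, xj in enumerate(f):
                grad[j] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def held_out_correlation(n_cell_types=5, n_genes=200, seed=0):
    """Train on all but the last simulated cell type, score on the held-out one."""
    rng = random.Random(seed)
    cell_types = [make_cell_type(n_genes, rng) for _ in range(n_cell_types)]
    *train, held_out = cell_types
    train_x = [f for feats, _ in train for f in feats]
    train_y = [y for _, expr in train for y in expr]
    weights = fit_linear(train_x, train_y)
    test_x, test_y = held_out
    predicted = [sum(w * x for w, x in zip(weights, f)) for f in test_x]
    return pearson(predicted, test_y)


if __name__ == "__main__":
    print(f"Held-out Pearson r: {held_out_correlation():.3f}")
```

The point of the sketch is the protocol rather than the model: the held-out cell type contributes nothing to training, so a high correlation there reflects rules learned from the other cell types.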
New AI methods reveal childhood cancer drivers
Next, the researchers demonstrated the power of their AI system when they asked it to uncover the underlying biology of diseased cells, in this case, an inherited form of childhood leukemia.
“These kids inherit a mutated gene, and it wasn't really clear what these mutations were doing,” said Rabadan, who also co-directs the cancer genomics and epigenomics research program at Columbia's Herbert Irving Comprehensive Cancer Center.
Using the AI system, the researchers predicted that the mutations disrupt the interaction between two different transcription factors that determine the fate of leukemic cells. Laboratory experiments confirmed the AI's prediction. Understanding the effect of these mutations reveals some of the mechanisms that drive the disease.
AI can uncover “dark matter” in the genome
The new computational methods should also allow researchers to begin exploring the role of the genome's “dark matter” (a term borrowed from cosmology that refers to the bulk of the genome, which does not encode known genes) in cancer and other diseases.
“Most of the mutations found in cancer patients are in the so-called dark regions of the genome. These mutations do not affect the function of a protein and have remained largely unexplored,” said Rabadan.
“The idea is that using these models, we can look at mutations and illuminate that part of the genome.”
Rabadan is already collaborating with researchers at Columbia and other universities to examine different cancers, from brain to blood cancers, learning the regulatory grammar of normal cells and how cells change as cancer develops.
This work also opens up new ways to understand many diseases beyond cancer and to identify new treatment targets. By presenting new mutations to the computer model, researchers can now gain deeper insights into, and predictions about, how those mutations affect the cell.
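The idea of presenting mutations to a computer model and reading off the predicted effect is often called in silico mutagenesis. The sketch below illustrates it under loud assumptions: `predict_expression` is a hypothetical toy scorer that simply counts made-up motifs in `MOTIF_WEIGHTS`, not the researchers' model, and the sequence is invented for illustration.

```python
# Hypothetical motifs and weights; a toy stand-in for a trained model.
MOTIF_WEIGHTS = {"GATA": 2.0, "CACGTG": 1.5}


def predict_expression(seq):
    """Toy predictor: sum of weight times number of motif occurrences."""
    score = 0.0
    for motif, weight in MOTIF_WEIGHTS.items():
        hits = sum(
            1
            for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif
        )
        score += weight * hits
    return score


def mutation_scan(seq):
    """Predicted change in expression for every single-base substitution."""
    baseline = predict_expression(seq)
    effects = {}
    for pos, ref in enumerate(seq):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            effects[(pos, ref, alt)] = predict_expression(mutant) - baseline
    return effects


if __name__ == "__main__":
    effects = mutation_scan("TTGATATT")
    # Report only substitutions that change the toy prediction.
    for (pos, ref, alt), delta in sorted(effects.items()):
        if delta != 0:
            print(f"pos {pos}: {ref}>{alt} changes prediction by {delta:+.1f}")
```

With a real trained model in place of the toy scorer, the same loop would flag which positions in a non-coding region matter most for a gene's predicted activity.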
Following some of the recent advances in artificial intelligence in biology, Rabadan sees this work as part of a larger trend: “The new era of biology is really exciting; to turn biology into a predictive science.”
Additional Information
The paper, titled “A foundation model of transcription across human cell types,” was published on Jan. 8 in Nature.
Authors (all from Columbia unless noted): Xi Fu, Shentong Mo (Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE, and Carnegie Mellon University, Pittsburgh, PA), Alejandro Buendia, Anouchka P. Laurent, Anqi Shao, Maria del Mar Alvarez-Torres, Tianji Yu, Jimin Tan (New York University Grossman School of Medicine, New York, NY), Jiayu Su, Romella Sagatelian, Adolfo A. Ferrando (Columbia and Regeneron, Tarrytown, NY), Alberto Ciccia, Yanyan Lan (Tsinghua University, Beijing, China), David M. Owens, Teresa Palomero, Eric P. Xing (Mohamed bin Zayed University of Artificial Intelligence and Carnegie Mellon University), and Raul Rabadan.
About this AI and genetic research news
Author: Helen Garey
Source: Columbia University
Contact person: Helen Garey – Columbia University
Original research: Open access.
“A foundation model of transcription across human cell types” by Raul Rabadan et al. Nature
Abstract
A foundation model of transcription across human cell types
Transcriptional regulation, which involves complex interactions between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack the generalizability to accurately extrapolate to unseen cell types and conditions.
Here we present GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types.
Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types.
GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a wide range of cell types and conditions, and reveals universal and cell type-specific transcription factor interaction networks.
We evaluated its performance in predicting regulatory activity, inferring regulatory elements and regulators, and identifying physical interactions between transcription factors, and found that it outperforms current models in predicting readouts of a lentivirus-based massively parallel reporter assay.
In fetal erythroblasts, we identified distal regulatory regions (more than 1 Mbp away) that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor interaction that explains the functional significance of a germline mutation predisposing to leukemia.
Overall, we provide a generalizable and accurate model of transcription, together with catalogs of gene regulation and transcription factor interactions, all with cell type specificity.