American researchers have developed a new artificial intelligence model that, after training on a large amount of data, can accurately predict gene expression in a wide range of human cells, a capability expected to benefit biological and medical research. The model, called the General Expression Transformer (GET), was jointly developed by researchers from institutions including Columbia University and Carnegie Mellon University in the United States. Its accuracy and effectiveness have been validated experimentally, and the paper was published in the latest issue of the British journal Nature.
During gene expression, the "blueprint" of a gene stored in DNA is transcribed into a transcript in the form of RNA, which guides the synthesis of proteins and the execution of specific physiological functions. A wide variety of biomolecules are involved in transcriptional regulation, and their interactions are extremely complex. Previous prediction models were limited to a few specific cell types, especially cancer cells, and no general-purpose tool applicable to many cell types in the human body was available.
The researchers designed a machine learning model around the characteristics of transcriptional regulation and trained it on gene sequencing and expression data from more than 1.3 million human cells. These cells cover 213 types of human embryonic and adult cells, all taken from normal tissue without lesions. Just as artificial intelligence tools such as ChatGPT can infer general rules of grammar from a large corpus of text, GET can learn the "grammar" of transcriptional regulation from its training data and, on that basis, predict gene expression in cell types it has never encountered.
The model can be used to reveal how disease-causing genes act and to guide research on cancer and genetic diseases.
For example, a patient with a certain type of childhood leukemia carried a gene variant of unknown function. The GET model predicted that this variant would disrupt the interaction between two transcription factors in the cell, and experimental data confirmed the prediction. The researchers say the model can also be used to explore the role of the "dark matter" of the genome. Protein-coding sequences account for only a small part of the human genome; the remaining 98% consists of non-coding regions that, like dark matter in the universe, have properties and functions that remain elusive. (Xinhua)
Editor: He Chuanning  Responsible editor: Su Suiyue
Source: Economic Information Daily