Regulatory sequences

3/27/2023

Machine learning algorithms trained to predict the regulatory activity of nucleic acid sequences have revealed principles of gene regulation and guided genetic variation analysis. While the human genome has been extensively annotated and studied, model organisms have been less explored. Model organism genomes offer both additional training sequences and unique annotations describing tissue and cell states unavailable in humans. Here, we develop a strategy to train deep convolutional neural networks simultaneously on multiple genomes and apply it to learn sequence predictors for large compendia of human and mouse data. Training on both genomes improves gene expression prediction accuracy on held-out and variant sequences. We further demonstrate a novel and powerful approach to apply mouse regulatory models to analyze human genetic variants associated with molecular phenotypes and disease. Together, these techniques unleash thousands of non-human epigenetic and transcriptional profiles toward more effective investigation of how gene regulation affects human disease.

Predicting the behavior of any nucleic acid sequence in any nuclear environment is a primary objective of gene regulation research. In recent years, machine learning approaches that tackle this problem directly have achieved significant accuracy gains in predicting transcription factor (TF) binding, chromatin features, and gene expression from input DNA sequence. These models have been applied to study genetic variation in populations and to generate mechanistic hypotheses for how noncoding variants associated with human disease exert their influence.
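To make the multi-genome idea concrete, here is a minimal sketch of one way such a model could be organized: a convolutional trunk whose parameters are shared across genomes, with a separate linear output head per genome. This layout is an assumption for illustration, not a description of the authors' architecture. The names `MultiGenomeModel`, `one_hot`, and `variant_effect`, the layer sizes, and the random untrained weights are all hypothetical. The sketch covers only the forward pass and, as one possible reading of applying mouse models to human variants, scores a human variant through the mouse head.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases such as N stay all zeros."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


class MultiGenomeModel:
    """Hypothetical toy model: one shared convolution layer, one linear head per genome."""

    def __init__(self, n_targets, n_filters=8, width=5, seed=0):
        rng = np.random.default_rng(seed)
        # Shared parameters: every genome's sequences pass through the same filters.
        self.filters = rng.normal(0.0, 0.1, size=(n_filters, width, 4))
        # Genome-specific parameters: each genome has its own set of output profiles.
        self.heads = {
            genome: rng.normal(0.0, 0.1, size=(n_filters, n))
            for genome, n in n_targets.items()
        }

    def trunk(self, x):
        """Shared layers: valid 1D convolution, ReLU, then global max pooling."""
        width = self.filters.shape[1]
        # Stack every window of `width` positions: shape (positions, width, 4).
        windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
        acts = np.einsum("pwc,fwc->pf", windows, self.filters)
        return np.maximum(acts, 0.0).max(axis=0)

    def predict(self, seq, genome):
        """Predict one value per output profile of the chosen genome's head."""
        return self.trunk(one_hot(seq)) @ self.heads[genome]


def variant_effect(model, ref_seq, alt_seq, genome):
    """Score a variant as the alt-minus-ref difference in predictions for one head."""
    return model.predict(alt_seq, genome) - model.predict(ref_seq, genome)


# Illustrative sizes only: 3 human profiles and 2 mouse profiles.
model = MultiGenomeModel({"human": 3, "mouse": 2})
ref = "ACGTACGTACGTACGT"
alt = "ACGTACGAACGTACGT"  # single-base substitution in a human sequence
mouse_scores = variant_effect(model, ref, alt, "mouse")
```

In a trained version of this layout, batches from each genome would update the shared trunk, while each head would be updated only by batches from its own genome. That is how additional mouse sequences could, in principle, improve the features used for human predictions.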
