Homework 6: Comparing Human and Neanderthal Epigenomes using Enformer
If you run the analysis locally, clone the gene46100 conda environment and install any additional Python packages as needed. See this Jupyter notebook for an example (rendered here for your convenience).
Total Points: 100
1. Data Setup and Basic Analysis (25 points)
- Download the Enformer model and reference genome from the provided Box links (5 points)
- Choose a gene of interest and find its transcription start site (TSS) (5 points)
- Modify the notebook code to predict human epigenome at this location (5 points)
- Run Enformer on the Neanderthal sequence at the same location (5 points)
- Create scatter plots comparing human and Neanderthal predictions, choosing relevant tracks to compare (5 points)
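As a starting point for locating the prediction window, here is a minimal sketch of centering an input interval on a TSS. It assumes the notebook uses the standard Enformer input length of 393,216 bp for SEQUENCE_LENGTH; the helper name `tss_window` and the example coordinate are hypothetical and not taken from the course notebook.

```python
SEQUENCE_LENGTH = 393_216  # assumed: standard Enformer input length in base pairs


def tss_window(chrom, tss, length=SEQUENCE_LENGTH):
    """Return (chrom, start, end) for a `length`-bp window centered on `tss`.

    Coordinates are treated as 0-based, half-open; adjust if your
    annotation source uses 1-based positions.
    """
    start = tss - length // 2
    end = start + length
    if start < 0:
        raise ValueError("window extends past the start of the chromosome")
    return chrom, start, end


# Hypothetical TSS coordinate, for illustration only.
chrom, start, end = tss_window("chr5", 96_900_000)
print(chrom, start, end, end - start)
```

The same interval can then be used to extract both the human and the Neanderthal sequence, so the two sets of predictions cover identical coordinates.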
2. Comparative Analysis (25 points)
- Calculate correlation coefficients between human and Neanderthal predictions for each track (10 points)
- Identify tracks showing the highest and lowest correlations (5 points)
- What might explain these differences in correlation? (10 points)
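One possible way to compute the per-track correlations is sketched below. It assumes the predictions are available as NumPy arrays of shape (bins, tracks), and it uses Pearson correlation, which is an assumption since the assignment does not specify the coefficient. The function name `per_track_correlation` and the random data are illustrative only.

```python
import numpy as np


def per_track_correlation(human, neanderthal):
    """Pearson r between two (bins, tracks) arrays, computed per track.

    Tracks with zero variance in either input yield NaN.
    """
    h = human - human.mean(axis=0)
    n = neanderthal - neanderthal.mean(axis=0)
    denom = np.sqrt((h ** 2).sum(axis=0) * (n ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return (h * n).sum(axis=0) / denom


# Synthetic stand-in for model output: 896 bins, 5 tracks.
rng = np.random.default_rng(0)
human = rng.random((896, 5))
neanderthal = human.copy()
neanderthal[:, 1] = rng.random(896)  # track 1: unrelated signal
neanderthal[:, 2] = -human[:, 2]     # track 2: anti-correlated signal

r = per_track_correlation(human, neanderthal)
lowest = int(np.nanargmin(r))   # index of the least correlated track
highest = int(np.nanargmax(r))  # index of (one of) the most correlated tracks
print(r.round(3), lowest, highest)
```

Using `nanargmin` and `nanargmax` keeps constant tracks, which have undefined correlation, from being reported as the extremes.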
3. Peak Analysis (30 points)
- Identify regions where peaks are present in human but absent in Neanderthal (or vice versa) (10 points)
- For these regions:
  - What cell types or marks show the most differences? (5 points)
  - Are these differences consistent across both haplotypes? (5 points)
  - What might be the functional significance of these differences? (10 points)
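The assignment does not define what counts as a peak, so the sketch below makes a simple assumption: a bin is a peak if its predicted signal exceeds a fixed threshold. The function name `differential_peaks`, the threshold value, and the toy arrays are all hypothetical; a more careful analysis might use a per-track threshold or a fold-change criterion instead.

```python
import numpy as np


def differential_peaks(human, neanderthal, threshold):
    """Return bin indices where exactly one of the two tracks exceeds `threshold`.

    Both inputs are 1-D arrays of predicted signal for the same track.
    """
    h_peak = human > threshold
    n_peak = neanderthal > threshold
    human_only = np.flatnonzero(h_peak & ~n_peak)
    neanderthal_only = np.flatnonzero(n_peak & ~h_peak)
    return human_only, neanderthal_only


# Toy signal for a single track across six bins.
human_track = np.array([0.1, 0.2, 3.0, 0.3, 0.1, 2.5])
neanderthal_track = np.array([0.1, 2.8, 2.9, 0.2, 0.1, 0.4])

h_only, n_only = differential_peaks(human_track, neanderthal_track, threshold=1.0)
print("human only:", h_only, "Neanderthal only:", n_only)
```

Running the same function once per Neanderthal haplotype and comparing the resulting index sets is one way to check whether a difference is consistent across haplotypes.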
4. Technical Understanding (20 points)
- Why do we need to one-hot encode the sequences? (5 points)
- What is the purpose of the SEQUENCE_LENGTH parameter? (5 points)
- How does the model handle the two haplotypes in the Neanderthal genome? (10 points)
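To make the one-hot encoding question concrete, here is a minimal sketch of turning a DNA string into a numeric array. The A, C, G, T column order and the all-zero encoding of other characters such as N are assumptions made for this example; the course notebook may use a different helper with different conventions.

```python
import numpy as np


def one_hot_encode(sequence):
    """Encode a DNA string as a (length, 4) float32 array.

    Columns are A, C, G, T (assumed order). Any other character,
    such as N, becomes an all-zero row.
    """
    lookup = np.zeros((256, 4), dtype=np.float32)
    for column, base in enumerate("ACGT"):
        lookup[ord(base), column] = 1.0
    codes = np.frombuffer(sequence.upper().encode("ascii"), dtype=np.uint8)
    return lookup[codes]


encoded = one_hot_encode("ACGTN")
print(encoded.shape)  # (5, 4)
print(encoded)
```

The encoding gives each base its own column, so the model receives no artificial ordering among nucleotides, as it would if bases were mapped to the integers 0 through 3.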
5. Extra Credit (10 points)
Use Enformer to predict the DNA binding score from Project 1, then compare the results with your original DNA binding score.