TF binding prediction challenge
gene46100
project
Competition details for TF binding prediction challenge
Overview
- Goal: Predict transcription factor (TF) binding scores in DNA sequences
- Input: 300bp human DNA sequences
- Target: Per-position TF binding scores (one score per base, 300 per sequence)
Data Description
The challenge uses real genomic data, pairing DNA sequences with TF binding scores:
- Sequence Data: chr#_sequences.txt.gz files containing 300bp DNA sequences
  - Each sequence has a unique identifier in the format chr#_start_end
- Target Data: chr#_scores.txt.gz files containing binding scores
  - Each sequence has a corresponding 300-long vector of binding scores
  - Scores are predicted using Homer, a widely used motif discovery tool
  - Each position in the vector represents the binding score at that position in the sequence
Data prepared by Sofia Salazar.
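As a rough illustration of the data layout described above, the sketch below parses an identifier in the chr#_start_end format and one-hot encodes a sequence so that each base becomes one row of four indicators. The function names, the example identifier, and the choice to map unknown bases such as N to all zeros are assumptions made for illustration. They are not part of the provided course code, and the exact layout inside the .txt.gz files is not assumed here.

```python
# Hypothetical helpers: only the chr#_start_end ID format and the 300bp
# sequence length come from the data description above.

BASES = "ACGT"  # assumed column order for the one-hot encoding


def parse_sequence_id(seq_id):
    """Split an ID such as 'chr1_1000_1300' into (chromosome, start, end)."""
    chrom, start, end = seq_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def one_hot_encode(seq):
    """Return one [A, C, G, T] indicator row per base.

    Bases outside ACGT (for example N) become all-zero rows (an assumption).
    """
    rows = []
    for base in seq.upper():
        row = [0.0] * len(BASES)
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


# Made-up identifier and a short made-up sequence; real inputs are 300bp long.
chrom, start, end = parse_sequence_id("chr1_1000_1300")
encoded = one_hot_encode("ACGTN")
print(chrom, start, end, len(encoded))
```

With a real 300bp sequence the encoding has 300 rows, which lines up position by position with the 300-long score vector used as the target.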
Getting Started
Timeline
- Training Sessions:
- Tuesday, April 8: Sofia will review how to use the code for the basic DNA scoring model. Students will continue working on the project. Charles, Sofia, and Ran will be available to help.
- Thursday, April 10: Sofia will explain how to use Weights & Biases to tune the model's hyperparameters (learning rate, number of filters, kernel size, etc.). Charles, Sofia, Ran, and Haky will be available to help.
- Presentation Day (Thursday, April 10): Students will present:
- Model architecture
- Model performance
- Filter interpretation (time permitting)
- Lessons learned
- Submission Deadline: April 18
- Submit best model to Canvas
- TA will test on held-out data
- Leaderboard will be created
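For the hyperparameter tuning mentioned in the April 10 session, a search space over learning rate, number of filters, and kernel size could be written as a dictionary in the shape Weights & Biases uses for sweeps. The sketch below is only an illustration: the specific values, the metric name val_loss, and the helper grid_combinations are assumptions, and the code enumerates the grid locally rather than calling the wandb library.

```python
from itertools import product

# Sweep-style configuration. The candidate values and the metric name
# are placeholders chosen for illustration, not course settings.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 1e-3]},
        "num_filters": {"values": [16, 32, 64]},
        "kernel_size": {"values": [8, 12]},
    },
}


def grid_combinations(config):
    """List every hyperparameter combination a grid search would visit."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = grid_combinations(sweep_config)
print(len(runs))  # 2 * 3 * 2 = 12 combinations
print(runs[0])

# With wandb installed, a dictionary like this can be registered with
# wandb.sweep(...) and executed with wandb.agent(...), where the training
# function reads its settings from wandb.config.
```

A grid is easy to reason about but grows quickly with each added hyperparameter, so a random or Bayesian search method may be a better fit once more than a few options are in play.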