MotifHound Algorithm

1. Download

You can download MotifHound as a tar archive here. After downloading, extract the archive and move into the new directory:

> tar -xvzf MotifHound_130801.tar.gz
> cd MotifHound/

2. Third party libraries

To work with S. cerevisiae and H. sapiens datasets, we also provide precomputed data on disorder, Pfam domains, evolution and function descriptions. These data are not required to run MotifHound, but they are recommended in order to use it to its full potential. They can be downloaded here and copied into the "data" directory of MotifHound.

Importantly, MotifHound uses the following programs/libraries that need to be installed:

3. Running the program

Once these are installed, you can run MotifHound with the following command, adjusting the options as needed:

> perl ./Scripts/ --Setfile ./Data/Seq/Set/YEAST_Set_TEST.faa --Proteome ./Data/Seq/Proteome/YEAST_Proteome_TEST.fasta --Size 3 10 --Scan --WD ./Results --H --Blastfile ./Data/Blast/Blast_YEAST_Proteome.blast --D --Disofile ./Data/Disorder/YEAST_Proteome_DISORDER.dat --Pfam_annot ./Data/Domains/YEAST_Proteome_Pfam_Domains.txt --Gene_annot ./Data/Genes/ --HTML
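The command above refers to several input files, and a mistyped path is an easy mistake to make. The following is a minimal bash sketch of a pre-flight check, not part of MotifHound itself: the helper name check_inputs is hypothetical, and the paths are simply the ones from the example command, assumed to be relative to the MotifHound directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MotifHound): report every path
# given as an argument that does not exist, and fail if any is missing.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every path was found.
    [ "$missing" -eq 0 ]
}

# Paths copied from the example command; adjust them to your own data.
if check_inputs \
    ./Data/Seq/Set/YEAST_Set_TEST.faa \
    ./Data/Seq/Proteome/YEAST_Proteome_TEST.fasta \
    ./Data/Blast/Blast_YEAST_Proteome.blast \
    ./Data/Disorder/YEAST_Proteome_DISORDER.dat \
    ./Data/Domains/YEAST_Proteome_Pfam_Domains.txt
then
    echo "All input files found."
else
    echo "Fix the paths listed above before running MotifHound." >&2
fi
```

Each missing path is printed on its own line to standard error, so the check can be run before a long analysis without cluttering any captured output.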

To display the help:

> perl ./Scripts/ --help

Benchmark data

We benchmarked MotifHound by creating datasets of S. cerevisiae protein sequences into which we spiked known motifs. The spiked-in motifs vary in length, number of defined positions, and number of repeats. To exhaustively cover the combinations of these three parameters, we created 11,880 datasets. These datasets are available for download, and we hope they will help in the development of future motif discovery algorithms.
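To make the idea of spiking a motif into a sequence concrete, here is a minimal bash sketch. It is an illustration only and not the procedure used to build the benchmark: the function name spike_in, the 0-based position argument, and the convention that a '.' in the motif marks an undefined position (which keeps the original residue) are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of spiking a motif into a protein sequence.
# Usage: spike_in SEQUENCE MOTIF POSITION
#   POSITION is 0-based. A '.' in MOTIF is treated as an undefined
#   position and leaves the original residue unchanged (an assumption
#   of this sketch, not a documented MotifHound convention).
spike_in() {
    local seq=$1 motif=$2 pos=$3
    local out i ch

    # Refuse motifs that would run past either end of the sequence.
    if (( pos < 0 || pos + ${#motif} > ${#seq} )); then
        echo "motif does not fit at position $pos" >&2
        return 1
    fi

    out=${seq:0:pos}                  # residues before the motif
    for (( i = 0; i < ${#motif}; i++ )); do
        ch=${motif:i:1}
        if [ "$ch" = "." ]; then
            ch=${seq:pos+i:1}         # undefined position: keep original
        fi
        out+=$ch
    done
    out+=${seq:pos+${#motif}}         # residues after the motif
    printf '%s\n' "$out"
}

# Motif of length 4 with 3 defined positions, placed at position 3.
spike_in "MKTAYIAKQR" "P.LP" 3
# prints MKTPYLPKQR
```

In this sketch, motif length corresponds to the length of the MOTIF string, and the number of defined positions corresponds to the number of characters other than '.'.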