Introduction
Linear information encoded in proteins is the primary component of their
structure and the modus operandi of the cell's signaling and regulatory
machinery, including recognition and binding, cleavage, degradation,
docking, tagging, targeting, folding, scaffolding, translation, and
post-translational modification. This linear information is organized,
i.e. encoded, in proteins as multiple independent subunits, each
responsible for storing and processing particular information that,
individually or in cooperation, mediates protein functions.
DALEL exhaustively searches the linear information in proteins. First,
it enumerates all possible motifs of variable length, with any number
and combination of wildcards. Then, it degenerates these motifs to
discover conserved and flexible residues, both individual and correlated.
DALEL uses a novel parallel and recursive algorithm that divides the
combinatorially exploding space of enumeration and degeneration into
much smaller subspaces, which can be built and searched quickly and in
parallel.
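The paragraph above does not spell out how the space is divided. As an illustration only, the sketch below uses a generic, swapped-in technique (prefix-based depth-first search with support pruning), not DALEL's published algorithm: each first residue defines an independent subspace, each subspace is grown recursively one symbol at a time, and a prefix found in fewer than `min_support` sequences is never extended, since its extensions can only match fewer sequences. The names `search`, `_grow`, and `min_support` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

WILDCARD = "."  # hypothetical wildcard symbol: matches any residue


def _grow(sequences, motif, hits, alphabet, min_support, max_len, out):
    """Recursively extend `motif`; `hits` holds its (seq_index, start) matches."""
    if motif[-1] != WILDCARD:  # report only motifs ending in a fixed residue
        out[motif] = {i for i, _ in hits}
    if len(motif) == max_len:
        return
    pos = len(motif)
    for symbol in alphabet + WILDCARD:
        new_hits = [
            (i, s) for i, s in hits
            if s + pos < len(sequences[i])
            and (symbol == WILDCARD or sequences[i][s + pos] == symbol)
        ]
        # Prune: support can only shrink as a motif grows, so skip this subspace.
        if len({i for i, _ in new_hits}) >= min_support:
            _grow(sequences, motif + symbol, new_hits, alphabet,
                  min_support, max_len, out)


def search(sequences, min_support=2, max_len=4, workers=4):
    """Search each first-residue subspace independently, then merge results."""
    alphabet = "".join(sorted(set("".join(sequences))))

    def subspace(first):
        out = {}
        hits = [(i, s) for i, seq in enumerate(sequences)
                for s, res in enumerate(seq) if res == first]
        if len({i for i, _ in hits}) >= min_support:
            _grow(sequences, first, hits, alphabet, min_support, max_len, out)
        return out

    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(subspace, alphabet):
            results.update(partial)  # subspaces are disjoint: no key collisions
    return results


results = search(["PPLPR", "APKLPK", "GGGGG"], min_support=2, max_len=4)
print(results["P.LP"])  # -> {0, 1}
```

Threads keep the sketch portable; because of Python's global interpreter lock, real CPU parallelism would need a process pool or a compiled implementation. Merging is a plain dictionary update because motifs from different first residues never overlap.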
DALEL is based on the fundamental biological premise that proteins of
interest known to share a common behaviour are enriched in the linear
information mediating that behaviour, whereas other proteins do not
show such enrichment. Therefore, the entire space of linear information
encoded in the proteins of interest is visited and assessed for
significance by scoring its enrichment in the proteins of interest
relative to the proteome and/or to negative control proteins, using a
statistic based on the cumulative hypergeometric distribution.
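A minimal sketch of a cumulative hypergeometric enrichment score, assuming the following parameterization (DALEL's exact statistic, for example any log transform or correction, may differ): `N` background proteins (the proteome or negative controls), `K` of which contain the motif; `n` proteins of interest, `k` of which contain the motif. The p-value is the probability of seeing at least `k` motif-containing proteins when drawing `n` proteins at random from the background. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    Assumed meaning: N background proteins, K of them with the motif;
    n proteins of interest, k of them with the motif.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # Exact integer arithmetic for the upper tail; one division at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy call: 3 proteins of interest, all containing a motif seen in 3 of 10 proteins.
p = hypergeom_enrichment_pvalue(k=3, n=3, K=3, N=10)
```

Small p-values indicate that the motif occurs in the proteins of interest more often than expected from the background. The sum is evaluated exactly with integers, which avoids underflow in the individual terms for proteome-sized inputs.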
We applied DALEL to explore the linear information encoded in the SH3
domain recognition peptides of the budding yeast Saccharomyces
cerevisiae. Using only linear information, we independently identified
the majority of experimentally determined recognition peptides. We also
discovered a number of peptides with distinct properties that may serve
ancillary roles. The strategy could be applied to any recognition domain
to construct both empirical and quantitative models of biochemical
networks.
DALEL source code is available here.
Citations
Exhaustive search of linear information encoding protein-peptide recognition
Kelil A, Dubreuil B, Levy ED, Michnick SW (2017) PLOS Computational Biology 13(4): e1005499.
Fast and Accurate Discovery of Degenerate Linear Motifs in Protein Sequences
Kelil A, Dubreuil B, Levy ED, Michnick SW (2014) PLOS ONE 9(9): e106081.