Introduction
Transcriptional regulation of genes is achieved by the concerted actions of
multiple transcription factors with arrays of regulatory sequences on DNA
and with each other. Consequently, traditional approaches to understanding
transcriptional regulation have focussed on the identification and combinatorial
analysis of these cis-regulatory sequences.
With this tool, we focus on context-dependent interactions between transcription factor binding sites (TFBSs)
that may explain why the expression of genes
is modified in different directions under a particular condition. For
that purpose, we build upon the distance difference matrix (DDM)
concept from the field of structural biology where distance difference
matrices are used as a means to compare protein structures to detect
significant similarities and differences between related structures. We
introduce the concept of 'distance' between TFBS as a measure for their
degree of association and build distance matrices that summarize all
TFBS associations for both sets of promoters of differentially
regulated genes. Finally, by calculating the DDM and performing
multidimensional scaling (MDS) on the resulting matrix, we can distinguish
TFBSs that do not contribute to the observed differential gene expression,
which are mapped in the bulk, from 'deviating' TFBSs, which are likely
candidates for being responsible for the observed differential gene
expression.
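As a rough illustration of the idea, the sketch below builds one distance matrix per promoter set and subtracts them to obtain a DDM. It is not the tool's implementation: the distance used here (Jaccard distance on which promoters carry a predicted site) is only an assumed stand-in for the association measure, and the names `jaccard_distance`, `distance_matrix` and `distance_difference_matrix` are hypothetical. The MDS step is left out.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of promoter ids.

    ASSUMPTION: an illustrative association measure, not necessarily the
    distance used by the tool. Two empty sets get distance 0.0.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(hits, tfbs_names):
    """Build a symmetric TFBS-by-TFBS distance matrix for one promoter set.

    `hits` maps a TFBS name to the set of promoters with a predicted site.
    """
    return [
        [jaccard_distance(hits.get(s, set()), hits.get(t, set()))
         for t in tfbs_names]
        for s in tfbs_names
    ]


def distance_difference_matrix(dm_first, dm_second):
    """Subtract two distance matrices element by element."""
    return [
        [x - y for x, y in zip(row_first, row_second)]
        for row_first, row_second in zip(dm_first, dm_second)
    ]


# Toy data (made up): predicted sites in two sets of promoters.
names = ["TF_A", "TF_B", "TF_C"]
set_one = {"TF_A": {"p1", "p2", "p3"}, "TF_B": {"p1", "p2", "p3"}, "TF_C": {"p4"}}
set_two = {"TF_A": {"q1", "q2"}, "TF_B": {"q3"}, "TF_C": {"q3"}}

ddm = distance_difference_matrix(
    distance_matrix(set_one, names),
    distance_matrix(set_two, names),
)
for name, row in zip(names, ddm):
    print(name, row)
```

In this toy case TF_A and TF_B always co-occur in the first set and never in the second, so their DDM entry is strongly negative, while pairs whose association is the same in both sets get an entry of zero.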
|
Available matrix sets
Choosing your set of matrices:
- TRANSFAC 11.3 is the commercial dataset from TRANSFAC.
- JASPAR is the non-redundant dataset of annotated, high-quality transcription factor binding sites (see the JASPAR website).
- phyloFACTS is a dataset of matrices derived from statistically overrepresented, evolutionarily conserved regulatory region motifs from mammalian genomes (see Xie et al. for more information).
|
Example data sets
We provide a number of examples, described in our paper:
|
Output
It takes about one hour to obtain useful results; the exact time depends on the given promoter sets, the chosen PWM library, and the server load.
Links to the output of the method will be sent to your e-mail address.
The output includes:
- The 'input' files: The (validated) FASTA files corresponding to the given promoter sets or the given gene names (RefSeqs or HUGO IDs).
- A parameter file: A file describing the parameters used to run the TFdiff job. These are: the PWM library, the match thresholds, the names of the two given promoter sets, the job ID and the e-mail address.
- TF lists: Two files with the .csv extension containing the TFs corresponding to the two groups, labeled by PWM name and accompanied by a trend value (sum of the DDM values), a p-value and a q-value.
An empirical significance calculation with 1000 random DDM-MDS runs is used to obtain these lists. Ideally, more random runs should be performed; a standalone version can be downloaded for this purpose. If you would like the analysis to be done with the latest TRANSFAC version, write us an e-mail.
- Figure: The available figure is called "DDM_MDS_probabilistic.ps".
The regular DDM-MDS plot (as described in the article) does not fully account for the different information content of the individual position weight matrices (PWMs). Hence, the distance to the origin on a PWM-independent scale is not necessarily a good indicator of the over- or underrepresentation of the predicted sites of a PWM. The probabilistic DDM-MDS plot is derived from the regular plot by placing each PWM at a distance from the origin proportional to its -log(p-value) while keeping the original angle. It therefore corresponds fully to the TF lists described above. No plot is made if there are no significant results!
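The empirical significance calculation and the probabilistic rescaling can be sketched as follows. This is a minimal sketch under stated assumptions, not the code behind the server: the compared statistic is assumed to be a single number per PWM (for instance its distance to the origin), the p-value uses a common add-one correction, the logarithm is taken to base 10 with a proportionality factor of 1, and the function names `empirical_p_value` and `rescale_to_significance` are hypothetical.

```python
import math


def empirical_p_value(observed, random_values):
    """Fraction of random runs whose statistic is at least as large as observed.

    ASSUMPTION: the add-one correction keeps the estimate above zero;
    the tool itself may count differently.
    """
    at_least_as_large = sum(1 for value in random_values if value >= observed)
    return (at_least_as_large + 1) / (len(random_values) + 1)


def rescale_to_significance(x, y, p_value):
    """Move the point (x, y) to distance -log10(p_value) from the origin.

    The angle of the point is kept. ASSUMPTION: base-10 logarithm and a
    proportionality factor of 1; the source only states -log(p-value).
    """
    angle = math.atan2(y, x)
    radius = -math.log10(p_value)
    return radius * math.cos(angle), radius * math.sin(angle)


# Toy numbers (made up): one observed statistic against four random runs.
p = empirical_p_value(5.0, [1.0, 2.0, 3.0, 6.0])
new_x, new_y = rescale_to_significance(3.0, 4.0, p)
print(p, new_x, new_y)
```

Under this counting scheme, 1000 random runs cannot produce a p-value below roughly 0.001, which is one reason why running more random runs gives finer resolution.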