Prediction

Methods for predicting TFs and transcription cofactors

Transcription factors (TFs) are key regulators that bind to specific DNA sequences to activate or repress gene expression. Each TF has at least one DNA-binding domain (DBD), which is evolutionarily conserved. Based on their DBDs, TFs can be classified into families. After reviewing the literature, we collected and curated 72 animal TF families plus a group named "Others", which includes some orphan TFs. We identified TFs using Hidden Markov Model (HMM) profiles of their DBDs. Of the 72 defined families, 59 had DBD HMM profiles in the Pfam database (v31.0), which we downloaded directly. For the remaining families, which had no Pfam HMM profile, we built the profiles from DBD sequences of representative species (human, mouse, zebrafish and fly): we aligned the DBD sequences with ClustalW2 and built the HMM profiles with the hmmbuild program of the HMMER 3.1b2 package. We then used the hmmsearch program to search all protein sequences of each species against the HMM profiles to predict TFs. Based on manual curation, we applied a different E-value cutoff to each TF family. In addition to the predicted TFs, we included some TFs reported in publications. Because none of these could be assigned to one of the defined TF families, we placed them in the "Others" group.
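The family-specific filtering step can be illustrated with a minimal Python sketch. It assumes that hmmsearch was run with the `--tblout` option, whose whitespace-separated table lists the target name, target accession, query name, query accession and full-sequence E-value in its first five columns. The cutoff values, protein identifiers and the rule of keeping the lowest E-value family per protein are hypothetical choices made for illustration only; the curated cutoffs are not given in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical, shortened hmmsearch --tblout excerpt (real files have more columns).
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
protA          -          Homeobox    -          1.2e-20  70.1   0.3
protB          -          zf-C2H2     -          3.0e-03  15.2   0.1
protC          -          HLH         -          5.0e-06  25.9   0.0
protA          -          HLH         -          1.0e-12  45.0   0.2
"""


def parse_tblout(text: str) -> List[Tuple[str, str, float]]:
    """Return (protein, family, full-sequence E-value) for every hit line."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Column 0: target (protein), column 2: query (HMM), column 4: E-value
        hits.append((fields[0], fields[2], float(fields[4])))
    return hits


def predict_tfs(
    hits: List[Tuple[str, str, float]],
    cutoffs: Dict[str, float],
    default_cutoff: float = 1e-4,  # assumed fallback, not from the source
) -> Dict[str, str]:
    """Keep hits passing their family's cutoff; assign each protein to the
    family with the smallest E-value (an assumption of this sketch)."""
    best: Dict[str, Tuple[str, float]] = {}
    for protein, family, evalue in hits:
        if evalue > cutoffs.get(family, default_cutoff):
            continue
        if protein not in best or evalue < best[protein][1]:
            best[protein] = (family, evalue)
    return {protein: family for protein, (family, _) in best.items()}


# Hypothetical per-family cutoffs standing in for the manually curated ones.
EXAMPLE_CUTOFFS = {"Homeobox": 1e-5, "zf-C2H2": 1e-2, "HLH": 1e-10}
predicted = predict_tfs(parse_tblout(EXAMPLE_TBLOUT), EXAMPLE_CUTOFFS)
```

Here `protC` is dropped because its HLH hit does not pass the stricter HLH cutoff, while `protA` passes two cutoffs and is kept under the family with the stronger hit.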

Transcription cofactors are proteins that interact with TFs in the transcription complex to activate or repress the transcription of genes. To identify them, we first obtained the human transcription cofactors from the Tcof-DB v2 database and from the GO database according to the GO terms "transcription coactivator activity", "transcription corepressor activity", "transcription cofactor activity" and "regulation of transcription". In addition, if a gene had one of the following GO annotations: "chromatin remodeling", "chromatin-mediated maintenance of transcription", "histone *ylation", "histone *ylase activity", "histone *transferase activity", we considered it a chromatin remodeling factor. In AnimalTFDB3.0, the chromatin remodeling factors were also merged into the transcription cofactors. After removing redundant genes and genes overlapping with TFs, we obtained 1025 human transcription cofactors. Based on their function, the transcription cofactors were classified into five major categories. "Co-activator/repressors" are genes annotated as a coactivator or corepressor; "Histone-modifying Enzymes" are genes that encode histone modification enzymes; "Chromatin Remodeling Factors" were collected according to the GO annotations mentioned above, excluding the histone modification enzymes; "General Cofactors" are transcription cofactors involved in the initiation or elongation of transcription; "Cell Cycle" genes are cell cycle associated transcription cofactors. Cofactors that do not belong to any of these categories are classified as "Other Cofactors".
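The GO-based selection of chromatin remodeling factors can be sketched as a simple pattern match. This is an illustrative Python sketch that assumes the "*" in the GO annotation names above acts as a wildcard (so that, for example, "histone acetylation" and "histone methylation" both match); the function name and the example annotations are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# GO annotation patterns taken from the text; "*" is treated as a wildcard
# (an assumption of this sketch).
CHROMATIN_PATTERNS = [
    "chromatin remodeling",
    "chromatin-mediated maintenance of transcription",
    "histone *ylation",
    "histone *ylase activity",
    "histone *transferase activity",
]


def is_chromatin_remodeling_factor(go_terms: Iterable[str]) -> bool:
    """True if any GO term name of a gene matches one of the patterns."""
    return any(
        fnmatchcase(term.lower(), pattern)
        for term in go_terms
        for pattern in CHROMATIN_PATTERNS
    )


# Hypothetical annotation sets for two genes.
gene_a_terms = ["histone deacetylase activity", "nucleus"]
gene_b_terms = ["DNA binding", "cytoplasm"]
gene_a_is_crf = is_chromatin_remodeling_factor(gene_a_terms)
gene_b_is_crf = is_chromatin_remodeling_factor(gene_b_terms)
```

Matching on lowercased term names keeps the comparison independent of capitalization in the annotation source.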

To identify transcription cofactors in the other 182 species, we performed reciprocal best-hit BLAST between human and each of the other species, with the thresholds E-value <= 1e-4, coverage >= 50% and identity >= 30%.
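The reciprocal best-hit criterion with these thresholds can be sketched in Python as follows. The sketch makes several assumptions not stated in the text: hits are already parsed into records, the best hit is the passing hit with the highest bit score, coverage is a percentage of the query, and the thresholds are applied before the best hit is chosen. All identifiers and values in the example are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    coverage: float  # percent of the query covered by the alignment (assumed)
    identity: float  # percent identity
    bitscore: float


def passes_thresholds(hit: Hit) -> bool:
    """Thresholds from the text: E-value <= 1e-4, coverage >= 50%, identity >= 30%."""
    return hit.evalue <= 1e-4 and hit.coverage >= 50.0 and hit.identity >= 30.0


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its best passing subject (highest bit score)."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes_thresholds(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(
    forward: Iterable[Hit], reverse: Iterable[Hit]
) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}


# Hypothetical human-vs-mouse (forward) and mouse-vs-human (reverse) hits.
forward_hits = [
    Hit("H1", "M1", 1e-50, 90.0, 80.0, 300.0),
    Hit("H1", "M2", 1e-20, 70.0, 50.0, 120.0),
    Hit("H2", "M3", 1e-10, 40.0, 60.0, 90.0),  # fails the coverage threshold
]
reverse_hits = [
    Hit("M1", "H1", 1e-48, 88.0, 80.0, 290.0),
    Hit("M2", "H1", 1e-20, 72.0, 50.0, 118.0),
    Hit("M3", "H2", 1e-10, 75.0, 60.0, 95.0),
]
orthologs = reciprocal_best_hits(forward_hits, reverse_hits)
```

In this example only the H1/M1 pair is retained: M2 points back to H1, but H1's best hit is M1, and the H2/M3 pair is lost because the forward hit covers less than 50% of the query.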