This CNE set relies on a multiple sequence alignment of zebrafish to tetrapod and fish species, using the following species (assemblies): fugu (fr3), medaka (oryLat2), stickleback (gasAcu1), tetraodon (tetNig2), lamprey (petMar1), cow (bosTau4), dog (canFam2), horse (equCab2), chicken (galGal3), human (hg19), elephant (loxAfr3), mouse (mm9), opossum (monDom5), platypus (ornAna1) and frog (xenTro2). Zebrafish is the reference species in the syntenic multiple genome alignment. Each CNE is >= 50 bp long and conserved in at least two species, with at least 65% sequence identity and an alignment entropy of >= 1.8 bits. To also make use of species without assembled genomes, we used a sensitive last screen against DNA sequences in the NCBI trace archive and GenBank, and added CNEs that align well to only one species in the genome alignment and to at least one other non-tetrapod vertebrate with 75% identity. Please see our manuscript for further details.
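As a rough illustration of the length, identity and entropy thresholds above, the following sketch checks one pairwise aligned block. The function names are hypothetical, and the entropy used here (Shannon entropy of the base composition of the zebrafish sequence) is an assumption made for illustration; the manuscript gives the exact definitions used to build the CNE set.

```python
import math
from collections import Counter

MIN_LENGTH = 50       # bp, threshold quoted in the text
MIN_IDENTITY = 0.65   # fraction of identical alignment columns
MIN_ENTROPY = 1.8     # bits


def percent_identity(ref: str, other: str) -> float:
    """Fraction of alignment columns in which both rows carry the same base."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    if not ref:
        return 0.0
    matches = sum(
        1 for a, b in zip(ref.upper(), other.upper()) if a == b and a != "-"
    )
    return matches / len(ref)


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the A/C/G/T composition; the maximum is 2.0."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_thresholds(ref: str, other: str) -> bool:
    """Apply the three thresholds to one aligned pair (assumed simplification)."""
    ungapped_length = sum(1 for b in ref if b != "-")
    return (
        ungapped_length >= MIN_LENGTH
        and percent_identity(ref, other) >= MIN_IDENTITY
        and shannon_entropy(ref) >= MIN_ENTROPY
    )
```

A low-complexity block such as a 60 bp run of a single base would fail the entropy test here even at 100% identity, which is the kind of alignment an entropy filter is meant to exclude.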
We annotate a zCNE as conserved to human/mouse if it overlaps a well-aligning window to human/mouse by at least 15 bp. To annotate human and mouse ancestry more comprehensively, we used ancestral sequence reconstruction and transitivity, in addition to directly aligning sequences, to identify homologies not detected in the multiple alignment.
Computationally reconstructing the likely sequence of ancestors can reduce the large evolutionary distance between zebrafish and human/mouse and thereby uncover homologies. We used prequel (PHAST package; Hubisz et al., 2011) for ancestral reconstruction and lastz (Harris, 2007) to align ancestral sequences.
Transitivity means that if a sequence is orthologous between species A and B as well as between B and C, then it can be inferred that this sequence is also orthologous between species A and C, even if the alignment between A and C was not directly detected. This is equivalent to using species B as the reference species.
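The transitivity rule can be sketched as a composition of two orthology maps. This is a minimal illustration on element identifiers, not the pipeline used for the zCNE set; the function name and the example identifiers are hypothetical.

```python
from typing import Dict, Set


def infer_transitive(
    a_to_b: Dict[str, Set[str]], b_to_c: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Infer A-to-C orthology by chaining A-to-B and B-to-C orthology."""
    a_to_c: Dict[str, Set[str]] = {}
    for a_elem, b_elems in a_to_b.items():
        c_elems: Set[str] = set()
        for b_elem in b_elems:
            # Follow every B element that also has an orthologue in C.
            c_elems |= b_to_c.get(b_elem, set())
        if c_elems:
            a_to_c[a_elem] = c_elems
    return a_to_c


# Hypothetical identifiers: zebrafish (A), an intermediate species (B), human (C).
zebrafish_to_b = {"zf_elem_1": {"b_elem_7"}, "zf_elem_2": {"b_elem_9"}}
b_to_human = {"b_elem_7": {"hs_elem_3"}}

inferred = infer_transitive(zebrafish_to_b, b_to_human)
print(inferred)
```

Here zf_elem_1 is inferred to be orthologous to hs_elem_3 through b_elem_7, while zf_elem_2 gains no human orthologue because its intermediate element has none.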
UCSC Custom Tracks
All of the zCNEs will be loaded into the UCSC Genome Browser as the custom track "Bej zCNEv1" for Zebrafish Assembly: Wellcome Trust Zv9 (danRer7, Jul/2010). The zCNEs will be colored blue and red. The red elements are those with evidence of conservation to mouse or human. Make sure you click the "Outside Link" in the track details to get more information about each CNE of interest. For example, see the details for zCNEv1_47264.
Why submit the zCNEs to GREAT: Genomic Regions Enrichment of Annotations Tool?
GREAT calculates statistical enrichments for associations between genomic regions and the functional annotations of flanking genes. This allows GREAT to generate hypotheses about the regulatory functions of the set of genomic regions. Such hypotheses can be tested by directed zebrafish experiments to reveal insights into vertebrate biology.
How are the distances to the genes defined?
The zCNE resource defines distances to the gene TSS in the same way as GREAT defines them.
Why are some of the p-values I see in GREAT 0.0000?
Dense clustering of zCNEs around many key genes makes the p-value for the observation smaller than the numerical precision of the computer, so the number is reported as 0. Such terms can be considered very significant.
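The underflow can be reproduced with any very small probability. The sketch below evaluates a single generic binomial probability with made-up parameters; it is not GREAT's computation, only an illustration of why a double-precision value becomes exactly 0 while its logarithm remains representable.

```python
import math


def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the binomial probability of k successes in n trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


# Made-up example: 2000 hits out of 5000 when each hit has probability 0.001.
log_prob = log_binomial_pmf(k=2000, n=5000, p=0.001)

direct = math.exp(log_prob)            # too small for a double, underflows to 0.0
log10_prob = log_prob / math.log(10)   # still finite in log space

print(direct)
print(f"log10(probability) is about {log10_prob:.0f}")
```

Working in log space is a common way to compare such terms when the raw p-values all display as 0.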
zCNEs by Target Gene
Enter a target gene of interest to download, or view in the UCSC Genome Browser, all CNEs that putatively regulate that gene. Putative target genes are called using the GREAT "basal plus extension" regulatory domains.
How do you associate zebrafish genes with zCNEs?
Zebrafish genes are associated with zCNEs using GREAT regulatory domains. Please see GREAT genes and GREAT association rules for more information on how the gene set is created and then associated with regions.
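For orientation, a basal plus extension rule can be sketched as follows. The numeric defaults (5 kb upstream, 1 kb downstream, extension up to 1 Mb) are GREAT's documented defaults and are only assumed here; the parameters used for the zCNE resource may differ. Measuring the extension from the TSS, and the names Gene, basal_domain and regulatory_domains, are likewise assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"


def basal_domain(gene: Gene, upstream: int, downstream: int) -> Tuple[int, int]:
    """Basal domain around the TSS, oriented by strand."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream


def regulatory_domains(
    genes: List[Gene],
    upstream: int = 5_000,
    downstream: int = 1_000,
    max_extension: int = 1_000_000,
) -> Dict[str, Tuple[int, int]]:
    """Basal domain plus extension for genes on a single chromosome."""
    ordered = sorted(genes, key=lambda g: g.tss)
    basal = [basal_domain(g, upstream, downstream) for g in ordered]
    domains: Dict[str, Tuple[int, int]] = {}
    for i, gene in enumerate(ordered):
        b_start, b_end = basal[i]
        # Extend left up to the closest basal domain of any gene to the left.
        start = gene.tss - max_extension
        left_ends = [basal[j][1] for j in range(i)]
        if left_ends:
            start = max(start, max(left_ends))
        start = min(start, b_start)  # the gene's own basal domain is always kept
        # Extend right up to the closest basal domain of any gene to the right.
        end = gene.tss + max_extension
        right_starts = [basal[j][0] for j in range(i + 1, len(ordered))]
        if right_starts:
            end = min(end, min(right_starts))
        end = max(end, b_end)
        domains[gene.name] = (max(0, start), end)
    return domains


# Hypothetical genes on one chromosome.
example = regulatory_domains([Gene("geneA", 10_000, "+"), Gene("geneB", 50_000, "-")])
print(example)
```

A zCNE would then be linked to every gene whose domain it falls in, so a single element can have more than one putative target gene.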
zCNEs by Region
What if a zCNE is partially in, partially out of the specified region?
To avoid confusion, we provide the BED coordinates or FASTA sequence of the full zCNE even if the zCNE only overlaps the specified region by 1 base.
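The behavior described above corresponds to a plain interval-overlap test that returns the unclipped element. A minimal sketch, assuming BED-style 0-based, half-open coordinates and using hypothetical element names:

```python
from typing import List, Tuple

# (chromosome, start, end, name) with 0-based, half-open coordinates as in BED.
Element = Tuple[str, int, int, str]


def elements_in_region(
    elements: List[Element], chrom: str, start: int, end: int
) -> List[Element]:
    """Return every element sharing at least 1 base with the query region.

    Elements are returned with their full coordinates, not clipped to the region.
    """
    return [
        e for e in elements
        if e[0] == chrom and e[1] < end and e[2] > start
    ]


elements = [
    ("chr1", 100, 200, "elem_a"),
    ("chr1", 200, 300, "elem_b"),
    ("chr2", 100, 200, "elem_c"),
]

# elem_a shares only base 199 with the query, yet is reported in full.
hits = elements_in_region(elements, "chr1", 199, 250)
print(hits)
```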
When I click over to the human or mouse genome, why do I sometimes not see an alignment to zebrafish even if the zCNE indicates it is conserved to mouse or human?
We use more sensitive methods, such as transitivity and ancestral reconstruction, to detect conservation than the public UCSC alignment does. Please see the methods section above and our manuscript for details on how we improve our sensitivity.
Why might some links be broken?
We occasionally link to external resources such as ZFIN or Ensembl for more details. When external resources change the format of their URLs, our links break. Please report broken links so that we can fix them.
These data were generated by the Bejerano Lab.
Michael Hiller and Saatvik Agarwal generated the pairwise and multiple alignments and the CNE set, Jim Notwell applied transitivity, Michael Hiller applied ancestral reconstruction, and Aaron Wenger contributed tools and data analysis. Ravi Parikh and Harendra Guturu built the website.
If you use the zCNEs in your work, please cite:
- Michael Hiller, Saatvik Agarwal, Jim H. Notwell, Ravi Parikh, Harendra Guturu, Aaron M. Wenger, Gill Bejerano. "Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish". Nucleic Acids Res., 2013. PMID 23814184
Hubisz MJ, Pollard KS & Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 12, 41-51 (2011).
Harris RS (2007) Improved pairwise alignment of genomic DNA. Ph.D. Thesis, The Pennsylvania State University.