hg19 syntenic ages
Mendeley Data. Updated 2024-03-27. Indexed 2024-06-27.
Download link:
https://zenodo.org/record/4734606
Resource description:
### FANTOM eRNA enhancer ages: no-exon_all_fantom_enh_hg19_age_arch_summary_matrix.bed

All unique FANTOM5 eRNA enhancers across 112 tissue and cell datasets.

# File generation

Unique enhancers were collapsed into a single file, filtered to remove elements overlapping exons from Ensembl (ensGene_v36lift37), intersected with hg19 46-way syntenic blocks, and assembled into architectures using a custom script (https://github.com/slifong08/enh_ages/blob/983356269aee3e794cc351174bdfa555eaffaa1e/age_arch/manuscript_scripts/age_arch_pipeline.py).

# Bed file

File columns

#chr_enh start_enh end_enh enh_id sample_data seg_index core_remodeling arch mrca taxon mrca_2 taxon2

seg_index = number of age segments in the enhancer.

core_remodeling = 0 for simple architecture (i.e. fewer than the dataset-wide median number of age segments per enhancer) and 1 for complex architecture (i.e. greater than or equal to the dataset-wide median number of age segments per enhancer).

arch = "simple" or "complex".

mrca = "Most Recent Common Ancestor" branch-length estimate from humans to the most recent common ancestor, measured using the hg19 46-way vertebrate species neutral tree.

taxon = name of the most recent common ancestor taxon.

mrca_2 = most recent common ancestor branch-length estimates summarized into 10 bins for publication.

taxon2 = names of the most recent common ancestors for the 10 summarized bins.

### Roadmap ENCODE enhancer ages: no-exon_E*_enh_age_arch_summary_matrix.bed.gz

Roadmap ENCODE H3K27ac and H3K4me3 ChIP-seq datasets across 98 tissues (hg19).

# File generation

Roadmap ENCODE datasets were downloaded from the Roadmap ENCODE website. For each tissue dataset, we subtracted H3K4me3+ regions from H3K27ac+ regions using BEDTools; overlaps >=1 bp were excluded. Genomic coordinates were then filtered to remove elements overlapping exons from Ensembl (ensGene_v36lift37), intersected with hg19 46-way syntenic blocks, and assembled into architectures using a custom script (https://github.com/slifong08/enh_ages/blob/983356269aee3e794cc351174bdfa555eaffaa1e/age_arch/manuscript_scripts/age_arch_pipeline.py).

E* represents the unique tissue/cell Roadmap dataset identifier (e.g. E118 corresponds to HepG2 datasets).

# Bed file

# File columns

Same as the FANTOM file columns above.

# Hg19 46-way vertebrate tree syntenic block age files

Information on the hg19 46-way vertebrate multiple sequence alignment can be found here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/README.txt

# Scripts generating these files

(1) Creating syntenic block .bed files: https://github.com/slifong08/enh_ages/blob/54084ff12e521c20cc5288d6cb3590fac93c11ed/age_arch/manuscript_scripts/get_spec_count_msa-hg19.py

(2) Assigning most recent common ancestor (MRCA) and patristic distances to syntenic blocks: https://github.com/slifong08/enh_ages/blob/54084ff12e521c20cc5288d6cb3590fac93c11ed/age_arch/manuscript_scripts/get_synteny_age_hg19.py

### Bed file

# File columns

["chr", "start", "end", "strand", "reference_genome", "species_count", "length_synteny_block", "max_mrca", "max_patr"]

# Definitions

"max_mrca" = the phylogenetic distance from Homo sapiens to the most recent common ancestor of the oldest taxon.

"max_patr" = the phylogenetic distance from Homo sapiens to the oldest taxon.

"species_count" = the number of species' genomes the syntenic block maps to.
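
As a rough illustration of how the enhancer summary files described above might be read, the following Python sketch parses rows using the listed column names and re-derives the simple/complex flag from `seg_index` with the median rule. It is not the authors' pipeline. It assumes the files are tab-separated with a header line beginning with `#`; the helper names `read_bed` and `classify_architecture` are hypothetical, and the example rows are made-up placeholders rather than real records.

```python
import csv
import io
import statistics

# Column names as listed in the description of the enhancer files.
ENH_COLUMNS = [
    "chr_enh", "start_enh", "end_enh", "enh_id", "sample_data", "seg_index",
    "core_remodeling", "arch", "mrca", "taxon", "mrca_2", "taxon2",
]


def read_bed(handle, columns):
    """Return one dict per data row.

    Assumption: the file is tab-separated and header/comment lines start with '#'.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the '#chr_enh ...' header
        rows.append(dict(zip(columns, fields)))
    return rows


def classify_architecture(seg_counts):
    """Median rule from the description: below the median is simple (0), otherwise complex (1)."""
    median = statistics.median(seg_counts)
    return [0 if n < median else 1 for n in seg_counts]


# Placeholder rows for illustration only; every value here is invented.
example_text = "\n".join([
    "#" + "\t".join(ENH_COLUMNS),
    "\t".join(["chr1", "100", "200", "enh1", "sampleA", "1", "0", "simple",
               "0.1", "taxonA", "0.1", "taxonA"]),
    "\t".join(["chr1", "300", "450", "enh2", "sampleA", "3", "1", "complex",
               "0.4", "taxonB", "0.4", "taxonB"]),
    "\t".join(["chr2", "500", "620", "enh3", "sampleA", "2", "1", "complex",
               "0.2", "taxonC", "0.2", "taxonC"]),
])

rows = read_bed(io.StringIO(example_text), ENH_COLUMNS)
flags = classify_architecture([int(r["seg_index"]) for r in rows])
for row, flag in zip(rows, flags):
    print(row["enh_id"], row["seg_index"], flag)
```

For the gzip-compressed Roadmap files, the same `read_bed` helper could be given a handle from `gzip.open(path, "rt")`, again assuming the layout matches the column list above. The syntenic block files could be read the same way by passing their nine column names instead.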
Created:
2023-06-28



