Post-Processed DNA Methylation Data

Source: DataCite Commons. Updated: 2021-11-10. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Post-Processed_DNA_Methylation_Data/16983499
Description:
Processed data for well-covered CpG triplets, derived from WGBS data for 121 experimental replicates representing 77 unique biological samples, publicly available from ENCODE. These replicates include clinical tissue samples, cell lines, and primary cells. We selected replicates sequenced with single-end reads that were not flagged by ENCODE as having low coverage or insufficient read length. The list of sample identifiers is in ENCODE_IDs.xlsx. Each replicate is associated with a BAM file generated by mapping reads to GRCh38 using Bismark. We used the MethPipe methylation software suite to convert BAM files into MethPipe format and generate epiread files, an efficient format reporting the genomic index and methylation status of each CpG contained in a read. Within each biological replicate, we used the resulting epiread file to extract data from "well-covered" triplets. After discarding reads that reported a CpG with ambiguous methylation status, we considered a triplet well-covered within a replicate if all three CpGs are jointly covered by at least 100 reads from that sequencing run. Across all 121 replicates, we found 650,152 well-covered triplets, representing 75,212 unique loci in total. For each autosome in each sample, we estimated the exchangeable weight of each well-covered triplet and corrected for estimator bias using the sample mean of N=1000 full bootstrap resamples. Although each estimate of a triplet's exchangeable weight $\hat\lambda$ lies in $[0,1]$, the bias-adjusted estimate $(\hat\lambda-\bar{\hat\lambda^*})$ may be larger than $1$ or smaller than $0$. We therefore truncate these estimates to $[0,1]$. The TSV files here contain the processed triplets corresponding to each BAM file ID.
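The well-covered criterion above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes reads are already parsed into hypothetical `(first_cpg_index, states)` pairs, that `C` marks a methylated CpG and `T` an unmethylated one (any other character is treated as ambiguous), that a triplet is three consecutive CpGs, and that `1` encodes the methylated state.

```python
from collections import Counter
from itertools import product

MIN_READS = 100  # coverage threshold stated in the description

# The 8 triplet configurations in lexicographic order: '000', '001', ..., '111'
CONFIGS = ["".join(bits) for bits in product("01", repeat=3)]


def triplet_counts(reads, min_reads=MIN_READS):
    """Count configurations for each well-covered triplet.

    `reads` is an iterable of (first_cpg_index, states) pairs, where
    `states` has one character per consecutive CpG in the read.
    This input layout is an assumption made for illustration.
    """
    counts = {}  # index of the triplet's first CpG -> Counter of configurations
    for first_index, states in reads:
        # Discard the whole read if any CpG call is ambiguous.
        if any(s not in "CT" for s in states):
            continue
        bits = states.replace("C", "1").replace("T", "0")
        # Each window of three consecutive CpGs in the read covers one triplet.
        for offset in range(len(bits) - 2):
            key = first_index + offset
            counts.setdefault(key, Counter())[bits[offset:offset + 3]] += 1
    # Keep only triplets jointly covered by at least `min_reads` reads.
    return {
        key: [c[cfg] for cfg in CONFIGS]
        for key, c in counts.items()
        if sum(c.values()) >= min_reads
    }
```

Each returned list holds the eight configuration counts in the same lexicographic order used for the count columns of the TSV files.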
Each row corresponds to a well-covered triplet, with columns corresponding to:
1) chromosome number
2) index of the first CpG in the triplet on the chromosome
3) position of the first CpG in the triplet on the chromosome
4) position of the nearest transcription start site
5) distance between the triplet centroid and nearest transcription start site
6) an estimate of the total variation distance to the class of exchangeable distributions
7) an estimate of the exchangeable weight of the triplet (bias-corrected)
8) a bootstrap estimate of the standard deviation of $\hat\lambda$
9-16) the counts of each of the 8 possible triplet configurations (ordered lexicographically, i.e. '000', '001', etc.)
17-24) an estimate of the largest exchangeable component
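As a rough illustration of the 24-column layout described above, the following sketch reads one TSV file using only the Python standard library. The column names are hypothetical labels chosen here, not names from the dataset, and the sketch assumes every numeric field parses as a float. Rows that do not fit (for example a header line, if the files have one) are skipped.

```python
import csv
import io

# Hypothetical labels for the 24 columns; the files may not name them this way.
COLUMNS = (
    ["chrom", "cpg_index", "cpg_position", "tss_position", "tss_distance",
     "tv_distance", "lambda_hat", "lambda_sd"]
    + [f"count_{i:03b}" for i in range(8)]      # columns 9-16: '000' ... '111'
    + [f"component_{i + 1}" for i in range(8)]  # columns 17-24
)


def read_triplets(handle):
    """Yield one dict per triplet row from a tab-separated file object."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        try:
            # Keep the chromosome as text; parse everything else as float.
            values = [row[0]] + [float(x) for x in row[1:]]
        except ValueError:
            continue  # skip lines with non-numeric fields, e.g. a header
        yield dict(zip(COLUMNS, values))
```

A file could then be read with `with open(path, newline="") as fh: rows = list(read_triplets(fh))`, where `path` points to one of the downloaded TSV files.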
Provider:
figshare
Created:
2021-11-10