Modified LRphase and Simulated Dataset for "LRphase: an efficient algorithm for assigning haplotypic identity to long reads"

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/7823581
Description:
A modified version of LRphase was used to simulate reads for a single hypothetical human genome with maternal and paternal phasing information. Briefly, haplotype-specific reference sequences were generated with `bcftools consensus` (Li 2011) from the rescued GIAB VCF and the hg38 human reference sequence. Each haplotype-specific FASTA was fed separately into pbsim2 (Ono, Asai, and Hamada 2021) as the reference from which simulated reads were randomly drawn, up to 1X coverage. Parameters controlling the read length distribution and the sequencing and base-calling error rates were set to emulate typical performance of the MinION sequencing platform with flow cell version R10.4.1 (https://nanoporetech.com/products/minion). These are as follows: `--depth 1 --hmm_model R103.model --difference-ratio '23:31:46' --length-mean 25000 --length-min 100 --length-max 1000000 --length-sd 20000 --accuracy-mean 0.98 --accuracy-min 0.01 --accuracy-max 1.00`. Simulated reads were aligned to the hg38 reference genome with minimap2 (Li 2018), and the correct phasing and alignment coordinates were encoded in the read names. Finally, samtools (Li et al. 2009) was used to remove duplicated and supplementary reads, and to concatenate, sort, and index the reads into a single combined BAM file. Of 258,539 total reads, 246,210 were mappable, and 178,504 overlapped at least one heterozygous variant in HG001.
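The per-haplotype simulation step described above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual pipeline: the option values are copied from the description, but the executable name `pbsim`, the placement of the reference FASTA as the final positional argument, and the file names `maternal.fa` and `paternal.fa` are assumptions made for illustration. The sketch only assembles the command lines; it does not run the simulator.

```python
import shlex

# Option values copied from the dataset description.
PBSIM2_OPTIONS = [
    ("--depth", "1"),
    ("--hmm_model", "R103.model"),
    ("--difference-ratio", "23:31:46"),
    ("--length-mean", "25000"),
    ("--length-min", "100"),
    ("--length-max", "1000000"),
    ("--length-sd", "20000"),
    ("--accuracy-mean", "0.98"),
    ("--accuracy-min", "0.01"),
    ("--accuracy-max", "1.00"),
]

# Hypothetical file names for the two haplotype-specific references.
HAPLOTYPE_FASTAS = {"maternal": "maternal.fa", "paternal": "paternal.fa"}


def build_pbsim2_command(haplotype_fasta, executable="pbsim"):
    """Return the argument list for one simulation run.

    Assumptions: the executable is called `pbsim` and takes the
    reference FASTA as its last positional argument.
    """
    command = [executable]
    for flag, value in PBSIM2_OPTIONS:
        command.extend([flag, value])
    command.append(haplotype_fasta)
    return command


# One command per haplotype, since each FASTA is simulated separately.
commands = {
    label: build_pbsim2_command(path)
    for label, path in HAPLOTYPE_FASTAS.items()
}

for label, command in commands.items():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(label, shlex.join(command))
```

Keeping the options in a single list means both haplotypes are guaranteed to be simulated with identical settings, which matches the description of each FASTA being fed separately into the same simulator configuration.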
Created:
2023-04-13