
Linked-read sequencing enables haplotype-resolved resequencing at population scale

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.9zw3r22bf
Description:
The ability to sequence entire genomes of virtually any organism provides unprecedented insights into the evolutionary history of populations and species. Nevertheless, many population genomic inferences (including the quantification and dating of admixture, introgression and demographic events, and the inference of selective sweeps) are still limited by the lack of high-quality haplotype information. The newest generation of sequencing technology now promises significant progress. To establish the feasibility of haplotype-resolved genome resequencing at population scale, we investigated properties of linked-read sequencing data of songbirds of the genus Oenanthe across a range of sequencing depths. Our results, based on the comparison of downsampled (25x, 20x, 15x, 10x, 7x, and 5x) with high-coverage data (46-68x) of seven bird genomes mapped to a reference, suggest that phasing contiguities and accuracies adequate for most population genomic analyses can already be reached with moderate sequencing effort. At 15x coverage, phased haplotypes span about 90% of the genome assembly, with 50 and 90 percent of phased sequences located in phase blocks longer than 1.25-4.6 Mb (N50) and 0.27-0.72 Mb (N90), respectively. Phasing accuracy exceeds 99% from 15x coverage upward. Higher coverages yielded higher contiguities (N50/N90 of up to about 7 Mb/1 Mb at 25x coverage), but only marginally improved phasing accuracy. Phase block contiguity improved with input DNA molecule length; thus, higher-quality DNA may help keep sequencing costs down. In conclusion, even for organisms with gigabase-sized genomes like birds, linked-read sequencing at moderate depth opens an affordable avenue towards haplotype-resolved genome resequencing at population scale.

Methods

10X Genomics linked-reads (60x coverage) were assembled using the Supernova 2.1 assembler.
To remove duplicate scaffolds of at least 99% identity from the pseudohaploid assembly, we ran the dedupe procedure in BBTools (https://sourceforge.net/projects/bbmap/), allowing up to 7,000 edits. This reduced the assembly to 11,030 scaffolds. We then aimed to ensure that all duplicate scaffolds were removed and to retain only scaffolds whose integrity could be confirmed by the presence of syntenic regions in another songbird genome. To this end, we performed a lastz alignment against the collared flycatcher assembly version 1.5, which is the highest-quality assembly available from the Muscicapidae family. For this we used lastz 1.04 with settings M=254, K=4500, L=3000, Y=15000, C=2, T=2, and --matchcount=10000. This resulted in 295 scaffolds with unique hits in the flycatcher assembly.
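
The phase-block N50 and N90 statistics quoted in the description can be illustrated with a minimal Python sketch. It assumes the usual definition, which matches the wording above: the Nx value is the largest block length such that blocks at least that long contain x percent of all phased sequence. The function name `phase_block_nx` and the toy block lengths are hypothetical and are not taken from the dataset or from the authors' pipeline.

```python
def phase_block_nx(block_lengths, fraction):
    """Return the Nx phase-block length for a fraction in (0, 1].

    Nx is the length L such that blocks of length >= L together
    cover at least `fraction` of the total phased sequence.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    lengths = sorted(block_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one phase block is required")

    threshold = fraction * sum(lengths)
    covered = 0
    # Walk from the longest block downward until the target share is covered.
    for length in lengths:
        covered += length
        if covered >= threshold:
            return length
    return lengths[-1]  # guard against floating-point rounding


# Toy example (hypothetical block lengths, arbitrary units).
toy_blocks = [5, 4, 3, 2, 1]
n50 = phase_block_nx(toy_blocks, 0.5)
n90 = phase_block_nx(toy_blocks, 0.9)
print(n50, n90)
```

With real data, the block lengths would come from the phasing output of whichever tool was used; how to extract them is outside the scope of this sketch.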
Created:
2020-05-26