De novo genome assembly of human cell line CHM13 nanopore ultra-long reads using Shasta
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
http://datadryad.org/dataset/doi%253A10.5068%252FD1GQ3S
Description:
Advances in Oxford Nanopore Technologies (ONT) sequencing and basecalling, together with updates to Shasta, are outpacing the publishing cycle. We aim to keep users up to date on the state of the art by assembling the most recent ONT data with Shasta. This release includes our latest assembly of CHM13, the Shasta output files AssemblySummary.html and Assembly.gfa, and our evaluation of the assembly in tables and figures. We assembled ultra-long nanopore reads of CHM13 using Shasta 0.9.0 in iterative assembly mode to produce a haploid de novo genome assembly.
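Basic contiguity statistics can be read directly from Assembly.gfa. The sketch below is a minimal illustration, assuming the file follows GFA 1 (tab-separated records, with each segment's sequence in column 3 of its `S` line, or a `*` placeholder plus an `LN:i:` length tag). The function names are hypothetical, and the numbers it produces are not the evaluation reported in this release.

```python
from typing import Iterable, List


def segment_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every S (segment) record in GFA 1 text."""
    lengths = []
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip header (H), link (L) and other record types
        fields = line.rstrip("\n").split("\t")
        seq = fields[2]
        if seq != "*":
            lengths.append(len(seq))
            continue
        # Sequence omitted: fall back to the optional LN:i:<length> tag.
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                lengths.append(int(tag[5:]))
                break
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that segments of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For example, `with open("Assembly.gfa") as fh: lens = segment_lengths(fh)` followed by `n50(lens)` gives the segment N50; the file is streamed line by line, so memory use stays low apart from the sequence on each `S` line.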
Methods
To assemble CHM13, we downloaded publicly available reads generated by the Telomere-to-Telomere (T2T) Consortium. For a description of the T2T Consortium's sequencing methods, see https://github.com/marbl/CHM13#oxford-nanopore-data. Release 8 of the data was re-basecalled with Guppy v5.0.7 in super-accuracy mode.
T2T CHM13 rel8 reads
https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/nanopore/rel8-guppy-5.0.7/reads.fastq.gz
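Because the assembly used `--Reads.minReadLength 50000`, it can help to check how much of the downloaded data clears that cutoff before assembling. A minimal sketch, assuming standard four-line FASTQ records (one unwrapped sequence line per read) and that reads shorter than the cutoff are the ones discarded; the helper names are hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def fastq_read_lengths(path: str) -> Iterator[int]:
    """Yield the length of each record in a (possibly gzipped) four-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield len(line.rstrip("\n"))


def length_filter_summary(path: str, min_length: int = 50_000) -> Tuple[int, int, int, int]:
    """Return (reads, bases, reads_kept, bases_kept) for reads of at least min_length."""
    reads = bases = kept = kept_bases = 0
    for n in fastq_read_lengths(path):
        reads += 1
        bases += n
        if n >= min_length:  # assumption: reads at exactly the cutoff are kept
            kept += 1
            kept_bases += n
    return reads, bases, kept, kept_bases
```

Streaming the gzipped file avoids decompressing it to disk just to count bases; dividing `bases_kept` by an assumed genome size gives a rough estimate of the coverage left after filtering.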
We assembled the reads using Shasta 0.9.0 (Shafin et al., 2020) in iterative assembly mode, starting from the built-in Nanopore-Sep2020 configuration and adding the command-line options listed below. We ran the assembly on McCloud, a service that runs Shasta in the cloud.
Shasta 0.9.0 command line options
--Reads.minReadLength 50000 --Kmers.k 10 --MinHash.minHashIterationCount 100 --Align.minAlignedFraction 0.35 --Align.minAlignedMarkerCount 600 --Align.maxSkip 50 --Align.maxDrift 30 --Align.maxTrim 30 --ReadGraph.creationMethod 0 --ReadGraph.maxAlignmentCount 12 --ReadGraph.crossStrandMaxDistance 0 --MarkerGraph.refineThreshold 0 --MarkerGraph.minCoveragePerStrand 3 --MarkerGraph.simplifyMaxLength 10,100,1000,10000 --Assembly.iterative --Assembly.pruneLength 10000 --Assembly.consensusCaller Bayesian:guppy-5.0.7-a
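The assembly itself ran on McCloud, but a local run could be approximated by combining the same configuration with these options. The sketch below parses the option string above into key/value pairs (treating `--Assembly.iterative` as a value-less switch) and builds an argument list without executing it. The executable name `shasta` and the `--input`/`--config` options are assumptions based on Shasta's documentation and should be checked against `--help` for version 0.9.0; Shasta may also require uncompressed input, so the reads would likely need to be decompressed first.

```python
import shlex
from typing import Dict, List, Optional

# Options copied verbatim from the list above.
OPTIONS = (
    "--Reads.minReadLength 50000 --Kmers.k 10 --MinHash.minHashIterationCount 100 "
    "--Align.minAlignedFraction 0.35 --Align.minAlignedMarkerCount 600 "
    "--Align.maxSkip 50 --Align.maxDrift 30 --Align.maxTrim 30 "
    "--ReadGraph.creationMethod 0 --ReadGraph.maxAlignmentCount 12 "
    "--ReadGraph.crossStrandMaxDistance 0 --MarkerGraph.refineThreshold 0 "
    "--MarkerGraph.minCoveragePerStrand 3 "
    "--MarkerGraph.simplifyMaxLength 10,100,1000,10000 "
    "--Assembly.iterative --Assembly.pruneLength 10000 "
    "--Assembly.consensusCaller Bayesian:guppy-5.0.7-a"
)


def parse_options(option_string: str) -> Dict[str, Optional[str]]:
    """Map each --Section.name option to its value; bare switches map to None."""
    tokens = shlex.split(option_string)
    parsed: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if not key.startswith("--"):
            raise ValueError(f"unexpected token {key!r}")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is None or nxt.startswith("--"):
            parsed[key] = None  # switch such as --Assembly.iterative
            i += 1
        else:
            parsed[key] = nxt
            i += 2
    return parsed


def build_command(input_path: str, options: Dict[str, Optional[str]],
                  config: str = "Nanopore-Sep2020") -> List[str]:
    """Build a hypothetical argv list for a local Shasta run (not executed here)."""
    argv = ["shasta", "--input", input_path, "--config", config]
    for key, value in options.items():
        argv.append(key)
        if value is not None:
            argv.append(value)
    return argv
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids shell-quoting issues with values such as `10,100,1000,10000`.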
References
Shafin, K. et al. (2020) Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol., 38, 1044–1053.
Created:
2022-05-26



