Minimum virtual dataset for reproducible triploid de novo genome assembly

Source: DataCite Commons; updated 2026-03-17; indexed 2026-04-25
Download link:
https://springernature.figshare.com/articles/dataset/Minimum_virtual_dataset_for_reproducible_triploid_de_novo_genome_assembly/29979979
Resource description:
Despite technological advances, whole-genome sequencing remains technically challenging for organisms with higher-ploidy genomes. Short-read sequencing platforms have therefore been tried for this purpose, but the conditions that lead to poor-quality assemblies have not been elucidated. In the present study, simulated sequences mimicking the accumulation of insertion/deletion (indel) mutations were created to clarify how much difference between homologous chromosomes can be tolerated, and which k-mer sizes are suitable, for good-quality genome assembly from short-read sequencing data in triploid species. The results showed that, regardless of the degree of difference between homologous chromosomes, a narrow range of k-mer sizes yields high-quality assemblies. This dataset consists of the virtual haploid genome (O.fasta), triploid genome data (reference_sequense), NGS read data (NGS_reads), and assembly results (all_contigs) created in this study.
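As an illustration of why indel divergence between homologs and k-mer size interact, here is a minimal sketch, not the authors' simulation or assembly pipeline. It assumes a toy model in which indels are placed uniformly at random with lengths of 1 to 5 bp, and it uses the Jaccard similarity of the k-mer sets of two homologous copies as a rough proxy for how much an assembler's k-mer graph would merge them. The helper names `mutate_with_indels`, `kmer_set`, and `kmer_jaccard` are hypothetical.

```python
import random


def mutate_with_indels(seq, n_events, max_len=5, rng=None):
    """Return a copy of `seq` carrying `n_events` random indels.

    Hypothetical helper. Positions and lengths are drawn uniformly,
    which is an illustrative assumption, not the dataset's model.
    """
    rng = rng or random.Random()
    s = list(seq)
    for _ in range(n_events):
        if not s:  # nothing left to mutate
            break
        pos = rng.randrange(len(s))
        length = rng.randint(1, max_len)
        if rng.random() < 0.5:
            # insertion: splice random bases in at `pos`
            s[pos:pos] = rng.choices("ACGT", k=length)
        else:
            # deletion: drop up to `length` bases starting at `pos`
            del s[pos:pos + length]
    return "".join(s)


def kmer_set(seq, k):
    """All distinct substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a, b, k):
    """Jaccard similarity of the k-mer sets of two sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    haploid = "".join(rng.choices("ACGT", k=20_000))
    # Three homologous copies with increasing (hypothetical) indel loads.
    homologs = [mutate_with_indels(haploid, n, rng=rng) for n in (0, 50, 200)]
    for k in (21, 41, 61, 81):
        sims = [round(kmer_jaccard(homologs[0], h, k), 3) for h in homologs[1:]]
        print(f"k={k}: Jaccard vs. copy 1 = {sims}")
```

In this toy model each indel disrupts roughly k overlapping k-mers, so shared-k-mer similarity between homologs falls both as indel load rises and as k grows. That trade-off is one intuitive reason the usable k-mer range could be narrow, though the dataset's actual conclusions rest on its own assemblies, not on this proxy.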

Provider:
figshare
Created:
2025-08-25