Dataset of assembly, annotation, and variant call files for the chromosome-scale haplotype-resolved genome of Vitis sp. 'Zhuosexiang'
Source: DataCite Commons. Updated 2026-04-17; indexed 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=a863e020143a4bf1b19af3d7fd369f42
Description:
This dataset presents a high-quality, chromosome-scale, haplotype-resolved genome assembly for the grape variety Vitis sp. 'Zhuosexiang'. It comprises two fully phased haplotypes (Hap1 and Hap2) generated by integrating PacBio HiFi reads, Oxford Nanopore Technologies (ONT) ultra-long reads, and Hi-C chromatin interaction data. The primary diploid assembly was constructed with Hifiasm (v0.19.5), scaffolded into chromosomes with 3D-DNA (v190716) and manual curation in Juicebox (v1.11.08), and refined with AssemblyMapper (v1.0.3) based on 19 pseudo-chromosomes. Gap-filling with ONT reads then produced final gap-free sequences totaling 520.9 Mb (Hap1) and 518.5 Mb (Hap2), with high contiguity (Scaffold N50: 25.0 Mb and 25.8 Mb, respectively).
Repetitive elements were annotated using a de novo library built with RepeatModeler (v2.0.1) and identified with RepeatMasker (v4.0.7) against Dfam (v3.2) and RepBase (20181026). Repetitive sequences constituted 54.62% and 54.73% of the Hap1 and Hap2 genomes, respectively.
Protein-coding genes were annotated using a hybrid pipeline that combined de novo prediction (BRAKER v2.1.6 trained on Arabidopsis thaliana), homology-based prediction (GenomeThreader v1.7.3 using Vitis vinifera 'PN40024' proteins), and transcriptome evidence (PASA v2.5.0 using Iso-Seq reads). The resulting predictions were merged with EvidenceModeler (v1.1.1) to yield high-confidence gene sets of 34,240 genes for Hap1 and 34,062 genes for Hap2.
Functional annotation was performed using DIAMOND (v2.0.4.142) against the NR, Swiss-Prot, and UniRef90 databases (E-value < 1e-3), supplemented by Pfam domain identification and eggNOG-mapper (v2.0.0) for GO and KEGG assignments. Non-coding RNAs were identified by searching the Rfam (v14.1) database with Infernal (v1.1.4), which yielded 2,780 non-coding RNA genes in Hap1 and 2,075 in Hap2.
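The Scaffold N50 values quoted in the assembly description summarize contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not the dataset authors' code. The function names `fasta_lengths` and `n50` are hypothetical helpers introduced here for illustration, and the input is assumed to be plain FASTA text supplied as an iterable of lines.

```python
def fasta_lengths(lines):
    """Return the length of each record in FASTA text given as an iterable of lines."""
    lengths = []
    current = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy input only; these are not sequences from the dataset.
    toy = [">chr1", "ACGTACGT", ">chr2", "ACGT", ">chr3", "AC"]
    print(n50(fasta_lengths(toy)))
```

To apply this to a downloaded assembly, one could pass an open file handle for the FASTA file to `fasta_lengths`; the file name would depend on how the dataset is packaged.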
Genomic divergence between haplotypes was characterized by aligning Hap2 against Hap1 using minimap2 (v2.17) with the -ax asm5 preset and --eqx option, followed by processing with SAMtools (v1.10) and analysis with SyRI (v1.4) to identify structural variations and SNPs/InDels, confirming high collinearity between the two haplotypes.
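The --eqx option mentioned above makes minimap2 write CIGAR strings that use `=` for sequence matches and `X` for mismatches instead of the ambiguous `M` operator, which is what lets downstream tools tell matching from mismatching bases within an alignment. As a rough illustration of what that encoding carries, the sketch below tallies CIGAR operations and derives a gap-excluded identity. It is an assumption-laden example, not part of the published pipeline: `cigar_op_totals` and `gapless_identity` are hypothetical helper names, and the CIGAR string in the demo is invented.

```python
import re
from collections import Counter

# CIGAR operators defined by the SAM format.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_op_totals(cigar):
    """Sum the lengths of each operation type in a CIGAR string."""
    if _CIGAR_FULL.fullmatch(cigar) is None:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    totals = Counter()
    for count, op in _CIGAR_OP.findall(cigar):
        totals[op] += int(count)
    return totals


def gapless_identity(cigar):
    """Return matches / (matches + mismatches), ignoring insertions and deletions.

    Requires an extended CIGAR ('=' and 'X'), as produced with --eqx.
    """
    totals = cigar_op_totals(cigar)
    if totals["M"]:
        raise ValueError("CIGAR contains 'M'; '=' and 'X' operators are required")
    aligned = totals["="] + totals["X"]
    if aligned == 0:
        raise ValueError("CIGAR has no aligned bases")
    return totals["="] / aligned


if __name__ == "__main__":
    example = "10=1X5=2I3D4="  # invented example, not taken from the dataset
    print(dict(cigar_op_totals(example)))
    print(gapless_identity(example))
```

Excluding insertions and deletions from the denominator is a deliberate simplification; other identity definitions count gap columns as well.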
Provider:
Science Data Bank
Created:
2026-04-17



