Striking variation in chromosome structure within Musa acuminata and its diploid cultivars

Mendeley Data · Updated 2024-04-26 · Indexed 2024-06-27

Download link:

https://datadryad.org/stash/dataset/doi:10.5061/dryad.44j0zpcnq


Resource description:

# SNP datasets (vcf files) used for in silico painting of Mchare x M. acuminata 'Calcutta 4' F1 hybrid clones

[https://doi.org/10.5061/dryad.44j0zpcnq](https://doi.org/10.5061/dryad.44j0zpcnq)

Genomic DNA was isolated with the NucleoSpin PlantII kit (Macherey-Nagel, Düren, Germany) according to the manufacturer’s recommendations and then sheared with a Bioruptor Plus (Diagenode, Liege, Belgium) to achieve an insert size of about 500 bp. Sequencing libraries were prepared from 2 μg of fragmented DNA using the TruSeq® DNA PCR-free kit (Illumina) and sequenced on a NovaSeq 6000 (Illumina), producing 2 × 150-bp paired-end reads at a minimum sequencing depth of 25×. Raw reads were trimmed to remove low-quality bases and adapter sequences and cut to the same length using fastp v.0.20.1 (Chen et al., 2018).

The proportion of individual parental subgenomes in the F1 hybrid clones was analyzed using the vcfHunter pipeline ([https://github.com/SouthGreenPlatform/vcfHunter](https://github.com/SouthGreenPlatform/vcfHunter)) according to Baurens et al. (2019). Briefly, trimmed reads were aligned to the reference genome sequence of M. acuminata ssp. malaccensis ‘DH Pahang’ v4 (Belser et al., 2021) with BWA-MEM v0.7.15 (Li 2013). Redundant reads were then removed using MarkDuplicates from Picard Tools v2.7.0, and reads were locally realigned around indels using the IndelRealigner tool of the GATK v3.3 package (McKenna et al., 2010). Bases with a mapping quality ≥10 were counted using the process_reseq_1.0.py python script ([https://github.com/SouthGreenPlatform/vcfHunter](https://github.com/SouthGreenPlatform/vcfHunter)).

Variant calling and SNP filtering were performed according to Baurens et al. (2019) using two python scripts from the same repository ([https://github.com/SouthGreenPlatform/vcfHunter](https://github.com/SouthGreenPlatform/vcfHunter)): VcfPreFilter.1.0 (alleles supported by at least three reads and with a frequency ≥0.25 were kept as variants) and vcfFilter.1.0.py (data points with <6-fold coverage for the minor allele were converted to missing data).

Finally, the proportion of parental genomes in the F1 hybrid clones along the individual chromosomes of the reference genome sequence was called from biallelic SNPs in CDS regions (SNPs specific to Mchare cultivars and to M. acuminata ssp. burmannicoides ‘Calcutta 4’) using the vcf2allPropAndCov.py and vcf2allPropAndCovByChr.py python scripts ([https://github.com/SouthGreenPlatform/vcfHunter](https://github.com/SouthGreenPlatform/vcfHunter)) according to Baurens et al. (2019).

## Description of the data and file structure

Genome proportion of eight F1 hybrid clones was analyzed:

| **Accession name of F1 hybrid** | **Male parent** | **Female parent (Mchare clone)** |
| :------------------------------ | :-------------------------- | :------------------------------- |
| ‘NM275_4’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Laini’ |
| ‘NM258_3’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Laini’ |
| ‘NM209_3’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Laini’ |
| ‘NM237_8’ | Musa acuminata ‘Calcutta 4’ | ‘Ijihu Inkudu’ |
| ‘T2269_1’ | Musa acuminata ‘Calcutta 4’ | ‘Huti White’ |
| ‘T2274_6’ | Musa acuminata ‘Calcutta 4’ | ‘Huti White’ |
| ‘T2274_9’ | Musa acuminata ‘Calcutta 4’ | ‘Huti White’ |
| ‘T2619_15’ | Musa acuminata ‘Calcutta 4’ | ‘Mchare Mlelembo’ |

The vcf files contain biallelic SNPs specific to the male parent (M. acuminata ‘Calcutta 4’; 2n = 2x = 22) and the female parents (Mchare cultivars; 2n = 2x = 22), which were used to analyze the contribution of the parental genomes in the F1 hybrids. The analysis was done using the vcfHunter pipeline ([https://github.com/SouthGreenPlatform/vcfHunter](https://github.com/SouthGreenPlatform/vcfHunter)) according to Baurens et al. (2019).

## Sharing/Access information

Data was derived from the following sources:

* Raw Illumina sequences of the parents and F1 hybrid clones are stored in the NCBI Sequence Read Archive (SRA): SRA experiments SRX22339926 - SRX22339938.
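The read-support thresholds quoted in the methods (at least three reads and a frequency ≥0.25 to keep an allele; <6-fold minor-allele coverage converted to missing data) can be illustrated with a minimal sketch. This is not the vcfHunter code: the function names and the per-sample list of allele read depths are hypothetical, and treating the least-covered called allele as the "minor" allele is an assumption made for illustration.

```python
from typing import List, Optional

MIN_READS = 3       # an allele needs at least three supporting reads
MIN_FREQ = 0.25     # and a read frequency of at least 0.25 at the site
MIN_MINOR_COV = 6   # least-covered called allele below this -> missing data


def kept_alleles(allele_depths: List[int]) -> List[int]:
    """Return indices of alleles that pass the read-count and frequency thresholds."""
    total = sum(allele_depths)
    if total == 0:
        return []
    return [
        i for i, depth in enumerate(allele_depths)
        if depth >= MIN_READS and depth / total >= MIN_FREQ
    ]


def call_genotype(allele_depths: List[int]) -> Optional[List[int]]:
    """Return the called allele indices for one sample, or None for missing data."""
    alleles = kept_alleles(allele_depths)
    if not alleles:
        return None
    # Assumption: the "minor allele" is the least-covered allele among those called.
    minor_cov = min(allele_depths[i] for i in alleles)
    if minor_cov < MIN_MINOR_COV:
        return None
    return alleles
```

For example, under these assumptions `call_genotype([20, 10])` keeps both alleles, whereas `call_genotype([10, 5])` is set to missing because the second allele has fewer than six reads.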


Created:

2024-04-22


Background overview

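This dataset concerns chromosome structure variation in Musa acuminata and its diploid cultivars. Using comparative oligo painting and Illumina resequencing, the study analyzed large-scale chromosomal rearrangements in wild subspecies and cultivars and determined the contribution of the parental genomes in F1 hybrid progeny. The dataset contains VCF files for eight F1 hybrid clones, used for SNP-based analysis of parental genome proportions, and is intended to support banana breeding and genomics research.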
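As a rough illustration of what an SNP-based parental genome proportion along a chromosome means, the sketch below sums read counts for parent-specific alleles in fixed windows. It is not the vcf2allPropAndCov.py implementation: the `Site` tuple layout, the function name, the 1 Mb default window, and the chromosome name in the example are all hypothetical, and the read counts are assumed to have already been extracted from the VCF files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (chromosome, 1-based position,
#                       reads carrying the parent-A-specific allele,
#                       reads carrying the parent-B-specific allele)
Site = Tuple[str, int, int, int]


def parental_proportions(
    sites: Iterable[Site], window: int = 1_000_000
) -> Dict[Tuple[str, int], float]:
    """Return the fraction of reads carrying parent-A-specific alleles per window."""
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, reads_a, reads_b in sites:
        key = (chrom, (pos - 1) // window)  # VCF positions are 1-based
        counts[key][0] += reads_a
        counts[key][1] += reads_b
    # Windows without any informative reads are left out rather than reported as 0.
    return {
        key: a / (a + b)
        for key, (a, b) in counts.items()
        if a + b > 0
    }


example = [
    ("chr01", 100, 10, 10),
    ("chr01", 500_000, 5, 15),
    ("chr01", 1_500_000, 30, 10),
]
proportions = parental_proportions(example)
```

Each key is a (chromosome, window index) pair, so a shift in the value from one window to the next marks a region where the two parental genomes are unequally represented in these toy counts.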

The summary above was collected and generated by 遇见数据集.