The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars

Indexed by NIAID Data Ecosystem on 2026-05-01

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.qnk98sfpt

Description:

Coffea arabica, an allotetraploid hybrid of C. eugenioides and C. canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ∼30.5 kya, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.

Methods

For syntenic alignments, the assemblies were aligned on the CoGe platform (https://genomevolution.org/coge/) using default settings.

For the resequencing of 38 wild and cultivated Coffea arabica accessions, two wild C. eugenioides accessions, and two cultivated and one wild C. canephora accessions, libraries were prepared using KAPA HyperPrep Kits (Roche) following the manufacturer's instructions and paired-end (2 × 125) sequenced on an Illumina HiSeq2500 instrument to ~40x coverage. Additionally, a Linnaean herbarium sample was sequenced to 46x coverage with Ion Torrent technology. Following quality control with FastQC, Illumina short reads were trimmed using Trimmomatic v0.36 and mapped to the C. arabica reference assembly with BWA mem v0.7.16a-r1181. For the Linnaean sample, the reads were processed according to the protocols recommended for degraded DNA analysis in MapDamage v.2.0.8.

The GATK (v3.8.0) pipeline was used for SNP calling. Duplicates were marked and removed using Picard v2.0.1, and genotype likelihoods were called into GVCF files using HaplotypeCaller (GATK). For the diploid progenitors, to allow interspecies comparisons, reads were mapped to each of the two subgenomes separately; both mappings included chromosome zero, i.e., the contigs not assembled into pseudomolecules. Joint calling was carried out using GenotypeGVCFs (GATK), and snpEff v4.3t was used to assess the impact of the SNPs. To remove regions with cross-species mappings, we removed the SNPs that were called as heterozygous when the sequencing data of the di-haploid ET39 were mapped to the Arabica reference genome.
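
The final filtering step can be illustrated with a minimal sketch. A di-haploid such as ET39 is expected to be homozygous at every site, so heterozygous calls in that sample presumably mark positions where reads from one subgenome mapped onto the other. The code below is not the authors' script: it assumes a plain-text, tab-separated multi-sample VCF with a GT field, and the function names, chromosome names, and example records are all hypothetical.

```python
def heterozygous_sites(vcf_lines, sample):
    """Return the set of (chrom, pos) where `sample` has a heterozygous GT."""
    sites = set()
    col = None
    for line in vcf_lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # column holding this sample's calls
            continue
        if col is None:
            raise ValueError("#CHROM header line not found before records")
        fmt_keys = fields[8].split(":")
        gt = fields[col].split(":")[fmt_keys.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        # Heterozygous: no missing allele and more than one distinct allele.
        if "." not in alleles and len(set(alleles)) > 1:
            sites.add((fields[0], int(fields[1])))
    return sites


def drop_sites(vcf_lines, blacklist):
    """Yield header lines unchanged and records whose site is not blacklisted."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) not in blacklist:
            yield line


# Hypothetical records: ET39 is heterozygous only at chr1c:100.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tET39\tacc1\n",
    "chr1c\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t0/1:28\n",
    "chr1c\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:31\t1/1:27\n",
    "chr1e\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t./.:0\t0/1:25\n",
]

if __name__ == "__main__":
    het = heterozygous_sites(EXAMPLE_VCF, "ET39")
    for line in drop_sites(EXAMPLE_VCF, het):
        print(line, end="")
```

In this sketch the blacklist is built from the ET39 column and then applied to every record, so a site is dropped for all accessions once it is heterozygous in the di-haploid. Missing genotypes (`./.`) are not treated as heterozygous.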

Created:

2023-12-22