The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars
Source: DataONE. Updated: 2023-12-22. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:0fda7ee28320fb8fd582a45cf685a0d772e3b522d444baa7e2e163f51d72ed47
Resource description:
Coffea arabica, an allotetraploid hybrid of C. eugenioides and C. canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 kya, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribut...
Syntenic alignments between the assemblies were computed on the CoGe platform (https://genomevolution.org/coge/) using default settings.
For the resequencing of 38 wild and cultivated Coffea arabica accessions, two wild C. eugenioides accessions, and two cultivated and one wild C. canephora accession, libraries were prepared using KAPA HyperPrep Kits (Roche) following the manufacturer's instructions and sequenced in paired-end mode (2 x 125 bp) on an Illumina HiSeq2500 instrument to ~40x coverage. Additionally, a Linnaean herbarium sample was sequenced to 46x coverage with Ion Torrent technology.
Following quality control with FastQC, Illumina short reads were trimmed using Trimmomatic v0.36 and mapped to the C. arabica reference assembly with BWA mem v0.7.16a-r1181. For the Linnaean sample, the reads were processed according to the protocols recommended for degraded DNA analysis in MapDamage v2.0.8. The GATK (v3.8.0) pipeline was used for SNP calling. Duplicates were marked and removed using Picard v2.0.1 and genotype likelihoods ...
# The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars
The dataset contains two items: **(i)** syntenic alignments between *C. canephora*, *C. eugenioides* and *C. arabica* assemblies, and **(ii)** the variant calls used in the population analyses in the paper.
## Description of the data and file structure
**i.** Syntenic alignments were obtained with the CoGe SynMap tool using default settings. In each file name, the first two items give the CoGe IDs of the genomes being aligned: *C. canephora* - ID50947; *C. eugenioides* - ID51132; *C. arabica* subCC - ID65471; *C. arabica* subEE - ID65472; *C. arabica* - ID65463. The column contents are described on row 3 of the syntenic alignment files, and on row 1 of the tandem duplicate files (identifiable by the pattern .*tandems.* in their names).
**ii.** The variant calls are given in VCF-formatted files. Each subgenome has its own file, Arabica_sgC.TIP....
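The file conventions described above can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the CoGe ID table and the header-row positions come from the description, but the function names, the assumption that the IDs appear as the first two digit runs in a file name, and the assumption that the VCF files are uncompressed, tab-separated text are illustrative only and should be checked against the actual files.

```python
import re
from collections import Counter
from typing import Iterable, List

# CoGe genome IDs as listed in the dataset description.
COGE_GENOMES = {
    "50947": "C. canephora",
    "51132": "C. eugenioides",
    "65471": "C. arabica subCC",
    "65472": "C. arabica subEE",
    "65463": "C. arabica",
}


def genomes_in_filename(name: str) -> List[str]:
    """Label the first two digit runs in a file name (assumed to be CoGe IDs)."""
    ids = re.findall(r"\d+", name)[:2]
    return [COGE_GENOMES.get(i, "unknown ID " + i) for i in ids]


def header_row_index(name: str) -> int:
    """1-based row holding the column descriptions: row 1 for tandem files, else row 3."""
    return 1 if "tandems" in name else 3


def read_column_description(lines: Iterable[str], name: str) -> str:
    """Return the row that describes the columns of an alignment or tandem file."""
    row = header_row_index(name)
    for number, line in enumerate(lines, start=1):
        if number == row:
            return line.rstrip("\n")
    raise ValueError("file has fewer than %d rows" % row)


def count_variants_per_contig(lines: Iterable[str]) -> Counter:
    """Count VCF records per contig (first column), skipping header and blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts
```

Each function accepts any iterable of lines, so an open file handle can be passed directly, for example `with open(path) as handle: count_variants_per_contig(handle)`, where `path` is a placeholder for one of the per-subgenome VCF files. If the files turn out to be gzip-compressed, `gzip.open(path, "rt")` from the standard library could be used in place of `open`.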
Created:
2023-12-23



