Chromosome-level genome assembly of Triticum turgidum var 'Kronos'

Mendeley Data. Updated 2024-05-11. Indexed 2024-06-27.

Download link:

https://zenodo.org/records/11106422

Description:

This data is made available under the Toronto Agreement. All of the data listed here is available under the prepublication data sharing principle of the Toronto Agreement (1). By using this data, you agree to:

- respect the rights of the data producers and contributors to analyze and publish the first global analyses and certain other reserved analyses of this data set in a peer-reviewed publication.
- not redistribute, release, or otherwise provide access to the data to anyone outside of the group, until the data has been published and submitted to the public data repositories.
- contact the authors to discuss any plans to publish data or analyses that utilize this data, to avoid overlap with any planned analyses.
- fully cite the prepublication data along with any applicable versioning details.
- understand that this data as accessed is precompetitive and is not patentable in its present state.

This agreement does not expire by time but only upon publication of the first global analysis by the data producers and contributors.

(1) Toronto International Data Release Workshop Authors. Prepublication data sharing. Nature 461, 168–170 (2009). https://doi.org/10.1038/461168a

If you have questions about the use of this dataset, please contact Ksenia Krasileva: kseniak [at] berkeley.edu

Summary of the datasets

We produced 526 Gbp of high-fidelity (HiFi) reads for Kronos. As Kronos typically self-pollinates in the field and its residual heterozygosity is low, these reads were assembled with hifiasm v0.19.5-r587 (-l0) to produce a haplotype-collapsed assembly. Primary and associated contigs were concatenated into a single file. These contigs are in the files with the prefix 'Kronos.contigs'.

The concatenated primary and associated contigs were further scaffolded with chromosome conformation capture sequencing (Hi-C) data using yahs v1.2a.2. The 14 largest resulting scaffolds were each greater than 600 Mbp in size, representing the 14 chromosomes (7 x AB).
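
The size criterion above (scaffolds greater than 600 Mbp treated as chromosomes) can be checked on any FASTA file with a few lines of Python. The following is only a minimal sketch, not the authors' pipeline: the function names `fasta_lengths` and `chromosome_scale`, the default threshold argument, and the toy records are all illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record name: sequence length} for FASTA-formatted lines.

    The record name is the first whitespace-delimited token after '>'.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def chromosome_scale(
    lengths: Dict[str, int], min_bp: int = 600_000_000
) -> List[Tuple[str, int]]:
    """Return (name, length) pairs strictly longer than min_bp, longest first.

    The 600 Mbp default mirrors the size quoted in the description;
    it is a reporting threshold here, not a parameter of yahs.
    """
    hits = [(n, length) for n, length in lengths.items() if length > min_bp]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy demonstration with a tiny threshold (hypothetical records).
toy = [">scaffold_1 demo", "ACGTACGT", "ACGT", ">scaffold_2", "ACG"]
print(chromosome_scale(fasta_lengths(toy), min_bp=10))
```

For real data, `fasta_lengths` can be passed an open file handle for one of the scaffold FASTA files; because it streams line by line, it does not need to hold a multi-gigabase sequence in memory. A concatenated record of unplaced sequence may also exceed the threshold, so the names returned should be inspected rather than simply counted.
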
These scaffolds were renamed based on their similarity to the bread wheat reference genome from the IWGSC. After plasmid genomes were separated, the remaining contigs and scaffolds, all smaller than 4 Mbp, were concatenated into a single sequence named 'Un' (for unplaced). These sequences can be found in the files with the prefix 'Kronos.collapsed'.

Updates in Zenodo v2

In genome version 1.1, the following chromosomes are reverse-complemented: 1B, 2A, 2B, 3A, 3B, 5A, 6A and 6B. This adjustment keeps the orientation of the chromosome arms consistent with that of the bread wheat reference genome.

The gene models were initially generated using BRAKER, GINGER, and Funannotate, all of which utilized protein evidence, transcript evidence generated from paired-end RNA-seq data, and independently trained ab initio predictors. Consensus annotations were derived using EvidenceModeler by merging the gene models from the three predictions, transcripts assembled with PASA, and protein sequences from closely related species aligned with miniprot. PASA was also employed to update alternative transcripts and untranslated regions.

The high-confidence set (Kronos.v1.0.high) comprises 69,808 genes. These gene models have start and stop codons and have homologs in public databases with bidirectional coverage of 97% or more. The low-confidence set (Kronos.v1.0.low) has 44,381 genes, including putative pseudogenes and gene fragments. Some of the genes are partially annotated, and we are in the process of improving the annotations. Please use genome version v1.1 with this annotation set.

Acknowledgement

This work has been funded by the United States Department of Agriculture - National Institute for Food and Agriculture Award (2021-67013-35726).
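
The reverse-complement adjustment described for genome version 1.1 has a simple coordinate consequence: on a flipped chromosome of length L, a 1-based inclusive interval [start, end] maps to [L - end + 1, L - start + 1] and its strand is inverted. The sketch below illustrates this under two stated assumptions: that chromosome lengths are identical between versions 1.0 and 1.1, and that records are labelled with the bare chromosome names used above (the actual FASTA headers may differ). The helper names are hypothetical and are not taken from the authors' code.

```python
from typing import Tuple

# Chromosomes listed as reverse-complemented in genome version 1.1.
REVERSED_IN_V1_1 = {"1B", "2A", "2B", "3A", "3B", "5A", "6A", "6B"}

# Handles A/C/G/T/N in both cases; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_interval(
    start: int, end: int, strand: str, chrom_len: int
) -> Tuple[int, int, str]:
    """Map a 1-based inclusive interval onto the reverse-complemented sequence."""
    new_start = chrom_len - end + 1
    new_end = chrom_len - start + 1
    new_strand = {"+": "-", "-": "+"}.get(strand, strand)
    return new_start, new_end, new_strand


def to_v1_1(
    chrom: str, start: int, end: int, strand: str, chrom_len: int
) -> Tuple[int, int, str]:
    """Convert v1.0 coordinates to v1.1, assuming unchanged chromosome length."""
    if chrom in REVERSED_IN_V1_1:
        return flip_interval(start, end, strand, chrom_len)
    return start, end, strand


# Toy check: the first three bases of a 10 bp sequence end up at 8..10.
seq = "AAACCCGGGT"
flipped = reverse_complement(seq)
s, e, strand = to_v1_1("1B", 1, 3, "+", len(seq))
print(flipped[s - 1:e], strand)
```

Since the description asks that the annotation be used with genome version v1.1, a conversion like this is only relevant for coordinates that were recorded against the earlier orientation.
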

Created:

2024-05-10

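Background overview

This dataset contains the chromosome-level genome assembly of Triticum turgidum var 'Kronos', generated from high-fidelity (HiFi) reads scaffolded with Hi-C data, together with high-confidence and low-confidence gene models and their annotations. Use of the data is subject to the terms of the Toronto Agreement.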