
Data from: From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes

Source: DataONE; updated 2014-02-18; indexed 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Background: Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have been useful for reconstructing the phylogeny of numerous clades of photosynthetic organisms (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome data.

Results: We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there is also evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotide versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa.

Conclusions: Analyses of the plastid data recovered a strongly supported framework of relationships for green plants. This includes the placement of Zygnematophyceae as sister to land plants (Embryophyta) and a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining members and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees); within the monilophyte clade (Monilophyta), relationships are strongly supported with Equisetales + Psilotales sister to Marattiales + leptosporangiate ferns. We also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses and provide suggestions for future analyses that will likely incorporate plastid genome data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding protocols for the entire data set as well as subsets of the data.
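The abstract reports that GC content varies among lineages and within single genomes, and that this variation affected the placement of some taxa. The following is a minimal sketch of one way such variation might be measured for aligned protein-coding sequences. It is not the authors' pipeline: the function names `gc_content` and `gc_by_codon_position` and the toy sequences are hypothetical, and the sequences are assumed to be in frame.

```python
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    Gaps and ambiguity codes (e.g. '-', 'N') are ignored.
    Returns 0.0 when no unambiguous base is present.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc_by_codon_position(cds: str) -> List[float]:
    """GC content at codon positions 1, 2 and 3 of an in-frame sequence."""
    return [gc_content(cds[offset::3]) for offset in range(3)]


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the data set.
    toy: Dict[str, str] = {
        "taxon_A": "ATGGCCGCGTAA",
        "taxon_B": "ATGAAATTTTAA",
    }
    for name, cds in toy.items():
        overall = round(gc_content(cds), 3)
        per_position = [round(x, 3) for x in gc_by_codon_position(cds)]
        print(name, overall, per_position)
```

Comparing the per-position values across taxa is one simple way to see whether compositional differences are concentrated at third codon positions, which is a common motivation for trying alternative partitioning or character coding schemes.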
Created:
2014-02-18