
Data from: From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes

Source: DataONE | Updated: 2014-02-18 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Background: Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have been useful for reconstructing the phylogeny of numerous clades of photosynthetic organisms (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome data.
Results: We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there is also evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotide versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa.
Conclusions: Analyses of the plastid data recovered a strongly supported framework of relationships for green plants. This includes the placement of Zygnematophyceae as sister to land plants (Embryophyta) and a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining members and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees); within the monilophyte clade (Monilophyta), relationships are strongly supported with Equisetales + Psilotales sister to Marattiales + leptosporangiate ferns. We also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses and provide suggestions for future analyses that will likely incorporate plastid genome data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding protocols for the entire data set as well as subsets of the data.
Created:
2014-02-18