evotext/ielex-data-and-tree: IELex data and tree (2021/11/08)

Mendeley Data. Updated 2024-03-27; indexed 2024-06-27.

Download link:

https://zenodo.org/record/5556801

Description:

This repository contains data and scripts for producing a Bayesian phylogenetic tree sample for the Indo-European family, using data from IELex (Dunn et al., 2011). The sample is intended to be "good enough" for use in phylogenetic comparative methods. The repository should be considered the current version of the "Indo-European Lexical Cognacy Database" of Dunn et al. (2011), popularly known as "IELex".

The "Indo-European Cognate Relationships" (IE-CoR) database project, now based at the Max Planck Institute for Evolutionary Anthropology, has been building a much improved database that started from this data, but it has not made any public data release so far. In the meantime, this is the best we can offer as an alternative to the original website (http://ielex.mpi.nl), which is now offline.

The data/ directory holds the files most researchers will be interested in:

1. ielex.csv: a single table in long format containing the essential database information, one entry per row, each with its associated language, concept, and cognate set. It also gives the Glottocode (Hammarström et al., 2021) of each language; concept glosses are given via the corresponding Concepticon (List et al., 2021) concept set. This is the file most people will want when searching for "IELex data".
2. concepts.csv: a list mapping every concept used in IELex to its corresponding concept set in Concepticon (List et al., 2021), giving both the gloss and the ID.
3. ielex.nex: a NEXUS file with the information from ielex.csv, including ascertainment-correction columns, character-state labels, and assumptions. It is built with build/build_nexus.py.
4. ielex.nn.pdf and ielex.nn.png: a PDF and a high-resolution PNG generated from ielex.nex with SplitsTree (Huson & Bryant, 2006).
5. ielex.mcc.tre: the Maximum Clade Credibility ("consensus") tree from our latest analysis (currently IE-tree-v1). The first 50% of the sample was discarded as burn-in.
6. ielex.mcc.pdf: a graphic rendering of ielex.mcc.tre, generated with FigTree.

The build/ directory holds the files used to prepare the phylogenetic reconstruction. The IE-trees-v1/ directory holds the files of the reconstruction itself, including the model, logs, and state files. As stated above, a summary of the main output is provided as data/ielex.mcc.tre. For statistical work, researchers will most likely want the entire (unsummarised) tree sample, IE-trees-v1/ie-v1.nex.
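The step from the long-format ielex.csv to a binary NEXUS matrix with ascertainment columns can be sketched as follows. This is an illustration under stated assumptions, not a copy of build/build_nexus.py: the column names language, concept and cognate_set are hypothetical (check them against the real header), and the layout shown is one common convention. Each concept gets a leading all-absent ascertainment column, then one 0/1 column per cognate set, with "?" for languages that have no entry for that concept, plus an ASSUMPTIONS block with one CHARSET per concept. The helper names read_long_table, quote and build_nexus are likewise hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; check them against the real header of data/ielex.csv.
LANG, CONCEPT, COGSET = "language", "concept", "cognate_set"


def read_long_table(handle):
    """Group long-format rows into concept -> {language: set of cognate-set ids}."""
    data = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(handle):
        data[row[CONCEPT]][row[LANG]].add(row[COGSET])
    return data


def quote(token):
    """Wrap a NEXUS token in single quotes, doubling any embedded quote."""
    return "'" + token.replace("'", "''") + "'"


def build_nexus(data):
    """Binary NEXUS matrix with a leading ascertainment column per concept.

    A language with no entry for a concept is coded '?' in all of that concept's columns.
    """
    languages = sorted({lang for by_lang in data.values() for lang in by_lang})
    labels, columns, charsets = [], [], []
    for concept in sorted(data):
        by_lang = data[concept]
        start = len(columns) + 1  # NEXUS character numbers are 1-based
        # All-absent column: cognate sets are only recorded when attested, so this
        # pattern is never observed; ascertainment-corrected models condition on that.
        labels.append(f"{concept}_ascertainment")
        columns.append(["0" if lang in by_lang else "?" for lang in languages])
        for cogset in sorted(set().union(*by_lang.values())):
            labels.append(f"{concept}_{cogset}")
            columns.append([
                "?" if lang not in by_lang else "1" if cogset in by_lang[lang] else "0"
                for lang in languages
            ])
        charsets.append((concept, start, len(columns)))

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(languages)} NCHAR={len(columns)};",
        '  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="01";',
        "  CHARSTATELABELS",
        ",\n".join(f"    {i} {quote(label)}" for i, label in enumerate(labels, 1)) + ";",
        "  MATRIX",
    ]
    for i, lang in enumerate(languages):
        lines.append(f"    {quote(lang)} " + "".join(column[i] for column in columns))
    lines += ["  ;", "END;", "", "BEGIN ASSUMPTIONS;"]
    lines += [f"  CHARSET {quote(name)} = {a}-{b};" for name, a, b in charsets]
    lines.append("END;")
    return "\n".join(lines) + "\n"


# Toy rows in the assumed layout (not real IELex data); German has no WATER entry.
sample = """language,concept,cognate_set
English,HAND,hand-1
German,HAND,hand-1
Latin,HAND,hand-2
English,WATER,water-1
Latin,WATER,water-2
"""
nexus_text = build_nexus(read_long_table(io.StringIO(sample)))
print(nexus_text)
```

For the toy input the matrix rows come out as English 010010, German 010??? and Latin 001001. On the real data, opening data/ielex.csv with newline="" and encoding="utf-8" would replace the io.StringIO sample, provided the column names match; there is no guarantee the result matches ielex.nex exactly, since that file is produced by build/build_nexus.py.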

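Researchers working with the full sample in IE-trees-v1/ie-v1.nex may want to redo the burn-in and MCC summary themselves. The stdlib-only sketch below is a simplified illustration, not necessarily how ielex.mcc.tre was produced: it discards the first half of the trees, counts how often each clade occurs in the rest, and picks the sampled tree whose clades have the highest product of posterior frequencies (the classic MCC criterion, computed as a sum of logs). It assumes one tree statement per line and unquoted tip labels, as in BEAST-style logs that use a Translate block; the helper names read_newick_strings, clades and mcc_index are hypothetical. For real work a dedicated tool such as TreeAnnotator, distributed with BEAST, is the safer choice.

```python
import math
import re
from collections import Counter

# One "tree NAME [optional comment] = <newick>;" statement per line, as in BEAST-style logs.
TREE_LINE = re.compile(r"^\s*tree\s+\S+\s*(?:\[[^\]]*\]\s*)?=\s*(.*);\s*$", re.IGNORECASE)


def read_newick_strings(lines):
    """Return the Newick string of every tree statement, in sample order."""
    return [m.group(1) for m in map(TREE_LINE.match, lines) if m]


def clades(newick):
    """Set of clades (frozensets of tip labels) with at least two tips, root included.

    Assumes unquoted labels; [&...] comments and branch lengths are discarded.
    """
    newick = re.sub(r"\[[^\]]*\]", "", newick)  # drop [&...] annotations
    newick = re.sub(r":[^,();]+", "", newick)   # drop branch lengths
    stack, found, token, is_node_label = [[]], set(), "", False
    for char in newick + ";":
        if char in "(),;":
            name = token.strip()
            if name and not is_node_label:
                stack[-1].append(name)  # a tip label
            token, is_node_label = "", False
            if char == "(":
                stack.append([])
            elif char == ")":
                tips = stack.pop()
                found.add(frozenset(tips))
                stack[-1].extend(tips)  # the parent clade also contains these tips
                is_node_label = True    # text right after ")" labels an internal node
        else:
            token += char
    return {clade for clade in found if len(clade) > 1}


def mcc_index(newicks, burnin=0.5):
    """Index (into the full sample) of the maximum clade credibility tree.

    The first `burnin` fraction of trees is discarded; each remaining tree is
    scored by the sum of the log posterior frequencies of its clades.
    """
    start = int(len(newicks) * burnin)
    kept = [clades(nwk) for nwk in newicks[start:]]
    counts = Counter(clade for tree in kept for clade in tree)
    n = len(kept)
    scores = [sum(math.log(counts[c] / n) for c in tree) for tree in kept]
    return start + max(range(n), key=scores.__getitem__)


# Toy sample with numeric tip labels (not real IELex output).
# For the real sample: read_newick_strings(open("IE-trees-v1/ie-v1.nex").read().splitlines())
example = """#NEXUS
begin trees;
    tree STATE_0 = [&R] ((1:1.0,3:1.0):0.5,(2:1.0,4:1.0):0.5);
    tree STATE_1000 = [&R] ((1:1.0,3:1.0):0.5,(2:1.0,4:1.0):0.5);
    tree STATE_2000 = [&R] ((1:1.0,4:1.0):0.5,(2:1.0,3:1.0):0.5);
    tree STATE_3000 = [&R] (((1:1.0,2:1.0):0.5,3:1.5):0.5,4:2.0);
    tree STATE_4000 = [&R] ((1:1.0,2:1.0):0.5,(3:1.0,4:1.0):0.5);
    tree STATE_5000 = [&R] ((1:1.0,2:1.0):0.5,(3:1.0,4:1.0):0.5);
end;
"""
newicks = read_newick_strings(example.splitlines())
best = mcc_index(newicks)
print(best, newicks[best])
```

Ties, such as two identical topologies, resolve to the earliest tree, so the toy sample selects index 4. The sketch only chooses a topology from the sample; TreeAnnotator additionally annotates the chosen tree with node heights and clade posterior support.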

Created:

2023-06-28
