
Cariban Lexical Database (CaLeD)

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/10019096
Resource description:
This dataset contains a comprehensive collection of lexical items from various languages within the Carib linguistic family. It is structured to facilitate computational historical linguistics analysis, offering detailed information on language characteristics, word forms, and cognacy judgments. The data is curated to support research in linguistic typology, historical linguistics, and related fields.

Data Structure

The dataset is presented in a TSV (Tab-Separated Values) format, ensuring easy integration with common data analysis tools. Each lexical item in the dataset is detailed with multiple linguistic attributes, including phonological transcriptions, morphological analysis, and cognacy information. The following table summarizes the fields included in the dataset:

Field Name          Data Type   Description
ID                  string      Unique identifier for each dataset entry.
ID_lang             string      Unique identifier for the language within the dataset.
Glottocode          string      Code uniquely identifying the language in the Glottolog database.
Glottolog_Name      string      Name of the language as recorded in the Glottolog database.
ISO639P3code        string      ISO 639-3 code for the language.
ID_param            string      Unique identifier for the linguistic parameter or concept within the dataset.
Concepticon_ID      integer     Identifier for the concept in the Concepticon database.
Concepticon_Gloss   string      Gloss or definition of the concept from the Concepticon database.
Value               string      Value of the linguistic data point, typically a word or phrase in the language.
Form                string      Phonetic or phonological transcription of the linguistic data point.
Segments            string      Further phonetic or phonological breakdown of the form.
Source              string      Reference to the source or citation where the data was obtained.
Morphemes           string      Morphological breakdown of the form.
SimpleCognate       integer     Cognacy judgment, indicating whether the form is cognate with forms of the same meaning in related languages.
PartialCognates     string      Partial cognacy coding, detailing the cognacy of individual segments or morphemes.

Intended Use

This dataset is intended for researchers and linguists specializing in the Carib linguistic family. It provides valuable insights into the lexical similarities and differences across the languages within this family, supporting studies on language evolution, relationships, and structure.

Additional Resources

Metadata for Validation: This dataset comes with comprehensive metadata following the Frictionless Data standard, ensuring that the data structure and types are accurately described for validation purposes. This metadata aids in maintaining the integrity and usability of the data across various computational platforms and research projects.

CLDF Version Available: For researchers utilizing the Cross-Linguistic Data Formats (CLDF), a version of this dataset is available in CLDF specifications. This version is provided as a zipped file, facilitating easier distribution and handling.
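
As an illustration of how the TSV layout described above might be consumed, the following minimal Python sketch reads tab-separated rows and groups forms by concept and cognate set. It rests on several assumptions that the description does not confirm: that the file has a header row using the field names listed above, and that the integer in SimpleCognate acts as a cognate set identifier shared by cognate forms. The sample rows are hypothetical placeholders, not data from CaLeD, and the function name group_by_cognate_set is introduced here only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical placeholder rows, NOT taken from the dataset. Only the column
# names come from the field list in the description.
SAMPLE_TSV = (
    "ID\tID_lang\tConcepticon_Gloss\tForm\tSimpleCognate\n"
    "1\tlang_a\tCONCEPT_X\tform_one\t1\n"
    "2\tlang_b\tCONCEPT_X\tform_two\t1\n"
    "3\tlang_c\tCONCEPT_X\tform_three\t2\n"
)


def group_by_cognate_set(handle):
    """Map (concept gloss, cognate set id) to a list of (language id, form).

    Assumes SimpleCognate holds an integer cognate set identifier.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        raw = row["SimpleCognate"].strip()
        if not raw:
            continue  # skip rows that carry no cognacy judgment
        key = (row["Concepticon_Gloss"], int(raw))
        groups[key].append((row["ID_lang"], row["Form"]))
    return dict(groups)


cognate_sets = group_by_cognate_set(io.StringIO(SAMPLE_TSV))
for (gloss, set_id), members in sorted(cognate_sets.items()):
    print(gloss, set_id, members)
```

To run the same grouping on the downloaded file, the in-memory sample can be replaced with a file handle opened with encoding="utf-8" and newline=""; the actual file name is not given in the description and would need to be taken from the Zenodo record.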
Created:
2024-04-21