MWirelabs/northeast-india-tribes-and-subtribes
Source: Hugging Face. Updated 2025-09-03; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/MWirelabs/northeast-india-tribes-and-subtribes
Description:
This dataset is the first structured open-source collection of tribes and sub-tribes across the 8 states of Northeast India: Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Tripura, and Sikkim. It contains more than 118 entries, each with the fields State, Tribe, Sub-tribes / Clans, Regions / Districts, Language(s) Spoken, Notes on Distribution, and Linguistic Family. The dataset has three aims: to support LLM training for the languages and cultures of Northeast India; to give researchers, developers, and educators a cultural and linguistic reference; and to preserve ethnolinguistic diversity in open data. It was created by MWire Labs and is licensed under CC-BY 4.0.
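As a rough illustration of how entries with the fields listed above might be handled, here is a minimal Python sketch. It assumes each entry is available as a dictionary keyed by the field names in the description; the actual column keys in the published files may differ. The helper names `missing_fields` and `entries_per_state` are hypothetical, and the sample rows are placeholders, not records from the dataset.

```python
from collections import Counter

# Field names copied from the description above.
# Assumption: the real column keys in the dataset files may be spelled differently.
FIELDS = [
    "State",
    "Tribe",
    "Sub-tribes / Clans",
    "Regions / Districts",
    "Language(s) Spoken",
    "Notes on Distribution",
    "Linguistic Family",
]


def missing_fields(row):
    """Return the described fields that are absent or blank in one row."""
    return [name for name in FIELDS if not str(row.get(name, "")).strip()]


def entries_per_state(rows):
    """Count how many rows belong to each state."""
    return Counter(row["State"] for row in rows)


# Placeholder rows for demonstration only; these are NOT dataset records.
template = {name: "placeholder" for name in FIELDS}
sample_rows = [
    {**template, "State": "Assam"},
    {**template, "State": "Assam"},
    {**template, "State": "Sikkim", "Linguistic Family": ""},
]

if __name__ == "__main__":
    print(entries_per_state(sample_rows))
    for row in sample_rows:
        print(row["State"], missing_fields(row))
```

With the Hugging Face `datasets` library installed, rows for these helpers could presumably be obtained through `load_dataset("MWirelabs/northeast-india-tribes-and-subtribes")`; the split name and exact column keys are not stated in this listing and would need to be checked against the dataset itself.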
Provider:
MWirelabs



