Sindhi Open Lexicon Dataset
Source: DataCite Commons. Updated: 2026-05-13. Indexed: 2026-05-18.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/ZUJA3P
Description:
Sindhi Open Lexicon Dataset (223K+ Entries) for AI, NLP & Computational Linguistics.
This project is a large-scale, structured lexical dataset for the Sindhi language. It contains over 223,000 entries, including definitions, linguistic metadata, and normalized forms.
Sindhi is a historically rich language, but it remains low-resource in AI. This dataset aims to support NLP, AI systems, and computational linguistics.
Objectives
- Provide an AI-ready Sindhi dataset
- Support NLP research
- Enable search engines, chatbots, OCR, and language tools
- Preserve linguistic heritage digitally
Dataset Features
- 223,000+ entries
- Definitions in Sindhi
- Variants with/without diacritics
- Normalized text
- Domain classification
- Formats: CSV, JSONL, SQLite
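As a rough illustration of how the JSONL format and the normalized forms might be used together, the sketch below parses a few records and groups variants under a shared normalized key. The field names (`word`, `normalized`, `definition`, `domain`) and the sample records are hypothetical placeholders, not taken from the dataset; check the actual files for the real schema.

```python
import io
import json
from collections import defaultdict

# Hypothetical sample records: field names and values are assumptions,
# not the dataset's actual schema or content.
SAMPLE_JSONL = """\
{"word": "w1-with-marks", "normalized": "w1", "definition": "placeholder definition A", "domain": "general"}
{"word": "w1", "normalized": "w1", "definition": "placeholder definition A", "domain": "general"}
{"word": "w2", "normalized": "w2", "definition": "placeholder definition B", "domain": "science"}
"""


def load_entries(stream):
    """Parse one JSON object per non-empty line (the JSONL convention)."""
    return [json.loads(line) for line in stream if line.strip()]


def index_by_normalized(entries):
    """Group entries by normalized form so spelling variants share one key."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["normalized"]].append(entry)
    return index


entries = load_entries(io.StringIO(SAMPLE_JSONL))
index = index_by_normalized(entries)

# All surface forms recorded under the normalized key "w1".
variants = [e["word"] for e in index["w1"]]
```

To read a real file instead of the in-memory sample, pass an open file handle (for example `open(path, encoding="utf-8")`) to `load_entries`; the explicit UTF-8 encoding matters for Sindhi text.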
Provider:
Harvard Dataverse
Created:
2026-05-13



