
Sindhi WordNet Dataset

Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://doi.org/10.7910/DVN/QHE4IT
Description:
Sindhi WordNet Dataset

Developed by: Abdul Majid Bhurgri Institute of Language Engineering (AMBILE), Hyderabad
Under the administrative control of the Culture, Tourism, Antiquities & Archives Department, Government of Sindh

Overview

The Sindhi WordNet Tagging Dataset contains a collection of Sindhi words, annotated with various linguistic features such as categories, tenses, synonyms, antonyms, and more. This dataset is designed for natural language processing (NLP) tasks in the Sindhi language, particularly word sense disambiguation, semantic analysis, and syntactic tagging.

Dataset Structure

The dataset is provided in CSV format and contains the following columns:

word_id: A unique identifier for each word entry.
word: The Sindhi word.
category: The part of speech or syntactic category of the word (e.g., noun, verb, adjective).
gender: The gender associated with the word (if applicable).
invariants: Information about whether the word is invariant (e.g., whether it has plural or singular forms).
tags: The syntactic or semantic tag associated with the word (e.g., conjunction, preposition).
tenses: The tense information for the word (if applicable).
hyp: Any hypernyms associated with the word.
antonyms: Antonyms for the word (if available).
synonyms: Synonyms for the word (if available).

Example

word_id | word | category | gender | invariants | tags | tenses | hyp | antonyms | synonyms
1 | ۽ | - | - | - | con | - | - | - | -
2 | ۾ | - | - | - | pp | - | - | - | -
3 | اَبَدُ | - | - | singular | noun,adv | - | - | - | -
4 | اَبَدِي | - | - | singular | adj | - | - | - | -
5 | اَبَدِيت | - | - | singular | noun | - | - | - | -

Features

Comprehensive word annotations for various linguistic categories.
Multiple syntactic and semantic features such as tense, gender, synonyms, and antonyms.
Designed for Sindhi NLP tasks, helping improve language processing for the Sindhi language.

Usage

This dataset can be used for a wide range of NLP tasks such as:

Part-of-speech tagging.
Word sense disambiguation.
Semantic analysis.

You can load and process this dataset using any standard CSV reader in your preferred programming language (e.g., Python's pandas).

Acknowledgments

Special thanks to the AMBILE team for their efforts in data curation, cleaning, formatting, and tagging.

Data Source

The dataset is sourced from the AMBILE WordNet project.

License

This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 License. It is intended for educational and research purposes only.

Contact

For any queries, collaboration opportunities, or contributions, please contact:
Email: datasets@sindh.ai
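The loading step mentioned under Usage can be sketched with Python's standard csv module. This is a minimal illustration, not the maintainers' tooling: the sample rows are copied from the Example section, while the quoting of the multi-valued cell ("noun,adv"), the use of "-" as the empty-cell marker in the actual file, and the helper name parse_rows are assumptions to check against the downloaded CSV.

```python
import csv
import io

# Sample rows copied from the Example section. How the real file quotes
# multi-valued cells and whether it stores "-" for empty cells are
# assumptions; adjust to match the downloaded CSV.
SAMPLE_CSV = '''word_id,word,category,gender,invariants,tags,tenses,hyp,antonyms,synonyms
1,۽,-,-,-,con,-,-,-,-
2,۾,-,-,-,pp,-,-,-,-
3,اَبَدُ,-,-,singular,"noun,adv",-,-,-,-
4,اَبَدِي,-,-,singular,adj,-,-,-,-
5,اَبَدِيت,-,-,singular,noun,-,-,-,-
'''

# Columns assumed to hold comma-separated lists of values.
MULTI_VALUED = ("tags", "hyp", "antonyms", "synonyms")


def parse_rows(text_stream):
    """Yield one dict per word entry (hypothetical helper).

    Empty or "-" cells become None for single-valued columns and an
    empty list for multi-valued columns; other multi-valued cells are
    split on commas.
    """
    for row in csv.DictReader(text_stream):
        entry = {}
        for column, value in row.items():
            value = (value or "").strip()
            if value in ("", "-"):
                entry[column] = [] if column in MULTI_VALUED else None
            elif column in MULTI_VALUED:
                entry[column] = [v.strip() for v in value.split(",")]
            else:
                entry[column] = value
        yield entry


entries = list(parse_rows(io.StringIO(SAMPLE_CSV)))
nouns = [e["word"] for e in entries if "noun" in e["tags"]]
print(len(entries), "entries;", len(nouns), "tagged as noun")
```

To read the downloaded file instead of the in-memory sample, pass parse_rows a file object opened with encoding="utf-8" and newline=""; the file name depends on the download and is not specified here.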
Created:
2025-10-30