Sindhi WordNet Dataset
Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://doi.org/10.7910/DVN/QHE4IT
Description:
Sindhi WordNet Dataset

Developed by: Abdul Majid Bhurgri Institute of Language Engineering (AMBILE), Hyderabad
Under the administrative control of the Culture, Tourism, Antiquities & Archives Department, Government of Sindh

Overview
The Sindhi WordNet Tagging Dataset contains a collection of Sindhi words, annotated with various linguistic features such as categories, tenses, synonyms, antonyms, and more. This dataset is designed for natural language processing (NLP) tasks, particularly for tasks such as word sense disambiguation, semantic analysis, and syntactic tagging for the Sindhi language.

Dataset Structure
The dataset is provided in CSV format and contains the following columns:
- word_id: A unique identifier for each word entry.
- word: The Sindhi word.
- category: The part of speech or syntactic category of the word (e.g., noun, verb, adjective).
- gender: The gender associated with the word (if applicable).
- invariants: Information about whether the word is invariant (e.g., whether it has plural or singular forms).
- tags: The syntactic or semantic tag associated with the word (e.g., conjunction, preposition).
- tenses: The tense information for the word (if applicable).
- hyp: Any hypernyms associated with the word.
- antonyms: Antonyms for the word (if available).
- synonyms: Synonyms for the word (if available).

Example
word_id | word | category | gender | invariants | tags | tenses | hyp | antonyms | synonyms
1 | ۽ | - | - | - | con | - | - | - | -
2 | ۾ | - | - | - | pp | - | - | - | -
3 | اَبَدُ | - | - | singular | noun,adv | - | - | - | -
4 | اَبَدِي | - | - | singular | adj | - | - | - | -
5 | اَبَدِيت | - | - | singular | noun | - | - | - | -

Features
- Comprehensive word annotations for various linguistic categories.
- Multiple syntactic and semantic features such as tense, gender, synonyms, and antonyms.
- Designed for Sindhi NLP tasks, helping improve language processing for the Sindhi language.

Usage
This dataset can be used for a wide range of NLP tasks such as:
- Part-of-speech tagging.
- Word sense disambiguation.
- Semantic analysis.

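To make the column layout concrete, here is a minimal Python sketch that reads rows shaped like the example using the standard library csv module. It rests on several assumptions that the listing does not confirm: that the published file has a header row with exactly the column names listed, that it is comma-delimited and UTF-8 encoded, that "-" marks a missing value, and that multi-valued tags such as noun,adv are quoted in the file. The names SAMPLE_CSV, MISSING, load_entries, and words_with_tag are hypothetical and used only for illustration.

```python
import csv
import io

# Inline sample mirroring the example rows. A real file would instead be
# opened with open(path, encoding="utf-8", newline=""). The delimiter and
# quoting of the published CSV are assumptions, not confirmed by the listing.
SAMPLE_CSV = """word_id,word,category,gender,invariants,tags,tenses,hyp,antonyms,synonyms
1,۽,-,-,-,con,-,-,-,-
2,۾,-,-,-,pp,-,-,-,-
3,اَبَدُ,-,-,singular,"noun,adv",-,-,-,-
4,اَبَدِي,-,-,singular,adj,-,-,-,-
5,اَبَدِيت,-,-,singular,noun,-,-,-,-
"""

MISSING = "-"  # assumed placeholder for "no value"


def load_entries(handle):
    """Read rows into dicts, mapping "-" to None and splitting tags into a list."""
    entries = []
    for row in csv.DictReader(handle):
        entry = {key: (None if value == MISSING else value) for key, value in row.items()}
        entry["word_id"] = int(row["word_id"])
        entry["tags"] = row["tags"].split(",") if row["tags"] != MISSING else []
        entries.append(entry)
    return entries


def words_with_tag(entries, tag):
    """Return the words whose tag list contains the given tag."""
    return [entry["word"] for entry in entries if tag in entry["tags"]]


entries = load_entries(io.StringIO(SAMPLE_CSV))
nouns = words_with_tag(entries, "noun")
print(len(entries), nouns)
```

Splitting tags into a list keeps entries such as the third example row, which carries both noun and adv, retrievable under either tag. If the real file uses a different placeholder or delimiter, only MISSING and the csv.DictReader call should need adjusting.
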
You can load and process this dataset using any standard CSV reader in your preferred programming language (e.g., Python's pandas).

Acknowledgments
Special thanks to the AMBILE team for their efforts in data curation, cleaning, formatting, and tagging.

Data Source
The dataset is sourced from the AMBILE WordNet project.

License
This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 License. It is intended for educational and research purposes only.

Contact
For any queries, collaboration opportunities, or contributions, please contact:
Email: datasets@sindh.ai
Created:
2025-10-30



