sanganaka/NeCTIS-Dataset
Source: Hugging Face. Updated: 2025-07-29. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/sanganaka/NeCTIS-Dataset
Resource description:
The DepNeCTI-LSTM dataset is a specialized dataset for nested compound type identification in Sanskrit. It comes in two versions: NeCTIS (in-domain, prose) and NeCTIS-OOD (out-of-domain, poetry). The dataset is annotated with both coarse-grained and fine-grained semantic types: the coarse-grained level has four broad compound types, and the fine-grained level has 86 detailed sub-types. Construction of the dataset was supported by DeitY, and the dataset underwent cross-institutional validation by teams of linguistic experts.
Provided by:
sanganaka



