NIKAW/ud-latin-r2.15
Source: Hugging Face. Updated 2025-03-06; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/NIKAW/ud-latin-r2.15
Description:
This dataset is an aggregated Universal Dependencies treebank for Latin that combines several corpora (UDante, PROIEL, Perseus, LLCT, ITTB, and CIRCSE), parsed with conllu. It provides token-level features; the ID field is stored as a string so that token ranges can be represented. Some features are stored as JSON strings because their content is semi-structured. The dataset is compatible with Hugging Face datasets and the Arrow format. Licenses differ between the source corpora, so users should filter on the corpus field to comply with the license that applies to each.
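The filtering and decoding steps described above can be sketched in plain Python. This is a minimal illustration, not the dataset's documented usage: only the `corpus` field is named in the description, while the column names `ids`, `tokens`, and `feats`, the sample rows, and the `ALLOWED_CORPORA` set are assumptions made for the example. Check the dataset card for the real schema and for the license attached to each corpus.

```python
import json

# Hypothetical rows shaped like the description: one sentence per row,
# string IDs (so a range such as "1-2" is representable), and
# semi-structured features stored as JSON strings.
# Column names other than "corpus" are assumptions.
rows = [
    {
        "corpus": "ITTB",
        "ids": ["1", "2"],
        "tokens": ["Gallia", "est"],
        "feats": ['{"Case": "Nom"}', '{"Mood": "Ind"}'],
    },
    {
        "corpus": "PROIEL",
        "ids": ["1-2", "1", "2"],
        "tokens": ["nobiscum", "nobis", "cum"],
        "feats": ["null", '{"Case": "Abl"}', "null"],
    },
]

# Placeholder: fill in the corpora whose licenses suit your use case.
ALLOWED_CORPORA = {"ITTB"}


def filter_by_corpus(rows, allowed):
    """Keep only the rows whose corpus field is in the allowed set."""
    return [row for row in rows if row["corpus"] in allowed]


def decode_feats(row):
    """Decode each JSON-string feature into a Python object (dict or None)."""
    return [json.loads(feat) for feat in row["feats"]]


def is_token_range(token_id):
    """Return True for range IDs such as "1-2", False for plain IDs."""
    return "-" in token_id


kept = filter_by_corpus(rows, ALLOWED_CORPORA)
for row in kept:
    for token_id, token, feat in zip(row["ids"], row["tokens"], decode_feats(row)):
        print(token_id, token, feat)
```

With the Hugging Face `datasets` library the same predicate could be passed to `Dataset.filter`, for example `ds.filter(lambda row: row["corpus"] in ALLOWED_CORPORA)`, assuming `ds` is a split loaded from this repository.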
Provider:
NIKAW



