Calvin-Xu/Furigana-NDLBIB
Source: Hugging Face | Updated: 2024-07-28 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/Calvin-Xu/Furigana-NDLBIB
Summary:
This dataset is derived from national bibliography data. It consists of Japanese text annotated with furigana, the small kana written alongside kanji to show their readings. During validation, 5,064 mismatched instances in the original corpus were removed. The dataset is intended primarily for text-to-text generation tasks. Language: Japanese. Tag: furigana. Name: Furigana Annotation Corpus (National Diet Library Corpus). Size category: 10M to 100M.
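The summary does not say how the 5,064 mismatches were detected. As an illustration only, one possible check compares each annotated string with its plain text after the readings are stripped, and drops the pair if the two differ. The `{base|reading}` markup and the helpers `strip_readings`, `is_consistent`, and `filter_mismatches` are hypothetical. They are not the dataset's documented format or tooling.

```python
import re

# HYPOTHETICAL inline furigana markup: {base|reading}, e.g. "{本|ほん}".
# The dataset's real annotation scheme may differ; this is an assumption.
RUBY = re.compile(r"\{([^|{}]+)\|([^|{}]+)\}")


def strip_readings(annotated: str) -> str:
    """Replace every {base|reading} span with its base text only."""
    return RUBY.sub(lambda m: m.group(1), annotated)


def is_consistent(plain: str, annotated: str) -> bool:
    """True if removing the readings reproduces the plain text exactly."""
    return strip_readings(annotated) == plain


def filter_mismatches(pairs):
    """Keep consistent (plain, annotated) pairs; also return how many were dropped."""
    kept = [pair for pair in pairs if is_consistent(*pair)]
    return kept, len(pairs) - len(kept)


if __name__ == "__main__":
    pairs = [
        ("日本語の本", "{日本語|にほんご}の{本|ほん}"),  # consistent
        ("東京都", "{東京|とうきょう}"),  # base text lost "都": mismatch
    ]
    kept, dropped = filter_mismatches(pairs)
    print(len(kept), dropped)  # prints "1 1"
```

An exact string comparison is used because a silent difference between the base text and the annotated text would teach a model to alter the input it is supposed to annotate.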
Provided by:
Calvin-Xu



