llm-book/jawiki-20220404-c400
Source: Hugging Face. Updated 2023-10-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/llm-book/jawiki-20220404-c400
Resource description:
---
license: mit
task_categories:
- question-answering
language:
- ja
size_categories:
- 10M<n<100M
---
# Dataset Card for jawiki-20220404-c400
This dataset contains passages from Japanese Wikipedia as of 2022-04-04. Each passage consists of consecutive sentences and is no longer than 400 characters.
This dataset is used in baseline systems for [the AI王 question answering competition](https://sites.google.com/view/project-aio/home), such as [cl-tohoku/AIO3_BPR_baseline](https://github.com/cl-tohoku/AIO3_BPR_baseline).
Please refer to [the original repository](https://github.com/cl-tohoku/quiz-datasets) for further details.
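
The passage construction described above can be illustrated with a minimal sketch. This is not the code used to build the dataset; see the original repository for that. The function name `chunk_sentences`, the greedy packing strategy, and the hard split of over-long sentences are all assumptions made for illustration, and the input is assumed to be already split into sentences.

```python
from typing import Iterable, List


def chunk_sentences(sentences: Iterable[str], max_chars: int = 400) -> List[str]:
    """Greedily pack consecutive sentences into passages of at most max_chars characters.

    Hypothetical helper: the actual dataset may handle sentence boundaries,
    section titles, and over-long sentences differently.
    """
    passages: List[str] = []
    current = ""
    for sentence in sentences:
        # Assumed policy: a single sentence longer than max_chars is hard-split.
        while len(sentence) > max_chars:
            if current:
                passages.append(current)
                current = ""
            passages.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        if len(current) + len(sentence) > max_chars:
            # Adding this sentence would exceed the limit, so start a new passage.
            passages.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        passages.append(current)
    return passages
```

Calling `chunk_sentences(list_of_sentences)` returns a list of strings, each at most 400 characters long, whose concatenation reproduces the input text in order.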
Provider:
llm-book
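Summary of original information
Dataset overview
Basic information
- Dataset name: jawiki-20220404-c400
- License: MIT
- Language: Japanese (ja)
- Size: 10M<n<100M
Data content
- Data source: Japanese Wikipedia
- Date: as of 2022-04-04
- Description: each passage consists of consecutive sentences and is no longer than 400 characters
Use cases
- Task category: question answering (question-answering)
- Example use: baseline systems for the AI王 question answering competition, such as cl-tohoku/AIO3_BPR_baseline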



