JSQuAD
Hosted on ModelScope (updated 2025-10-09, indexed 2025-02-15).
Download link: https://modelscope.cn/datasets/sbintuitions/JSQuAD
A clone published to ensure reproducibility of evaluation scores and to release the SB Intuitions corrected version.
Source: [yahoojapan/JGLUE on GitHub](https://github.com/yahoojapan/JGLUE/tree/main)
# JSQuAD
> JSQuAD is a Japanese version of SQuAD (Rajpurkar+, 2016), one of the datasets of reading comprehension.
> Each instance in the dataset consists of a question regarding a given context (Wikipedia article) and its answer.
> JSQuAD is based on SQuAD 1.1 (there are no unanswerable questions).
> We used the Japanese Wikipedia dump as of 20211101.
## Licensing Information
[Creative Commons Attribution Share Alike 4.0 International](https://github.com/yahoojapan/JGLUE/blob/main/LICENSE)
- [datasets/jsquad-v1.1 on GitHub](https://github.com/yahoojapan/JGLUE/tree/v1.1.0/datasets/jsquad-v1.1)
## Citation Information
```
@article{栗原 健太郎2023,
title={JGLUE: 日本語言語理解ベンチマーク},
author={栗原 健太郎 and 河原 大輔 and 柴田 知秀},
journal={自然言語処理},
volume={30},
number={1},
pages={63-87},
year={2023},
url = "https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_article/-char/ja",
doi={10.5715/jnlp.30.63}
}
@inproceedings{kurihara-etal-2022-jglue,
title = "{JGLUE}: {J}apanese General Language Understanding Evaluation",
author = "Kurihara, Kentaro and
Kawahara, Daisuke and
Shibata, Tomohide",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.lrec-1.317",
pages = "2957--2966",
abstract = "To develop high-performance natural language understanding (NLU) models, it is necessary to have a benchmark to evaluate and analyze NLU ability from various perspectives. While the English NLU benchmark, GLUE, has been the forerunner, benchmarks are now being released for languages other than English, such as CLUE for Chinese and FLUE for French; but there is no such benchmark for Japanese. We build a Japanese NLU benchmark, JGLUE, from scratch without translation to measure the general NLU ability in Japanese. We hope that JGLUE will facilitate NLU research in Japanese.",
}
@InProceedings{Kurihara_nlp2022,
author = "栗原健太郎 and 河原大輔 and 柴田知秀",
title = "JGLUE: 日本語言語理解ベンチマーク",
booktitle = "言語処理学会第28回年次大会",
year = "2022",
url = "https://www.anlp.jp/proceedings/annual_meeting/2022/pdf_dir/E8-4.pdf",
note= "in Japanese"
}
```
# Subsets
## default
- `id` (`str`): ID of the question
- `title` (`str`): title of the Wikipedia article (not NFKC-normalized)
- `context` (`str`): concatenation of the title and the paragraph (not NFKC-normalized)
- `question` (`str`): the question text (not NFKC-normalized)
- `answers` (`dict{answer_start: list(int), text: list(str)}`): the list of answers
  - `answer_start`: answer start positions (character index into `context`)
  - `text`: answer texts (not NFKC-normalized)
- `is_impossible` (`bool`): always `false`, since there are no unanswerable questions
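
The fields above can be illustrated with a minimal Python sketch. It is not part of the original JGLUE tooling. The `record` below is invented for illustration (including the single-space separator between title and paragraph), and `answer_spans` and `nfkc_record` are hypothetical helper names. The sketch shows how `answer_start` indexes into `context` by character, and why the offsets need to be recomputed if you apply NFKC normalization yourself, since normalization can change string length.

```python
import unicodedata

# Hypothetical record following the `default` schema; the text is invented.
# It uses half-width katakana and full-width digits, which NFKC rewrites.
record = {
    "id": "example-0001",
    "title": "ﾃﾞｰﾀ",
    "context": "ﾃﾞｰﾀ ﾃﾞｰﾀは２０２２年に公開された。",
    "question": "ﾃﾞｰﾀが公開されたのは何年？",
    "answers": {"answer_start": [10], "text": ["２０２２年"]},
    "is_impossible": False,
}


def answer_spans(rec):
    """Return (start, end, text) per answer, checking each against the context."""
    context = rec["context"]
    spans = []
    for start, text in zip(rec["answers"]["answer_start"], rec["answers"]["text"]):
        end = start + len(text)
        # answer_start is a character index, so plain str slicing applies.
        if context[start:end] != text:
            raise ValueError(f"answer {text!r} not found at offset {start}")
        spans.append((start, end, text))
    return spans


def nfkc(s):
    return unicodedata.normalize("NFKC", s)


def nfkc_record(rec):
    """Return an NFKC-normalized copy with answer_start recomputed.

    The new offset is taken as the length of the normalized prefix. This is
    an approximation: it assumes normalization does not merge characters
    across the answer boundary.
    """
    context = rec["context"]
    starts = [len(nfkc(context[:s])) for s in rec["answers"]["answer_start"]]
    return {
        "id": rec["id"],
        "title": nfkc(rec["title"]),
        "context": nfkc(context),
        "question": nfkc(rec["question"]),
        "answers": {
            "answer_start": starts,
            "text": [nfkc(t) for t in rec["answers"]["text"]],
        },
        "is_impossible": rec["is_impossible"],
    }


if __name__ == "__main__":
    print(answer_spans(record))
    normalized = nfkc_record(record)
    print(normalized["context"])
    print(answer_spans(normalized))
```

Running `answer_spans` on the normalized copy acts as a sanity check: it raises if a recomputed offset no longer matches the normalized answer text.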
Provider: maas
Created: 2025-02-13



