mahiyama/auto-wiki-qa
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/mahiyama/auto-wiki-qa
Resource overview:
Auto Wiki QA Triplets (ruri-rels curated) is a QA dataset generated automatically from Japanese Wikipedia, intended mainly for training sparse retrieval models such as SPLADE. It is provided in two formats. The triplets format contains an anchor, a positive, and multiple negative samples, for contrastive learning. The pairs format contains an anchor and a positive sample, for pairwise learning. The dataset is in Japanese, is licensed under cc-by-sa-4.0, and is sourced from Wikipedia. It was built by extracting passage_id and anc from the original dataset, applying NFKC normalization, and then joining on the passage IDs to obtain the passage texts of the positive and negative examples. It includes train and eval splits, used for training and evaluation respectively.
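The construction steps described above (NFKC normalization followed by a join on passage IDs) might look roughly like the following sketch. It is not the dataset author's actual pipeline: the overview names only `passage_id` and `anc`, so the record fields `positive_id` and `negative_ids`, the helper names, and the sample data are all hypothetical, and the sketch assumes normalization is applied to the anchor text.

```python
import unicodedata


def normalize_nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (e.g. full-width Latin -> ASCII)."""
    return unicodedata.normalize("NFKC", text)


def build_triplets(records, passages):
    """Join passage IDs against a passage table to produce triplets.

    `records` is a list of dicts with the hypothetical keys
    "anc", "positive_id", and "negative_ids".
    `passages` maps passage_id -> passage text.
    """
    triplets = []
    for rec in records:
        pos_id = rec["positive_id"]
        if pos_id not in passages:
            # Skip rows whose positive passage cannot be resolved.
            continue
        # Keep only the negatives that resolve to a known passage.
        negatives = [passages[n] for n in rec["negative_ids"] if n in passages]
        triplets.append(
            {
                "anchor": normalize_nfkc(rec["anc"]),
                "positive": passages[pos_id],
                "negative": negatives,
            }
        )
    return triplets


def to_pairs(triplets):
    """Derive the pairs format (anchor, positive) by dropping negatives."""
    return [{"anchor": t["anchor"], "positive": t["positive"]} for t in triplets]


# Made-up sample data, for illustration only.
passages = {
    1: "JR東日本の本社は東京都渋谷区にある。",
    2: "富士山は日本で最も高い山である。",
    3: "琵琶湖は日本で最も大きい湖である。",
}
records = [
    {"anc": "ＪＲ東日本の本社はどこ", "positive_id": 1, "negative_ids": [2, 3]},
]

triplets = build_triplets(records, passages)
pairs = to_pairs(triplets)
```

In this sketch the full-width "ＪＲ" in the anchor is folded to ASCII "JR" by NFKC, which keeps query text consistent with a tokenizer that expects normalized input.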
Provider:
mahiyama



