ikedachin/JaQuAD_imabari_v2
收藏Hugging Face2026-04-23 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/ikedachin/JaQuAD_imabari_v2
下载链接
链接失效反馈官方服务:
资源简介:
JaQuAD Imabari v2是一个基于SkelterLabsInc/JaQuAD数据集的context字段新生成的日语问答数据集,特点在于思考过程和最终回答均使用爱媛县今治市的方言撰写。数据集包含训练集1926个样本和验证集207个样本,适用于方言日语LLM的学习和评估、带有推理过程的SFT数据创建、标准日语输入对方言输出的研究等。数据集的创建方法包括使用原数据集的context字段生成新的问题、思考过程、回答等,并且思考过程和回答均使用方言。此外,数据集还包含了生成模型、评估信息和消息格式等元数据。
JaQuAD Imabari v2 is a Japanese QA dataset newly generated from the context field of SkelterLabsInc/JaQuAD, featuring reasoning processes and final answers written in the Imabari dialect of Ehime Prefecture, Japan. The dataset consists of 1926 training samples and 207 validation samples, intended for training and evaluating Japanese LLMs with dialectal data, building SFT datasets with explicit reasoning processes, and researching dialectal response generation from standard Japanese input. The dataset creation involves generating new questions, reasoning, and answers from the original context, with both reasoning and answers in dialect. It also includes metadata such as the generator model, evaluation information, and message formats.
提供机构:
ikedachin



