
DeL-TaiseiOzaki/Tengentoppa-sft-v2.0

Source: Hugging Face | Updated: 2024-11-26 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-v2.0
Description:
This is a supervised fine-tuning dataset built by integrating 14 Japanese instruction-following datasets. Its sources span conversational data, question answering, and reasoning tasks, with an emphasis on tasks that require advanced reasoning, such as mt-bench and elyza-task. The data is stored as JSON, and each data point has three fields: instruction, input, and output. All sub-datasets, each identified by its own name and source, have been converted to a unified format that covers both a basic form and a conversational form; the conversion code is publicly available on GitHub. When using the dataset, check the license of each source dataset, the data quality, and any masking that may have been applied.
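The three-field record structure described above can be illustrated with a minimal sketch. The sample records, the assumption that a file holds a JSON array of records, and the helper names `validate_record` and `build_prompt` are all hypothetical and do not come from the dataset or its conversion code; the prompt layout is just one plausible way to combine instruction and input.

```python
import json

# Field names taken from the dataset description.
REQUIRED_FIELDS = ("instruction", "input", "output")

# Hypothetical sample data: invented for illustration, not real dataset rows.
# Assumes records are stored as a JSON array of objects.
SAMPLE_JSON = """
[
  {"instruction": "次の文を英語に翻訳してください。", "input": "こんにちは", "output": "Hello"},
  {"instruction": "日本の首都はどこですか。", "input": "", "output": "東京です。"}
]
"""


def validate_record(record: dict) -> dict:
    """Check that a record has all three fields and return them as strings."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return {field: str(record[field]) for field in REQUIRED_FIELDS}


def build_prompt(record: dict) -> str:
    """Join instruction and input into one prompt; skip the input when it is empty."""
    rec = validate_record(record)
    if rec["input"].strip():
        return f"{rec['instruction']}\n\n{rec['input']}"
    return rec["instruction"]


records = json.loads(SAMPLE_JSON)
prompts = [build_prompt(r) for r in records]
targets = [validate_record(r)["output"] for r in records]
```

Validating fields before training is a cheap way to catch records from a sub-dataset whose conversion left a field out; the conversational form mentioned in the description is not covered here because its schema is not stated.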
Provider:
DeL-TaiseiOzaki