Kendamarron/jimba-wiki-instruction-calm3

Name: Kendamarron/jimba-wiki-instruction-calm3
Creator: Kendamarron
Published: 2024-07-20 12:57:05
License: CC BY-SA 3.0

Source: Hugging Face. Updated 2024-07-20; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/Kendamarron/jimba-wiki-instruction-calm3

Summary:

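This is a synthetic instruction dataset generated with a specific model and intended mainly for Japanese text generation. It contains the fields id, instruction, output, samples, and text, and its size is between 10K and 100K records. New instructions and outputs were generated from existing instruction and text data. Because no filtering was applied during generation, some records may contain inaccurate information.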

This is a synthetic instruction dataset generated with the Q4_K_M quantization of grapevine-AI/CALM3-22B-Chat-GGUF. Each record has the fields id, instruction, output, samples, and text, and the data is intended primarily for text generation. It consists of a single training split with 11,997 samples, is in Japanese, is tagged as synthetic data, and falls in the 10K to 100K size category. It was generated from instruction samples in Kendamarron/jimba-instuction-1k-beta and text from izumi-lab/wikipedia-ja-20230720.

Provider:

Kendamarron

Original information summary

Kendamarron/jimba-wiki-instruction-calm3

Dataset overview

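Features:
- id: integer
- instruction: string
- output: string
- samples: sequence of strings
- text: string
Splits:
- train: 11,997 samples, 138,479,557 bytes in total
Download size: 75,624,989 bytes
Dataset size: 138,479,557 bytes
Configurations:
- default: contains the training data files
License: CC BY-SA 3.0
Task category: text generation
Language: Japanese
Tags: synthetic data
Size category: 10K<n<100K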
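
As a quick sanity check on the figures in the overview, the following sketch derives the average record size and the ratio of download size to dataset size. It uses only the numbers listed above; the variable names are illustrative.

```python
# Figures copied from the dataset overview.
NUM_TRAIN_SAMPLES = 11_997
DATASET_SIZE_BYTES = 138_479_557
DOWNLOAD_SIZE_BYTES = 75_624_989

# Average size of one record (id, instruction, output, samples, text).
avg_bytes_per_sample = DATASET_SIZE_BYTES / NUM_TRAIN_SAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average record size: {avg_bytes_per_sample / 1024:.1f} KiB")
print(f"download / dataset size: {download_ratio:.2f}")
```

This works out to roughly 11 KiB per record, with the download being a little over half the dataset size.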

Dataset characteristics

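The dataset was generated with the Q4_K_M model from grapevine-AI/CALM3-22B-Chat-GGUF.
The dataset contains partially synthesized instructions and may include a small amount of hallucinated content.
No post-generation filtering was applied, so raw generated output may be included as-is.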
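
Since the data is unfiltered, users may want to apply their own checks before training. The following is a minimal sketch of one possible heuristic filter, not a method used by the dataset author. The function name `looks_usable` and the threshold `min_output_chars` are hypothetical choices for illustration, and a filter like this cannot detect hallucinated facts.

```python
def looks_usable(row, min_output_chars=20):
    """Heuristic check for one record (hypothetical, not the author's method)."""
    instruction = (row.get("instruction") or "").strip()
    output = (row.get("output") or "").strip()

    # Drop records with a missing instruction or a very short output.
    if not instruction or len(output) < min_output_chars:
        return False

    # Drop records whose output merely repeats the instruction.
    if output == instruction:
        return False

    return True


# Small in-memory example with the same field names as the dataset.
rows = [
    {"instruction": "富士山の高さを説明してください。", "output": "富士山は標高3776メートルの日本で最も高い山です。"},
    {"instruction": "要約してください。", "output": ""},
]
kept = [row for row in rows if looks_usable(row)]
print(len(kept), "of", len(rows), "records kept")
```

With the Hugging Face `datasets` library, the same function can be passed to `dataset.filter(looks_usable)`.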

Dataset creation steps

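Generate an instruction from three instruction samples taken from Kendamarron/jimba-instuction-1k-beta combined with a text from izumi-lab/wikipedia-ja-20230720.
After generating the instruction, generate the output from it together with the original text.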
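
The two steps above can be sketched as follows. This is an illustration under assumptions, not the author's actual pipeline: the prompt wording, the function names, and the `generate` callable (a stand-in for a call to the model) are all hypothetical, and storing the three seed instructions in the `samples` field is a guess based on the field name.

```python
import random


def build_instruction_prompt(seed_instructions, text):
    """Step 1 prompt: few-shot instruction examples plus a reference text."""
    examples = "\n".join(f"- {s}" for s in seed_instructions)
    return (
        "Example instructions:\n"
        f"{examples}\n\n"
        "Reference text:\n"
        f"{text}\n\n"
        "Write one new instruction that can be answered from the reference text."
    )


def build_output_prompt(instruction, text):
    """Step 2 prompt: the generated instruction plus the same reference text."""
    return (
        "Reference text:\n"
        f"{text}\n\n"
        "Instruction:\n"
        f"{instruction}\n\n"
        "Answer the instruction using the reference text."
    )


def generate_pair(seed_pool, text, generate, rng=None):
    """Run both steps; `generate` is a hypothetical prompt -> completion callable."""
    rng = rng or random.Random()
    seeds = rng.sample(seed_pool, 3)  # three instruction samples, as described
    instruction = generate(build_instruction_prompt(seeds, text))
    output = generate(build_output_prompt(instruction, text))
    # Assumption: `samples` holds the seed instructions used for this record.
    return {"instruction": instruction, "output": output,
            "samples": seeds, "text": text}
```

In practice `generate` would wrap whatever inference backend serves the GGUF model; any function that maps a prompt string to a completion string fits this sketch.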

Usage

from datasets import load_dataset

dataset = load_dataset("Kendamarron/jimba-wiki-instruction-calm3", split="train")
