shi3z/ja_conv_wikipedia_llama2pro8b_10k
Source: Hugging Face. Updated 2024-01-12; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/shi3z/ja_conv_wikipedia_llama2pro8b_10k
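A minimal sketch of fetching the dataset programmatically, assuming the Hugging Face `datasets` library is installed and the repository is still public. The helper name `load_conversations` is hypothetical, and no split or column names are assumed because this page does not list them.

```python
# Repository ID taken from the dataset page above.
DATASET_ID = "shi3z/ja_conv_wikipedia_llama2pro8b_10k"


def load_conversations():
    """Download the dataset from the Hugging Face Hub (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the library.
    from datasets import load_dataset

    # With no split argument, load_dataset returns a DatasetDict keyed by split name.
    return load_dataset(DATASET_ID)
```

Calling `load_conversations()` and printing the result should show which splits and columns are available, which is worth checking before relying on any particular field.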
Resource description:
---
license: llama2
task_categories:
- conversational
language:
- ja
size_categories:
- 10K<n<100K
---
This dataset is derived from the Japanese Wikipedia dataset, converted into a multi-turn conversation format using llama2Pro8B. After generating 10,000 conversations and screening them, only about 3,000 were usable, so I am publishing those first.
Because it is released under the llama2 license, it can be used commercially in services.
Some strange dialogues may be included, as the data has not been screened by humans.
We generated 30,000 conversations over 24 hours on an A100 80GBx7 machine and automatically screened them.
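As a rough back-of-the-envelope check, the figures above imply the following generation throughput. This sketch assumes all 7 GPUs were busy for the full 24 hours and shared the work evenly, which the description does not state.

```python
# Figures quoted in the description above.
GENERATED = 30_000   # conversations generated
HOURS = 24           # wall-clock hours
GPUS = 7             # A100 80GB cards in the machine

# Assumption: even load across GPUs for the whole run.
per_hour = GENERATED / HOURS            # conversations per hour, whole machine
per_gpu_hour = per_hour / GPUS          # conversations per GPU-hour
seconds_per_conv = 3600 / per_gpu_hour  # GPU-seconds spent per conversation

print(f"{per_hour:.0f} conversations/hour on the machine")
print(f"{per_gpu_hour:.1f} conversations per GPU-hour")
print(f"{seconds_per_conv:.2f} GPU-seconds per conversation")
```

Under those assumptions the machine produced 1,250 conversations per hour, or roughly 20 GPU-seconds per conversation.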
# Model
https://huggingface.co/spaces/TencentARC/LLaMA-Pro-8B-Instruct-Chat
# Dataset
https://huggingface.co/datasets/izumi-lab/wikipedia-ja-20230720
# Compute by
Tsuginosuke AI SuperComputer
FreeAI Ltd.
https://free-ai.ltd
Provided by:
shi3z
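Summary of original information
Dataset overview
Dataset information
- License: llama2
- Task category: conversational
- Language: Japanese
- Size: 10K<n<100K
Dataset description
- The dataset is based on the Japanese Wikipedia dataset, converted into a multi-turn conversation format using llama2Pro8B.
- 10,000 conversations were generated and screened, leaving about 3,000 usable conversations.
- The dataset contains some strange dialogues, as it has not been screened by humans.
- 30,000 conversations were generated over 24 hours on an A100 80GBx7 machine and automatically screened.
License and usage
- The dataset can be used in commercial services.
Related links
- Model: https://huggingface.co/spaces/TencentARC/LLaMA-Pro-8B-Instruct-Chat
- Dataset: https://huggingface.co/datasets/izumi-lab/wikipedia-ja-20230720
- Compute provided by: Tsuginosuke AI SuperComputer, FreeAI Ltd.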



