Intel/neural-chat-dataset-v1-1

Hugging Face. Updated 2023-09-26. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Intel/neural-chat-dataset-v1-1
Resource description:
---
license: apache-2.0
---

Here is a collective list of the instruction datasets used for Neural Chat fine-tuning. The total numbers of instruction samples and tokens are about 1.1M and 326M, respectively.

| Type | Language | Dataset | Number |
|--| ---- |--------|----|
| HC3 | en | [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | 24K |
| dolly | en | [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 15K |
| alpaca-zh | zh | [tigerbot-alpaca-zh-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-zh-0.5m) | 500K |
| alpaca-en | en | [TigerResearch/tigerbot-alpaca-en-50k](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-en-50k) | 50K |
| math | en | [tigerbot-gsm-8k-en](https://huggingface.co/datasets/TigerResearch/tigerbot-gsm-8k-en) | 8K |
| general | en | [tigerbot-stackexchange-qa-en-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-stackexchange-qa-en-0.5m) | 500K |

The collective dataset has been validated on multiple LLMs (such as MPT and Llama) by the NeuralChat team (Kaokao Lv, Wenxin Zhang, Xuhui Ren, and Haihao Shen) from Intel/SATG/AIA/AIPT.

Thanks to [Hello-SimpleAI](https://huggingface.co/Hello-SimpleAI), [databricks](https://huggingface.co/databricks), and [TigerResearch/TigerBot](https://github.com/TigerResearch/TigerBot) for releasing the open-source instruction datasets.
Provider:
Intel
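Summary of original information

Dataset overview

Dataset composition

| Type | Language | Dataset name | Samples |
|--| ---- |--------|----|
| HC3 | English | HC3 | 24,000 |
| dolly | English | databricks-dolly-15k | 15,000 |
| alpaca-zh | Chinese | tigerbot-alpaca-zh-0.5m | 500,000 |
| alpaca-en | English | TigerResearch/tigerbot-alpaca-en-50k | 50,000 |
| math | English | tigerbot-gsm-8k-en | 8,000 |
| general | English | tigerbot-stackexchange-qa-en-0.5m | 500,000 |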

Dataset size

  • Total samples: about 1.1M
  • Total tokens: about 326M
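
As a quick sanity check on these totals, the per-source counts listed above can be summed and compared with the stated "about 1.1M" figure. The sketch below is only illustrative: the counts are the rounded values from the composition table, the variable names are made up for this example, and the average tokens per sample is a rough estimate derived from two rounded numbers rather than a figure reported by the dataset card.

```python
# Approximate per-source sample counts, copied from the composition table.
# These are rounded figures, not exact row counts.
counts = {
    "HC3": 24_000,
    "databricks-dolly-15k": 15_000,
    "tigerbot-alpaca-zh-0.5m": 500_000,
    "tigerbot-alpaca-en-50k": 50_000,
    "tigerbot-gsm-8k-en": 8_000,
    "tigerbot-stackexchange-qa-en-0.5m": 500_000,
}

total_samples = sum(counts.values())
approx_total_tokens = 326_000_000  # "about 326M" as stated in the dataset card

# Fraction of all samples contributed by each source.
shares = {name: n / total_samples for name, n in counts.items()}

# Rough average tokens per sample; both inputs are rounded, so treat as an estimate.
avg_tokens_per_sample = approx_total_tokens / total_samples

print(f"total samples: {total_samples:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"average tokens per sample (estimate): {avg_tokens_per_sample:.0f}")
```

The rounded counts sum to 1,097,000, which is consistent with the stated total of about 1.1M. The two 500K sources together account for most of the samples.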

License

  • License type: Apache-2.0