Intel/neural-chat-dataset-v1-1
Source: Hugging Face. Updated: 2023-09-26. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Intel/neural-chat-dataset-v1-1
Description:
---
license: apache-2.0
---
This is the combined list of instruction datasets used for Neural Chat fine-tuning. In total, it contains about 1.1M instruction samples and about 326M tokens.
| Type | Language | Dataset | Number |
|--| ---- |--------|----|
| HC3 | en | [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | 24K |
| dolly | en | [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 15K |
| alpaca-zh | zh | [tigerbot-alpaca-zh-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-zh-0.5m) | 500K |
| alpaca-en | en | [TigerResearch/tigerbot-alpaca-en-50k](https://huggingface.co/datasets/TigerResearch/tigerbot-alpaca-en-50k) | 50K |
| math | en | [tigerbot-gsm-8k-en](https://huggingface.co/datasets/TigerResearch/tigerbot-gsm-8k-en) | 8K |
| general | en | [tigerbot-stackexchange-qa-en-0.5m](https://huggingface.co/datasets/TigerResearch/tigerbot-stackexchange-qa-en-0.5m) | 500K |
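As a quick sanity check, the per-source counts in the table can be summed and compared with the stated totals. The sketch below is only back-of-the-envelope arithmetic: the counts are the rounded figures from the table (in thousands), so the results are approximate, and the variable names are illustrative rather than part of the dataset.

```python
# Approximate per-source sample counts, in thousands, copied from the table above.
SAMPLES_K = {
    "HC3": 24,
    "databricks-dolly-15k": 15,
    "tigerbot-alpaca-zh-0.5m": 500,
    "tigerbot-alpaca-en-50k": 50,
    "tigerbot-gsm-8k-en": 8,
    "tigerbot-stackexchange-qa-en-0.5m": 500,
}
TOTAL_TOKENS_M = 326  # total tokens in millions, as stated in the card

# Sum of the rounded counts: 1097 thousand, i.e. about 1.1M samples.
total_k = sum(SAMPLES_K.values())

# Share of the only Chinese-language source in the mix.
zh_share = SAMPLES_K["tigerbot-alpaca-zh-0.5m"] / total_k

# Rough average number of tokens per instruction sample.
avg_tokens = TOTAL_TOKENS_M * 1_000_000 / (total_k * 1_000)

print(f"total samples: ~{total_k / 1000:.1f}M")
print(f"Chinese share: {zh_share:.1%}")
print(f"average tokens per sample: ~{avg_tokens:.0f}")
```

The sum of the rounded counts agrees with the stated total of about 1.1M samples, and the stated token count works out to roughly 300 tokens per sample.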
The combined dataset has been validated on multiple LLMs (such as MPT and LLaMA) by the NeuralChat team (Kaokao Lv, Wenxin Zhang, Xuhui Ren, and Haihao Shen) from Intel/SATG/AIA/AIPT. Thanks to [Hello-SimpleAI](https://huggingface.co/Hello-SimpleAI), [databricks](https://huggingface.co/databricks), and [TigerResearch/TigerBot](https://github.com/TigerResearch/TigerBot) for releasing their open-source instruction datasets.
Provider:
Intel
Summary of original information
Dataset overview
Dataset composition
| Type | Language | Dataset | Samples |
|---|---|---|---|
| HC3 | English | HC3 | 24,000 |
| dolly | English | databricks-dolly-15k | 15,000 |
| alpaca-zh | Chinese | tigerbot-alpaca-zh-0.5m | 500,000 |
| alpaca-en | English | TigerResearch/tigerbot-alpaca-en-50k | 50,000 |
| math | English | tigerbot-gsm-8k-en | 8,000 |
| general | English | tigerbot-stackexchange-qa-en-0.5m | 500,000 |
Dataset size
- Total samples: about 1.1M
- Total tokens: about 326M
License
- License type: Apache-2.0



