hqq_plus_plus_mix_40k
收藏魔搭社区2025-12-03 更新2024-12-21 收录
下载链接:
https://modelscope.cn/datasets/answerdotai/hqq_plus_plus_mix_40k
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Card for Dataset Name
Replication of HQQ++ dataset mixture mentioned in [here](https://mobiusml.github.io/1bit_blog/#datasets) with Llama 3 Instruct chat template.
## Uses
Quantization Aware Training.
### Source Data
- [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco)
- [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k)
- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
- [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
#### Data Collection and Processing
```python
train_ds1 = load_dataset("timdettmers/openassistant-guanaco")['train']
train_ds2 = load_dataset("microsoft/orca-math-word-problems-200k")
train_ds2 = train_ds2['train'].shuffle(42).select(range(10000))
train_ds3 = load_dataset("meta-math/MetaMathQA")
train_ds3 = train_ds3['train'].shuffle(42).select(range(10000))
train_ds4 = load_dataset('HuggingFaceH4/ultrafeedback_binarized')
train_ds4 = train_ds4['train_sft'].shuffle(42).select(range(10000))
```
## Dataset Card Contact
Kerem Turgutlu: k@answer.ai
# 数据集卡片:[数据集名称]
复现[此处](https://mobiusml.github.io/1bit_blog/#datasets)提及的HQQ++混合数据集,并适配Llama 3 Instruct对话模板。
## 应用场景
量化感知训练(Quantization Aware Training)。
### 源数据集
- [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco)
- [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k)
- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
- [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
#### 数据采集与处理流程
python
train_ds1 = load_dataset("timdettmers/openassistant-guanaco")['train']
train_ds2 = load_dataset("microsoft/orca-math-word-problems-200k")
train_ds2 = train_ds2['train'].shuffle(42).select(range(10000))
train_ds3 = load_dataset("meta-math/MetaMathQA")
train_ds3 = train_ds3['train'].shuffle(42).select(range(10000))
train_ds4 = load_dataset('HuggingFaceH4/ultrafeedback_binarized')
train_ds4 = train_ds4['train_sft'].shuffle(42).select(range(10000))
## 数据集卡片联系方式
凯雷姆·图尔古特卢(Kerem Turgutlu):k@answer.ai
提供机构:
maas
创建时间:
2024-12-20



