five

hqq_plus_plus_mix_40k

收藏
魔搭社区2025-12-03 更新2024-12-21 收录
下载链接:
https://modelscope.cn/datasets/answerdotai/hqq_plus_plus_mix_40k
下载链接
链接失效反馈
官方服务:
资源简介:
# Dataset Card for Dataset Name Replication of HQQ++ dataset mixture mentioned in [here](https://mobiusml.github.io/1bit_blog/#datasets) with Llama 3 Instruct chat template. ## Uses Quantization Aware Training. ### Source Data - [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) - [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) - [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) - [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) #### Data Collection and Processing ```python train_ds1 = load_dataset("timdettmers/openassistant-guanaco")['train'] train_ds2 = load_dataset("microsoft/orca-math-word-problems-200k") train_ds2 = train_ds2['train'].shuffle(42).select(range(10000)) train_ds3 = load_dataset("meta-math/MetaMathQA") train_ds3 = train_ds3['train'].shuffle(42).select(range(10000)) train_ds4 = load_dataset('HuggingFaceH4/ultrafeedback_binarized') train_ds4 = train_ds4['train_sft'].shuffle(42).select(range(10000)) ``` ## Dataset Card Contact Kerem Turgutlu: k@answer.ai

# 数据集卡片:[数据集名称] 复现[此处](https://mobiusml.github.io/1bit_blog/#datasets)提及的HQQ++混合数据集,并适配Llama 3 Instruct对话模板。 ## 应用场景 量化感知训练(Quantization Aware Training)。 ### 源数据集 - [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) - [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) - [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) - [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) #### 数据采集与处理流程 python train_ds1 = load_dataset("timdettmers/openassistant-guanaco")['train'] train_ds2 = load_dataset("microsoft/orca-math-word-problems-200k") train_ds2 = train_ds2['train'].shuffle(42).select(range(10000)) train_ds3 = load_dataset("meta-math/MetaMathQA") train_ds3 = train_ds3['train'].shuffle(42).select(range(10000)) train_ds4 = load_dataset('HuggingFaceH4/ultrafeedback_binarized') train_ds4 = train_ds4['train_sft'].shuffle(42).select(range(10000)) ## 数据集卡片联系方式 凯雷姆·图尔古特卢(Kerem Turgutlu):k@answer.ai
提供机构:
maas
创建时间:
2024-12-20
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作