answerdotai/hqq_plus_plus_mix_40k

Name: answerdotai/hqq_plus_plus_mix_40k
Creator: answerdotai
Published: 2024-07-09 14:09:52
License: 暂无描述

Hugging Face2024-07-09 更新2024-07-22 收录

下载链接：

https://hf-mirror.com/datasets/answerdotai/hqq_plus_plus_mix_40k

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是HQQ++数据集的复制，使用了Llama 3 Instruct聊天模板，并用于量化感知训练。数据集来源于多个公开数据集，包括openassistant-guanaco、orca-math-word-problems-200k、MetaMathQA和ultrafeedback_binarized。数据集包含输入文本和输出文本两个特征，主要用于训练模型。

This dataset is a replication of the HQQ++ dataset mixture using the Llama 3 Instruct chat template. It features input and output text and is primarily used for Quantization Aware Training. The dataset includes data from multiple source datasets such as timdettmers/openassistant-guanaco, microsoft/orca-math-word-problems-200k, meta-math/MetaMathQA, and HuggingFaceH4/ultrafeedback_binarized. The training split contains 39846 examples with a total size of 57755012 bytes.

提供机构：

answerdotai

原始信息汇总

数据集概述

特征

input_text: 数据类型为字符串。
output_text: 数据类型为字符串。

数据分割

train: 包含39846个样本，总字节数为57755012。

下载与数据大小

下载大小: 28463096字节。
数据集大小: 57755012字节。

配置

default: 数据文件路径为data/train-*。

许可

MIT许可。

数据集规模

样本数量在10,000到100,000之间。

5,000+

优质数据集

54 个

任务类型

进入经典数据集