distily/synth_tdecay_gpt2_seq_1K

Name: distily/synth_tdecay_gpt2_seq_1K
Creator: distily
Published: 2024-08-26 09:22:26
License: No description available

Source: Hugging Face
Updated: 2024-08-26
Indexed: 2025-04-26

Download link:

https://hf-mirror.com/datasets/distily/synth_tdecay_gpt2_seq_1K

Resource description:

---
source_datasets:
- Original
- Synthetic
library_name: Distily
tags:
- Distily
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1354975
    num_examples: 1000
  download_size: 379763
  dataset_size: 1354975
---

# Distillation dataset created with [Distily](https://github.com/lapp0/distily).

- **Method**: Generated sequences randomly with temperature config `ExponentialDecayArguments(start_t=100.0, end_t=0.5, N=1024, scale_factor=20)`
- **Model URI**: `gpt2`
- **Number of Samples**: 1000
- **Maximum Sequence Length**: 1024 tokens
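
The card names a temperature configuration but does not spell out the schedule it produces. The sketch below illustrates one plausible reading: the sampling temperature starts at `start_t` on the first token and decays exponentially toward `end_t` across `N` positions, with `scale_factor` controlling how fast it falls. The formula, the function names `exponential_decay_temperature` and `softmax_with_temperature`, and the interpretation of each parameter are assumptions made for illustration; they are not taken from the Distily source, which may compute the schedule differently.

```python
import math


def exponential_decay_temperature(step, start_t=100.0, end_t=0.5, n=1024, scale_factor=20.0):
    """Hypothetical schedule (an assumption, not Distily's actual code).

    Returns a temperature that equals start_t at step 0 and approaches
    end_t as step approaches n.
    """
    frac = min(max(step / n, 0.0), 1.0)  # position in the sequence, clamped to [0, 1]
    return end_t + (start_t - end_t) * math.exp(-scale_factor * frac)


def softmax_with_temperature(logits, temperature):
    """Standard temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# One temperature per token position, using the values from the card.
schedule = [exponential_decay_temperature(i) for i in range(1024)]

# Toy logits: a high temperature flattens the distribution,
# a low temperature concentrates it on the largest logit.
toy_logits = [2.0, 1.0, 0.0]
early_probs = softmax_with_temperature(toy_logits, schedule[0])
late_probs = softmax_with_temperature(toy_logits, schedule[-1])
```

Under this reading, early tokens are sampled almost uniformly at random and later tokens follow the model's own distribution more closely, which would be consistent with the card's description of sequences "generated randomly" from `gpt2`.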

Provider:

distily
