snork-maiden/gemma-2b-it-lmsys-subset-tokenized-v2

Name: snork-maiden/gemma-2b-it-lmsys-subset-tokenized-v2
Creator: snork-maiden
Published: 2025-10-29 07:32:11
License: No description available

Source: Hugging Face. Updated 2025-10-29. Indexed 2025-11-15.

Download link:

https://hf-mirror.com/datasets/snork-maiden/gemma-2b-it-lmsys-subset-tokenized-v2

Description:

This is an NLP dataset with a single feature named tokens, a sequence of int64 values. It contains one split, a training set of 497,549 samples, with a total size of approximately 4 GB. A default configuration is provided that specifies the path to the training set data files.
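
Since the description only names the tokens feature and the training split, a short sketch of how one might inspect the data may help. It assumes the Hugging Face datasets library is installed, that the repository ID matches the path in the download link above, and that the split is exposed under the name "train"; none of this is confirmed by the listing. The helper names stream_train_split and token_length_summary are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import Iterable, Mapping, Sequence

# Assumption: repository ID taken from the download link in this listing.
REPO_ID = "snork-maiden/gemma-2b-it-lmsys-subset-tokenized-v2"


def stream_train_split(repo_id: str = REPO_ID):
    """Open the training split in streaming mode to avoid downloading ~4 GB up front.

    Assumes the split is named "train". Requires network access and
    `pip install datasets`.
    """
    # Imported lazily so the summary helper below works without the library.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)


def token_length_summary(
    records: Iterable[Mapping[str, Sequence[int]]], limit: int = 1000
) -> dict:
    """Summarize the length of the `tokens` sequence over the first `limit` records."""
    lengths = [len(record["tokens"]) for record in islice(records, limit)]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Example (requires network access):
#   summary = token_length_summary(stream_train_split(), limit=100)
#   print(summary)
```

Streaming is used here only because the listing reports a total size of roughly 4 GB; loading the full split into memory or onto disk is equally valid if storage allows.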

Provided by:

snork-maiden
