codelion/qwen-fineweb-distilled
Source: Hugging Face. Updated: 2025-04-03. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/codelion/qwen-fineweb-distilled
Description:
The dataset exposes seven feature fields: the teacher embedding dimension (teacher_embedding_dim), a metadata flag (is_metadata), input ID sequences (input_ids), sequence lengths (sequence_lengths), exact-value sequences (exact_values), coefficient sequences (coeffs), and packed indices (packed_indices). It has a single split, train, which contains 55 examples with a total size of 8391068355 bytes (about 8.39 GB). The download size is 1971033491 bytes (about 1.97 GB). A default configuration is provided, which specifies the file path of the training data.
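To put the figures in the description into perspective, here is a minimal Python sketch that converts the quoted byte counts into GiB, derives the ratio between dataset size and download size, and checks a record against the seven listed field names. The helper names (`size_summary`, `missing_fields`, `stream_train_split`) are hypothetical and introduced only for illustration. `stream_train_split` assumes the Hugging Face `datasets` package is installed and that the repository is reachable under the ID shown in the title; it is defined here but not called.

```python
# Figures quoted in the dataset description.
TOTAL_BYTES = 8_391_068_355     # total size of the train split
DOWNLOAD_BYTES = 1_971_033_491  # size of the files as downloaded
NUM_EXAMPLES = 55

# Field names listed in the dataset description.
EXPECTED_FIELDS = [
    "teacher_embedding_dim",
    "is_metadata",
    "input_ids",
    "sequence_lengths",
    "exact_values",
    "coeffs",
    "packed_indices",
]

GIB = 2 ** 30


def size_summary():
    """Derive a few convenience numbers from the quoted byte counts."""
    return {
        "total_gib": TOTAL_BYTES / GIB,
        "download_gib": DOWNLOAD_BYTES / GIB,
        # How much larger the dataset is than its download.
        "size_to_download_ratio": TOTAL_BYTES / DOWNLOAD_BYTES,
        "avg_bytes_per_example": TOTAL_BYTES / NUM_EXAMPLES,
    }


def missing_fields(record):
    """Return the expected field names that are absent from a record (a dict)."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def stream_train_split():
    """Open the train split lazily instead of downloading it in full.

    Assumption: the `datasets` package is installed and the repository
    is available under this ID.
    """
    from datasets import load_dataset

    return load_dataset(
        "codelion/qwen-fineweb-distilled", split="train", streaming=True
    )


if __name__ == "__main__":
    for key, value in size_summary().items():
        print(f"{key}: {value:,.2f}")
```

With only 55 examples spread over roughly 8.39 GB, each record averages well over 100 MB, so streaming the split and checking the first record with `missing_fields` is likely a gentler first step than a full download.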
Provider:
codelion



