Evol-Instruct-Python-1k
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-03-22
Download link:
https://modelscope.cn/datasets/mlabonne/Evol-Instruct-Python-1k
Description:
# Evol-Instruct-Python-1k
Subset of the [`mlabonne/Evol-Instruct-Python-26k`](https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-26k) dataset with only 1000 samples.
It was built by first removing the few rows whose instruction and output together exceed 2048 tokens, and then keeping the 1000 longest remaining samples.
[Figure: distribution of the number of tokens in each row, using Llama's tokenizer]
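
The selection procedure described above can be sketched roughly as follows. This is a minimal illustration and not the author's actual script: the function names `select_longest` and `count_tokens_whitespace` are hypothetical, the whitespace splitter is only a stand-in for Llama's tokenizer, and summing the instruction and output counts separately is an assumption about how row length was measured.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def count_tokens_whitespace(text: str) -> int:
    """Stand-in token counter (assumption): splits on whitespace.

    The dataset card says Llama's tokenizer was used; swap in a real
    tokenizer's length function to reproduce the actual counts.
    """
    return len(text.split())


def select_longest(
    rows: List[Row],
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
    max_tokens: int = 2048,
    k: int = 1000,
) -> List[Row]:
    """Drop rows longer than max_tokens, then keep the k longest survivors."""
    scored = []
    for row in rows:
        # Row length = instruction tokens + output tokens (assumed definition).
        n = count_tokens(row["instruction"]) + count_tokens(row["output"])
        if n <= max_tokens:
            scored.append((n, row))
    # Stable sort, longest first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:k]]


if __name__ == "__main__":
    demo = [
        {"instruction": "a b", "output": "c"},
        {"instruction": "a", "output": "b c d e"},
        {"instruction": "a", "output": "b"},
    ]
    # With a 4-token cap, the 5-token row is dropped before ranking.
    print(select_longest(demo, max_tokens=4, k=2))
```

Filtering before ranking matters here: taking the longest rows first and then applying the cap could leave fewer than `k` samples.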

Provider:
maas
Created:
2025-03-18



