Evol-Instruct-Python-26k
Source: ModelScope community. Updated: 2025-10-09. Indexed: 2025-03-22.
Download link:
https://modelscope.cn/datasets/mlabonne/Evol-Instruct-Python-26k
Resource description:
# Evol-Instruct-Python-26k
A filtered version of the [`nickrosh/Evol-Instruct-Code-80k-v1`](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) dataset that keeps only Python code (26,588 samples). A smaller version is available at [`mlabonne/Evol-Instruct-Python-1k`](https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-1k).
The original dataset card shows the distribution of the number of tokens in each row (instruction + output), measured with Llama's tokenizer.
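A per-row token distribution of this kind could be recomputed along the following lines. This is a minimal sketch, not the author's script: the helper names `token_counts` and `histogram`, the bin width, and the sample rows are all hypothetical, and the column names `instruction` and `output` are taken from the description above. It tokenizes the two fields separately and sums the lengths, which may differ slightly from tokenizing the concatenated text. A whitespace split stands in for Llama's tokenizer so the example runs without downloads.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Mapping, Sequence


def token_counts(
    rows: Iterable[Mapping[str, str]],
    tokenize: Callable[[str], Sequence],
) -> List[int]:
    """Return the token count of each row (instruction tokens + output tokens)."""
    return [
        len(tokenize(row["instruction"])) + len(tokenize(row["output"]))
        for row in rows
    ]


def histogram(counts: Iterable[int], bin_width: int = 256) -> Dict[int, int]:
    """Group counts into fixed-width bins, keyed by each bin's lower edge."""
    bins = Counter((count // bin_width) * bin_width for count in counts)
    return dict(sorted(bins.items()))


# Hypothetical sample rows, not taken from the dataset.
rows = [
    {
        "instruction": "Write a function that adds two numbers.",
        "output": "def add(a, b):\n    return a + b",
    },
    {"instruction": "Reverse a string.", "output": "s[::-1]"},
]

# Assumption: str.split is only a stand-in tokenizer for illustration.
counts = token_counts(rows, str.split)
print(counts)
print(histogram(counts, bin_width=10))
```

To approximate the original statistic, `tokenize` could be replaced with the `encode` method of a Llama tokenizer loaded through Hugging Face `transformers` (`AutoTokenizer.from_pretrained`), and `rows` with the dataset's training split. Which checkpoint and which special-token settings the author used is not stated in the source.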


Provider:
maas
Created:
2025-03-18



