avidanborisov/leetcode2000-rl

Name: avidanborisov/leetcode2000-rl
Creator: avidanborisov
Published: 2026-04-17 11:13:57
License: No description available

Source: Hugging Face (updated 2026-04-17, indexed 2026-04-26)

Download link:

https://hf-mirror.com/datasets/avidanborisov/leetcode2000-rl


Resource description:

---
pretty_name: LeetCode2000-RL
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- text-generation
language:
- en
tags:
- code
- python
- reinforcement-learning
- algorithms
source_datasets:
- newfacade/LeetCodeDataset
configs:
- config_name: default
  data_files:
  - split: train
    path: train.parquet
  - split: validation
    path: validation.parquet
  - split: test
    path: test.parquet
---

# LeetCode2000-RL

LeetCode2000-RL contains 2000 LeetCode programming tasks, each with one Python solution, normalized correctness test cases, and speed-test inputs with measured runtimes. It is a curated derivative of [`newfacade/LeetCodeDataset`](https://huggingface.co/datasets/newfacade/LeetCodeDataset).

New LLM-generated solutions and runtime-focused test cases were used to benchmark candidate solutions, replace slower solutions whenever a faster correct variant was found, and select the final solution for each task. The final dataset was then re-validated end-to-end.

The dataset is intended for RL training with a Python execution environment, as well as for evaluation and code-optimization experiments.

## Splits

| Split | Rows |
|---|---:|
| train | 1600 |
| validation | 200 |
| test | 200 |

Splits are stratified by difficulty and runtime distribution.

## Fields

- `task_id`: stable task slug
- `problem`: cleaned problem statement
- `constraints`: extracted constraints block
- `difficulty`: Easy / Medium / Hard
- `tags`: topic tags
- `entrypoint`: name of the method to implement
- `starter_code`: Python starter scaffold
- `test_cases`: correctness cases as `{"kwargs": ..., "expected": ...}`
- `speed_tests`: runtime-focused inputs as `{"input_expr": "{'kwargs': ...}", "runtime_ms": ...}`
- `solution`: Python solution
- `solution_runtime_ms`: total execution time of the solution across the correctness and speed tests

Runtimes were measured on one specific machine and reflect only the execution time inside the solution itself, so absolute values will differ on other hardware.
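
Because the card targets RL with a Python execution environment, the fields above map naturally onto an execution-based reward. The block below is a minimal sketch, not the author's harness: the helper names (`load_method`, `normalize_case`, `pass_rate`, `time_speed_test`) are hypothetical, and it assumes that solutions are LeetCode-style code defining `class Solution`, that outputs are compared by exact equality, that a Parquet-loaded case is a dict holding `kwargs_json` / `expected_json` strings, and that `input_expr` is a trusted Python expression. It does not convert inputs into custom types such as linked lists or trees, should any tasks need that, and it uses `exec`/`eval`, which are not a sandbox.

```python
import copy
import json
import time
from typing import Any, Callable, Optional


def load_method(solution_src: str, entrypoint: str) -> Optional[Callable[..., Any]]:
    """Exec a solution and return the bound method Solution().<entrypoint>.

    Assumption: solutions are LeetCode-style and define `class Solution`.
    Warning: exec() runs arbitrary code; use a real sandbox for RL rollouts.
    """
    namespace: dict = {}
    try:
        # Pre-load typing names (List, Optional, ...) that LeetCode code
        # often uses in annotations without importing them.
        exec("from typing import *", namespace)
        exec(solution_src, namespace)
        return getattr(namespace["Solution"](), entrypoint)
    except Exception:
        return None  # the code failed to run or to define the entrypoint


def normalize_case(case: dict) -> tuple:
    """Return (kwargs, expected) for one correctness case.

    Accepts the {"kwargs": ..., "expected": ...} layout and the assumed
    Parquet layout with JSON strings under "kwargs_json" / "expected_json".
    """
    if "kwargs_json" in case:
        return json.loads(case["kwargs_json"]), json.loads(case["expected_json"])
    # Deep-copy so a solution that mutates its inputs cannot alter the row.
    return copy.deepcopy(case["kwargs"]), case["expected"]


def pass_rate(method: Optional[Callable[..., Any]], test_cases: list) -> float:
    """Fraction of correctness cases passed, usable as a simple RL reward."""
    if method is None or not test_cases:
        return 0.0
    passed = 0
    for case in test_cases:
        kwargs, expected = normalize_case(case)
        try:
            # Assumption: exact equality; some tasks may accept other outputs.
            if method(**kwargs) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed case
    return passed / len(test_cases)


def time_speed_test(method: Callable[..., Any], speed_test: dict) -> float:
    """Milliseconds spent inside one call on a speed-test input.

    Assumption: `input_expr` is a trusted Python expression that evaluates to
    a dict with a "kwargs" key. Building the input is excluded from the timing.
    """
    kwargs = eval(speed_test["input_expr"], {})["kwargs"]
    start = time.perf_counter()
    method(**kwargs)
    return (time.perf_counter() - start) * 1000.0
```

For a speed-aware reward, one option is to time the reference `solution` with the same helper on the same machine and compare the two measurements, since the stored `runtime_ms` values were measured on different hardware.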

In the Parquet files used by the dataset viewer and for loading, `test_cases` are stored as `kwargs_json` / `expected_json` strings to keep the schema stable.

## Attribution

Curated derivative of [`newfacade/LeetCodeDataset`](https://huggingface.co/datasets/newfacade/LeetCodeDataset).

## Citation

If you use this dataset, please cite both the upstream source and this derivative.

**Upstream**

```bibtex
@article{xia2025leetcodedataset,
  title={LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs},
  author={Yunhui Xia and Wei Shen and Yan Wang and Jason Klein Liu and Huifeng Sun and Siyue Wu and Jian Hu and Xiaolong Xu},
  journal={arXiv preprint arXiv:2504.14655},
  year={2025},
  url={https://arxiv.org/abs/2504.14655}
}
```

**This dataset**

```bibtex
@misc{borisov2026leetcode2000rl,
  title={LeetCode2000-RL},
  author={Avidan Borisov},
  year={2026},
  howpublished={Hugging Face dataset},
  url={https://huggingface.co/datasets/avidanborisov/leetcode2000-rl},
  note={Curated set of 2000 LeetCode problems with high-quality Python solutions, correctness tests, and speed tests}
}
```

Provided by:

avidanborisov
