EER6/nvidia-OpenCodeInstruct-broad

Name: EER6/nvidia-OpenCodeInstruct-broad
Creator: EER6
Published: 2026-04-01 20:45:31
License: apache-2.0

Source: Hugging Face. Updated 2026-04-01. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/EER6/nvidia-OpenCodeInstruct-broad

Description:

---
license: apache-2.0
source_datasets:
- nvidia/OpenCodeInstruct
task_categories:
- text-generation
language:
- en
tags:
- code
- sft
- instruction-tuning
- filtered
size_categories:
- 1M<n<10M
---

# nvidia-OpenCodeInstruct-broad

A quality-filtered subset of [nvidia/OpenCodeInstruct](https://huggingface.co/datasets/nvidia/OpenCodeInstruct) (5M examples).

## Filtering criteria

Both conditions must be satisfied:

| Criterion | Threshold |
|-----------|-----------|
| **LLM judge min score** | >= 4 (out of 5) |
| **Unit test pass rate** (`average_test_score`) | >= 0.8 |

**LLM judge min score** is the minimum across all three dimensions in the `llm_judgement` field:

- `requirement_conformance`: does the code do what the instruction asked?
- `logical_correctness`: is the algorithm/logic correct?
- `edge_case_consideration`: does it handle edge cases?

A min score >= 4 means *every* dimension scores at least 4/5.

**Unit test pass rate** (`average_test_score`) is the fraction of 10 LLM-generated unit tests that the solution passes. A value >= 0.8 means at least 8 out of 10 tests pass.

## Result

- **Source size:** 5,000,000
- **Filtered size:** 1,698,239
- **Retention rate:** 34.0%

[EER6/nvidia-OpenCodeInstruct-refined](https://huggingface.co/datasets/EER6/nvidia-OpenCodeInstruct-refined) is a strict subset of this dataset with tighter thresholds (llm_min = 5, test = 1.0, 444K examples).

## Schema

All original columns from `nvidia/OpenCodeInstruct` are preserved as-is, with no transforms or column additions. See the [original dataset card](https://huggingface.co/datasets/nvidia/OpenCodeInstruct) for column descriptions and citation.
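The filtering criteria described in the card can be sketched as a small row predicate. This is a minimal illustration, not the script actually used to build the dataset. It assumes `llm_judgement` is either a dict or a JSON string keyed by the three dimension names, where each value is a bare score or a dict with a `score` key, and that `average_test_score` may be stored as a string or a number. The helper names `llm_min_score` and `passes_broad_filter` are hypothetical.

```python
import json

# The three judge dimensions named in the dataset card.
DIMENSIONS = (
    "requirement_conformance",
    "logical_correctness",
    "edge_case_consideration",
)


def llm_min_score(llm_judgement):
    """Return the minimum judge score across the three dimensions.

    Assumption: the field is a dict (or a JSON-encoded dict) whose values
    are either plain numbers or dicts containing a "score" key.
    """
    if isinstance(llm_judgement, str):
        llm_judgement = json.loads(llm_judgement)
    scores = []
    for dim in DIMENSIONS:
        entry = llm_judgement[dim]
        scores.append(entry["score"] if isinstance(entry, dict) else entry)
    return min(scores)


def passes_broad_filter(row, min_judge=4, min_tests=0.8):
    """True when both conditions from the card hold for this row."""
    judge_ok = llm_min_score(row["llm_judgement"]) >= min_judge
    # Cast defensively: the pass rate may be stored as a string.
    tests_ok = float(row["average_test_score"]) >= min_tests
    return judge_ok and tests_ok


# Hypothetical example row, shaped according to the assumptions above.
example = {
    "llm_judgement": {
        "requirement_conformance": {"score": 5},
        "logical_correctness": {"score": 4},
        "edge_case_consideration": {"score": 4},
    },
    "average_test_score": "0.9",
}
print(passes_broad_filter(example))
```

Under the same assumptions, calling `passes_broad_filter(row, min_judge=5, min_tests=1.0)` would mirror the tighter thresholds quoted for the refined subset.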

Provider:

EER6
