joshuasundance/codex-7m-qaqc-raw
Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/joshuasundance/codex-7m-qaqc-raw
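Summary:
This dataset contains the raw CodeX-7M QA/QC artifacts generated by myponline. It includes parquet files with full row-level QA/QC annotations, rows that passed the structural QA/QC filter, rows that passed ruff and mypy --strict, and pre-analysis Python-fence survivors, among others. It also provides a summary.json file with aggregate counts and provenance, and a consolidation-manifest.json file that maps each published shard to its source dispatch record and output directory. The source dataset is Modotte/CodeX-7M-Non-Thinking; the input revision, publish run ID, and launcher are also recorded.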
This dataset repo is the consolidated shard-level QA/QC artifact generated by `myponline`. It includes:

- root-level `annotated/shard_XXX.parquet` files with full row-level QA/QC annotations
- `filtered_basic/`, with rows that passed the structural QA/QC filter
- `filtered_strict/`, with rows that also passed standalone `ruff` and `mypy --strict`
- `stage0_python_fence/`, with the pre-analysis Python-fence survivors when present
- `summary.json`, with aggregate counts and provenance
- `consolidation-manifest.json`, mapping each published shard to the source dispatch record and output directory

The input dataset is `Modotte/CodeX-7M-Non-Thinking`, with input revision `7ebf047f899ff75f8a28fff77ca7f65938bb043a`, publish run id `codex-7m-qaqc-raw-publish-2026-04-29T162303Z`, and launcher `dispatched-hf-job`.
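
To see which of these artifacts are actually present in the repo, one option is to fetch the file listing and group it by top-level directory. The following is a minimal sketch, not an official loader: the helper names `group_by_artifact` and `list_repo_paths` are hypothetical, and the file names inside `filtered_basic/`, `filtered_strict/`, and `stage0_python_fence/` are not specified by the description, so the code only inspects the directory prefix. It assumes the `huggingface_hub` package is installed for the network call.

```python
from collections import defaultdict

REPO_ID = "joshuasundance/codex-7m-qaqc-raw"

# Top-level directories named in the dataset description.
ARTIFACT_DIRS = (
    "annotated",
    "filtered_basic",
    "filtered_strict",
    "stage0_python_fence",
)


def group_by_artifact(paths):
    """Group repo-relative file paths by the artifact directory they live in.

    Paths outside the known directories (for example summary.json and
    consolidation-manifest.json) are collected under the key "root".
    """
    groups = defaultdict(list)
    for path in paths:
        top, sep, _rest = path.partition("/")
        key = top if sep and top in ARTIFACT_DIRS else "root"
        groups[key].append(path)
    return {key: sorted(values) for key, values in groups.items()}


def list_repo_paths(repo_id=REPO_ID):
    """Fetch the repo's file listing from the Hub (requires network access)."""
    # Imported lazily so group_by_artifact stays usable without the package.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")
```

Calling `group_by_artifact(list_repo_paths())` returns a dictionary keyed by artifact directory. Since `stage0_python_fence/` is only published "when present", a missing key for it would be expected rather than an error.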
Provider:
joshuasundance



