
joshuasundance/codex-7m-qaqc-raw

Source: Hugging Face. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/joshuasundance/codex-7m-qaqc-raw
Overview:

This dataset repo is the consolidated shard-level QA/QC artifact for CodeX-7M, generated by `myponline`. It includes:

- root-level `annotated/shard_XXX.parquet` files with full row-level QA/QC annotations
- `filtered_basic/`, with rows that passed the structural QA/QC filter
- `filtered_strict/`, with rows that also passed standalone `ruff` and `mypy --strict`
- `stage0_python_fence/`, with the pre-analysis Python-fence survivors, when present
- `summary.json`, with aggregate counts and provenance
- `consolidation-manifest.json`, mapping each published shard to its source dispatch record and output directory

The input dataset is `Modotte/CodeX-7M-Non-Thinking`, with input revision `7ebf047f899ff75f8a28fff77ca7f65938bb043a`, publish run id `codex-7m-qaqc-raw-publish-2026-04-29T162303Z`, and launcher `dispatched-hf-job`.
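The layout described above suggests fetching one shard at a time. The following is a minimal sketch only, not the publisher's tooling. It assumes that `XXX` in `shard_XXX.parquet` is a zero-padded three-digit index and that the filtered directories use the same shard file names as `annotated/`; neither is stated in the description. The helper names `shard_path` and `load_shard` are hypothetical.

```python
from __future__ import annotations

REPO_ID = "joshuasundance/codex-7m-qaqc-raw"

# Directory names taken from the dataset description.
SUBSETS = ("annotated", "filtered_basic", "filtered_strict", "stage0_python_fence")


def shard_path(subset: str, index: int) -> str:
    """Build the in-repo path of one shard.

    ASSUMPTION: shards are named shard_000.parquet, shard_001.parquet, ...
    in every subset directory. Check the repo file listing before relying on it.
    """
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    if index < 0:
        raise ValueError("shard index must be non-negative")
    return f"{subset}/shard_{index:03d}.parquet"


def load_shard(subset: str, index: int, revision: str | None = None):
    """Download one shard from the Hub and return it as a pandas DataFrame.

    Requires `huggingface_hub`, `pandas`, and a parquet engine such as
    `pyarrow`. The imports are local so that `shard_path` works without them.
    """
    import pandas as pd
    from huggingface_hub import hf_hub_download

    local_file = hf_hub_download(
        repo_id=REPO_ID,
        filename=shard_path(subset, index),
        repo_type="dataset",
        revision=revision,  # None resolves to the repo's default branch
    )
    return pd.read_parquet(local_file)
```

For example, `load_shard("filtered_strict", 0)` would request `filtered_strict/shard_000.parquet`, provided the naming assumption holds. Because `stage0_python_fence/` is only present for some runs, a request for a shard there may fail.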
Provider:
joshuasundance