
radna0/harmony-nemotron-cpu-artifacts

Source: Hugging Face. Updated 2026-01-09. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/radna0/harmony-nemotron-cpu-artifacts
Resource description:
---
language: [en]
tags: [harmony, nemotron, parquet, cpu-normalized, candidate-pools]
---

# Harmony CPU artifacts: Nemotron datasets (normalized + candidate pools)

This dataset repo is an **artifact store** produced on an EPYC CPU box. It contains:

- `normalized/`: CPU-normalized Parquet shards with a text-first Harmony format (`text`) plus `meta_*` and `quality_*` fields.
- `pools/`: candidate pool Parquet shards (subsets) for later GPU scoring (Modal NLL/PPL). **No GPU scoring has been run yet.**
- `reports/`: summary tables of counts per dataset/split/pool.

## Directory layout

- `normalized/<dataset_tag>/data/<split>/part-*.parquet`
- `normalized/<dataset_tag>/*__manifest.json`
- `normalized/<dataset_tag>/*__tools_catalog.json` (when present)
- `pools/<dataset_tag>/<pool_name>/<pool_name>__*.parquet`
- `pools/<dataset_tag>/pools_manifest.json`

Where `<dataset_tag>` is the HF dataset name with `/` replaced by `__`.

## Loading examples

```python
from datasets import load_dataset

# Load normalized shards for a split
paths = ["normalized/nvidia__Nemotron-Math-v2/data/high_part00/*.parquet"]
ds = load_dataset("parquet", data_files=paths, split="train")

# Load a candidate pool
paths = ["pools/nvidia__Nemotron-Math-v2/high_correctness/*.parquet"]
pool = load_dataset("parquet", data_files=paths, split="train")
```

See `reports/cpu_full_run_summary.md` for totals.
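
The path patterns in the directory layout can be built programmatically from an HF dataset name. The following is a minimal sketch, not part of the repo: the helper names `dataset_tag`, `normalized_glob`, and `pool_glob` are hypothetical, and the split and pool names are the ones used in the loading examples.

```python
def dataset_tag(hf_name: str) -> str:
    """Turn an HF dataset name into the repo's <dataset_tag> ('/' becomes '__')."""
    return hf_name.replace("/", "__")


def normalized_glob(hf_name: str, split: str) -> str:
    """Glob for the normalized Parquet shards of one split (hypothetical helper)."""
    return f"normalized/{dataset_tag(hf_name)}/data/{split}/part-*.parquet"


def pool_glob(hf_name: str, pool_name: str) -> str:
    """Glob for the Parquet shards of one candidate pool (hypothetical helper)."""
    return f"pools/{dataset_tag(hf_name)}/{pool_name}/{pool_name}__*.parquet"


# Example with the dataset, split, and pool names from the loading examples.
print(normalized_glob("nvidia/Nemotron-Math-v2", "high_part00"))
print(pool_glob("nvidia/Nemotron-Math-v2", "high_correctness"))
```

The resulting strings are relative to the repo root, so they can be passed as `data_files` to `load_dataset("parquet", ...)` when working from a local copy of the repo.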
Provider:
radna0