Fredithefish/Nemotron-CC-HQ-20B
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Fredithefish/Nemotron-CC-HQ-20B
Resource description:
---
task_categories:
- text-generation
pretty_name: Coo
---
# Nemotron-CC-HQ-20B
This dataset contains approximately 20B tokens from Nemotron-CC-HQ, made up of randomly sampled slices from crawls in the range `CC-MAIN-2013-20-part-00012` to `CC-MAIN-2019-04-part-00007`.
For more information about Nemotron-CC, see the [paper by NVIDIA](https://arxiv.org/abs/2412.02595).
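
The slice names above follow the Common Crawl pattern `CC-MAIN-<year>-<week>-part-<number>`. As a rough sketch, the snippet below parses such a name and checks whether it falls inside the stated range. It assumes the range is ordered by year, then week, then part number, which this card does not state explicitly; `Slice`, `parse_slice`, and `in_sampled_range` are hypothetical helper names, not part of the dataset or of any library.

```python
import re
from typing import NamedTuple


class Slice(NamedTuple):
    """A parsed slice name (hypothetical helper type)."""
    year: int
    week: int
    part: int


# Pattern for names such as CC-MAIN-2013-20-part-00012.
_SLICE_RE = re.compile(r"^CC-MAIN-(\d{4})-(\d{2})-part-(\d{5})$")


def parse_slice(name: str) -> Slice:
    """Split a slice name into (year, week, part) integers."""
    match = _SLICE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized slice name: {name!r}")
    return Slice(*(int(group) for group in match.groups()))


# Endpoints quoted in the dataset description.
FIRST = parse_slice("CC-MAIN-2013-20-part-00012")
LAST = parse_slice("CC-MAIN-2019-04-part-00007")


def in_sampled_range(name: str) -> bool:
    """Return True if the slice lies between the two endpoints, inclusive.

    Assumption: slices are ordered by (year, week, part).
    """
    return FIRST <= parse_slice(name) <= LAST
```

Because the dataset is a random sample, a slice inside this range is not guaranteed to be present; the check only says whether a name is consistent with the stated bounds.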
<footer style="margin-top: 40px; padding-top: 10px; border-top: 1px solid #ccc; font-size: 0.9em; color: #666;">
<p>
<strong>Disclaimer:</strong>
Derived from Nemotron-CC (Common Crawl). No ownership of underlying content is claimed.
Data may be subject to third-party rights. Use at your own risk and in compliance with applicable laws and Common Crawl Terms of Use.
</p>
</footer>
Provider:
Fredithefish



