Harley-ml/HFMC
Source: Hugging Face. Updated: 2026-04-23. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Harley-ml/HFMC
Description:
HFMC stands for Hugging Face Model Configs. The dataset contains over 7,000 JSON model configs from Hugging Face, each scraped through the Hugging Face API. It is intended primarily for text-generation tasks, and its language is English. The configs have been deduplicated, filtered by language (English only), and filtered by length (limited to 1024 tokens, measured with an in-domain tokenizer). Use cases include pretraining or fine-tuning small models, and adding HFMC to a much larger dataset to broaden coverage.
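The deduplication and length-filtering steps mentioned in the description could look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' pipeline: the in-domain tokenizer is not named in the description, so `count_tokens` is a hypothetical regex-based stand-in, and `canonical` and `filter_configs` are names invented for this example. The language filter is omitted because the description does not say how it was applied.

```python
import hashlib
import json
import re

MAX_TOKENS = 1024  # length limit stated in the dataset description


def count_tokens(text):
    # Hypothetical stand-in for the unnamed in-domain tokenizer:
    # counts runs of word characters and single punctuation marks.
    return len(re.findall(r"\w+|[^\w\s]", text))


def canonical(config):
    # Serialize with sorted keys so that key order does not affect deduplication.
    return json.dumps(config, sort_keys=True, separators=(",", ":"))


def filter_configs(configs, max_tokens=MAX_TOKENS, token_counter=count_tokens):
    """Drop exact duplicates, then drop configs longer than max_tokens."""
    seen = set()
    kept = []
    for config in configs:
        text = canonical(config)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier config
        seen.add(digest)
        if token_counter(text) > max_tokens:
            continue  # exceeds the length limit
        kept.append(config)
    return kept


# Toy configs for illustration only; these are not taken from HFMC.
sample = [
    {"model_type": "gpt2", "n_layer": 12},
    {"n_layer": 12, "model_type": "gpt2"},  # same content, different key order
    {"model_type": "llama", "vocab": list(range(5000))},  # far too long
]
result = filter_configs(sample)
print(len(result))
```

Hashing a canonical serialization catches only exact duplicates. Whether HFMC also removed near-duplicates is not stated in the description.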
Provider:
Harley-ml



