five

davanstrien/hub-card-prompts-v2

收藏
Hugging Face2026-04-17 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/davanstrien/hub-card-prompts-v2
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: mit task_categories: - summarization - text-generation tags: - distillation - on-policy - hub-cards size_categories: - 1K<n<10K --- # hub-card-prompts-v2 Prompts-only dataset for on-policy distillation of Hugging Face model/dataset card summarizers. Used to train v2 of [davanstrien/Smol-Hub-tldr](https://huggingface.co/davanstrien/Smol-Hub-tldr). ## Format Each row is a single chat `messages` list with one `user` turn (no assistant target — pure on-policy). ``` <MODEL_CARD>{body_without_yaml} Summarize this Hugging Face model in one sentence. Wrap your summary in <CARD_SUMMARY>...</CARD_SUMMARY> tags. ``` or `<DATASET_CARD>…` for dataset cards. ## Provenance Source: daily snapshots from - `librarian-bots/model_cards_with_metadata` - `librarian-bots/dataset_cards_with_metadata` Snapshot date: 2026-04-17 ## Filters applied - `last_modified` within last 90 days (fresh repos only) - Body length (post-YAML-strip): 300–15000 chars for real cards, 100–15000 for stubs - Dedup by `(author, body[:200])` — drops auto-generated near-duplicates - **Mix**: 4500 real cards (prompts-only, on-policy) + 500 template-stub cards (**with assistant refusal target** for off-policy teaching — use with `--lmbda 0.8`) - Balanced: 2500 model + 2500 dataset ## Columns - `messages`: list of chat-format dicts (one user turn, no assistant) - `id`: repo id (modelId or datasetId) - `kind`: "model" or "dataset" - `last_modified`: ISO timestamp - `downloads`, `likes`: Hub metadata at snapshot time ## Citation Built 2026-04 as part of the smol-hub-tldr v2 training run. See [davanstrien/Smol-Hub-tldr v2 report](https://huggingface.co/davanstrien/Smol-Hub-tldr).
提供机构:
davanstrien
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作