davanstrien/hub-card-prompts-v2
收藏Hugging Face2026-04-17 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/davanstrien/hub-card-prompts-v2
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
task_categories:
- summarization
- text-generation
tags:
- distillation
- on-policy
- hub-cards
size_categories:
- 1K<n<10K
---
# hub-card-prompts-v2
Prompts-only dataset for on-policy distillation of Hugging Face model/dataset card summarizers.
Used to train v2 of [davanstrien/Smol-Hub-tldr](https://huggingface.co/davanstrien/Smol-Hub-tldr).
## Format
Each row is a single chat `messages` list with one `user` turn (no assistant target — pure on-policy).
```
<MODEL_CARD>{body_without_yaml}
Summarize this Hugging Face model in one sentence. Wrap your summary in <CARD_SUMMARY>...</CARD_SUMMARY> tags.
```
or `<DATASET_CARD>…` for dataset cards.
## Provenance
Source: daily snapshots from
- `librarian-bots/model_cards_with_metadata`
- `librarian-bots/dataset_cards_with_metadata`
Snapshot date: 2026-04-17
## Filters applied
- `last_modified` within last 90 days (fresh repos only)
- Body length (post-YAML-strip): 300–15000 chars for real cards, 100–15000 for stubs
- Dedup by `(author, body[:200])` — drops auto-generated near-duplicates
- **Mix**: 4500 real cards (prompts-only, on-policy) + 500 template-stub cards (**with assistant refusal target** for off-policy teaching — use with `--lmbda 0.8`)
- Balanced: 2500 model + 2500 dataset
## Columns
- `messages`: list of chat-format dicts (one user turn, no assistant)
- `id`: repo id (modelId or datasetId)
- `kind`: "model" or "dataset"
- `last_modified`: ISO timestamp
- `downloads`, `likes`: Hub metadata at snapshot time
## Citation
Built 2026-04 as part of the smol-hub-tldr v2 training run. See
[davanstrien/Smol-Hub-tldr v2 report](https://huggingface.co/davanstrien/Smol-Hub-tldr).
提供机构:
davanstrien



