davanstrien/hub-card-prompts-v2

Name: davanstrien/hub-card-prompts-v2
Creator: davanstrien
Published: 2026-04-17 21:58:27
License: 暂无描述

Hugging Face2026-04-17 更新2026-04-26 收录

下载链接：

https://hf-mirror.com/datasets/davanstrien/hub-card-prompts-v2

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: mit task_categories: - summarization - text-generation tags: - distillation - on-policy - hub-cards size_categories: - 1K<n<10K --- # hub-card-prompts-v2 Prompts-only dataset for on-policy distillation of Hugging Face model/dataset card summarizers. Used to train v2 of [davanstrien/Smol-Hub-tldr](https://huggingface.co/davanstrien/Smol-Hub-tldr). ## Format Each row is a single chat `messages` list with one `user` turn (no assistant target — pure on-policy). ``` <MODEL_CARD>{body_without_yaml} Summarize this Hugging Face model in one sentence. Wrap your summary in <CARD_SUMMARY>...</CARD_SUMMARY> tags. ``` or `<DATASET_CARD>…` for dataset cards. ## Provenance Source: daily snapshots from - `librarian-bots/model_cards_with_metadata` - `librarian-bots/dataset_cards_with_metadata` Snapshot date: 2026-04-17 ## Filters applied - `last_modified` within last 90 days (fresh repos only) - Body length (post-YAML-strip): 300–15000 chars for real cards, 100–15000 for stubs - Dedup by `(author, body[:200])` — drops auto-generated near-duplicates - **Mix**: 4500 real cards (prompts-only, on-policy) + 500 template-stub cards (**with assistant refusal target** for off-policy teaching — use with `--lmbda 0.8`) - Balanced: 2500 model + 2500 dataset ## Columns - `messages`: list of chat-format dicts (one user turn, no assistant) - `id`: repo id (modelId or datasetId) - `kind`: "model" or "dataset" - `last_modified`: ISO timestamp - `downloads`, `likes`: Hub metadata at snapshot time ## Citation Built 2026-04 as part of the smol-hub-tldr v2 training run. See [davanstrien/Smol-Hub-tldr v2 report](https://huggingface.co/davanstrien/Smol-Hub-tldr).

提供机构：

davanstrien

5,000+

优质数据集

54 个

任务类型

进入经典数据集