davanstrien/hub-card-prompts

Name: davanstrien/hub-card-prompts
Creator: davanstrien
Published: 2026-04-15 16:45:29
License: 暂无描述

Hugging Face2026-04-15 更新2026-04-26 收录

下载链接：

https://hf-mirror.com/datasets/davanstrien/hub-card-prompts

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: messages list: - name: content dtype: string - name: role dtype: string - name: prompt dtype: string - name: id dtype: string - name: kind dtype: string splits: - name: train num_examples: 5000 --- # Hub Card Prompts Training prompts for distilling a Hugging Face card summarisation model (gemma-4-E2B-it student ← gemma-4-31B-it teacher). ## Source Filtered subset of [librarian-bots/model_cards_with_metadata](https://huggingface.co/datasets/librarian-bots/model_cards_with_metadata) and [librarian-bots/dataset_cards_with_metadata](https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata). ## Filter rules 1. **Minimum card length**: ≥ 300 characters (drops empty cards, template stubs, and near-empty entries) 2. **Maximum card length**: ≤ 15,000 characters (drops extremely long cards that would dominate token budget) 3. **Deduplication**: unique by (author, first 200 chars of card) — keeps the most-downloaded version when near-duplicates exist 4. **Auto-generated stub removal**: cards starting with `# Model Card for` are dropped 5. **Sampling**: 2,500 model cards + 2,500 dataset cards, randomly shuffled after sorting by downloads descending ## Prompt template ``` You are generating a TL;DR for a Hugging Face {kind} card. Write a single paragraph that covers: - What the {kind} is and what it does - Key technical details (architecture, size, training data if mentioned) - How to use it (load code snippet if available) {kind}: {id} {card} ``` ## Resulting dataset - 5,000 rows total (2,500 model + 2,500 dataset cards) - Each row has a `messages` column (single user-turn chat format) and a `prompt` column (plain text) - Designed for pure on-policy distillation (`lmbda=1.0`) — no assistant completions included

提供机构：

davanstrien

5,000+

优质数据集

54 个

任务类型

进入经典数据集