fchaubard/funny_bench

Name: fchaubard/funny_bench
Creator: fchaubard
Published: 2026-04-18 00:37:41
License: No description available

Hugging Face · Updated 2026-04-18 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/fchaubard/funny_bench

Resource overview:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- comedy
- stand-up
- humor
- dpo
- sft
- funny
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 64671620
    num_examples: 369940
  - name: test
    num_bytes: 1321826
    num_examples: 7550
  download_size: 37816655
  dataset_size: 65993446
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# FunnyBench: Stand-Up Comedy Dataset for LLM Training

LLMs aren't funny. This initiative tries to solve that.

A curated dataset of **24,438 stand-up comedy transcripts** with engagement metrics, designed for teaching LLMs to generate funny content.

Shout out to Shofo (https://www.shofo.ai/) for providing the dataset for free for public use!

## Dataset Splits

### SFT (Supervised Fine-Tuning)

- **23,216 train** / **1,222 test** examples
- Chat format with quality-tier conditioning
- Fields: `messages`, `tier`, `engagement_rate`, `like_count`, `play_count`, `duration_seconds`, `video_id`, `author`

```python
from datasets import load_dataset

ds = load_dataset("fchaubard/funny_bench", "sft")
```

### DPO (Direct Preference Optimization)

- **11,607 train** / **611 test** preference pairs
- Pairs matched by duration bucket
- "Chosen" = higher engagement rate, "Rejected" = lower engagement rate
- Fields: `prompt`, `chosen`, `rejected`, `chosen_engagement`, `rejected_engagement`

```python
from datasets import load_dataset

ds = load_dataset("fchaubard/funny_bench", "dpo")
```

## Source Data

- **29,729 TikTok stand-up comedy clips** (pre-filtered: 10,000+ likes, English, standup hashtags)
- Transcribed using NVIDIA Canary-Qwen 2.5B
- Speaker diarization via NVIDIA NeMo MSDD
- Labels: `[COMEDIAN]`, `[AUDIENCE]`, `[LAUGHTER]`

## Cleaning Pipeline

| Filter | Threshold | Dropped |
|--------|-----------|---------|
| Transcript length | 80-12,000 chars | 1,163 |
| Duration | 15-300 seconds | 2,098 |
| Word repetition score | <= 0.55 | 1,954 |
| Unique word count | >= 15 | 54 |
| ASR garbage detection | trigram loops | 22 |
| **Total removed** | | **5,291 (17.8%)** |
| **Clean dataset** | | **24,438 (82.2%)** |

## Quality Tiers (SFT)

Each SFT example has a quality tier based on engagement rate (likes/views):

| Tier | Percentile | Engagement Rate | Count |
|------|-----------|----------------|-------|
| `[LEGENDARY]` | Top 5% | > 21.8% | 1,222 |
| `[KILLER]` | 75-95th | 14.7-21.8% | 4,888 |
| `[SOLID]` | 50-75th | 10.8-14.7% | 6,109 |
| `[WARMING_UP]` | Bottom 50% | < 10.8% | 12,219 |

At inference, prompt with `[LEGENDARY]` to generate top-tier comedy.

## Why Engagement Rate?

Raw like counts are dominated by virality and follower counts. The engagement rate (likes/views) better captures per-viewer funniness. A clip with 1M views and 200K likes (20%) is funnier per-viewer than one with 100M views and 5M likes (5%).

## SFT Format

```json
{
  "messages": [
    {"role": "system", "content": "You are a stand-up comedian performing a live set..."},
    {"role": "user", "content": "[LEGENDARY] Perform a stand-up comedy bit."},
    {"role": "assistant", "content": "[COMEDIAN]: So where are you from?\n[AUDIENCE]: Texas!\n[COMEDIAN]: Texas? Oh man...\n[LAUGHTER]"}
  ],
  "tier": "LEGENDARY",
  "engagement_rate": 0.22,
  "like_count": 500000,
  "play_count": 2200000
}
```

## DPO Format

```json
{
  "prompt": [
    {"role": "system", "content": "You are a stand-up comedian..."},
    {"role": "user", "content": "Perform a stand-up comedy bit."}
  ],
  "chosen": [{"role": "assistant", "content": "...funnier transcript..."}],
  "rejected": [{"role": "assistant", "content": "...less funny transcript..."}],
  "chosen_engagement": 0.18,
  "rejected_engagement": 0.05
}
```

## Limitations

- ASR artifacts from NVIDIA Canary-Qwen 2.5B transcription
- Comedy depends heavily on delivery and timing that text can't capture
- TikTok bias toward short-form, punchy comedy
- Engagement != funny (controversy and relatability also drive engagement)

## Citation

If you use this dataset, please cite:

```
@misc{funnybench2026,
  title={FunnyBench: Teaching LLMs Stand-Up Comedy with Engagement-Based Preference Learning},
  year={2026}
}
```
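## Example: Mapping Engagement Rate to a Tier

The tier table and the likes/views examples in this card can be sketched in a few lines of Python. This is an illustrative sketch, not the pipeline used to build the dataset: the function names (`engagement_rate`, `assign_tier`, `conditioned_prompt`) are hypothetical, the cutoffs are copied from the Quality Tiers table, and the handling of values that fall exactly on a boundary is an assumption, since the card does not specify it.

```python
def engagement_rate(like_count: int, play_count: int) -> float:
    """Likes divided by views; returns 0.0 when there are no views."""
    return like_count / play_count if play_count > 0 else 0.0


def assign_tier(rate: float) -> str:
    """Map an engagement rate to a tier using the cutoffs from the card.

    Assumption: boundary values go to the higher tier, except that
    LEGENDARY requires strictly more than 21.8%, as the table states.
    """
    if rate > 0.218:
        return "LEGENDARY"
    if rate >= 0.147:
        return "KILLER"
    if rate >= 0.108:
        return "SOLID"
    return "WARMING_UP"


def conditioned_prompt(tier: str) -> str:
    """Build the user turn in the style shown under SFT Format."""
    return f"[{tier}] Perform a stand-up comedy bit."


# The two clips compared under "Why Engagement Rate?"
small_clip = engagement_rate(200_000, 1_000_000)    # 20% of viewers liked it
big_clip = engagement_rate(5_000_000, 100_000_000)  # 5% of viewers liked it

print(assign_tier(small_clip))  # KILLER
print(assign_tier(big_clip))    # WARMING_UP
print(conditioned_prompt("LEGENDARY"))
```

Under these cutoffs the smaller clip lands in `[KILLER]` even though it has 25 times fewer likes than the larger one, which is the behavior the engagement-rate argument is after.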

Provided by:

fchaubard
