Evan823/headlines_clean_2022_2025_slugs
Hugging Face · Updated 2025-12-14 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Evan823/headlines_clean_2022_2025_slugs
Description:
A lightweight **Fox News vs. NBC News** binary text classification dataset built from **news URLs**. Instead of using full article text, each example uses a **pseudo-headline** derived from the URL’s last meaningful path component (slug) and normalized into short, clean text. This makes the dataset convenient for fast baselines (TF-IDF + linear models) and also stable for transformer fine-tuning.
- **Task**: binary text classification (predict publisher/source)
- **Labels**: `0 = NBC`, `1 = Fox`
- **Text**: normalized URL slug / pseudo-headline
- **Language**: English
- **License**: MIT
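
The card does not spell out the exact normalization rules, so the following is only a minimal sketch of how a pseudo-headline might be derived from a URL. The function name `slug_to_pseudo_headline`, the skip rule for "non-meaningful" path components, and the trailing-ID pattern are all assumptions made for illustration; they are not taken from the dataset's actual pipeline.

```python
import re
from urllib.parse import urlparse, unquote

# Assumed rule: path components that are purely numeric or look like
# index files are not "meaningful". The dataset's real filter is not documented.
_SKIP = re.compile(r"^(\d+|index(\.\w+)?)$", re.IGNORECASE)


def slug_to_pseudo_headline(url: str) -> str:
    """Turn a news URL into short, lowercase, space-separated text."""
    path = unquote(urlparse(url).path)
    parts = [p for p in path.split("/") if p and not _SKIP.match(p)]
    if not parts:
        return ""
    slug = parts[-1]  # last meaningful path component
    # Drop a common file extension, if present.
    slug = re.sub(r"\.(html?|php|print)$", "", slug, flags=re.IGNORECASE)
    # Drop a trailing ID-like token such as "-rcna12345" (hypothetical pattern).
    slug = re.sub(r"-(rcna|ncna|n)\d+$", "", slug, flags=re.IGNORECASE)
    # Collapse every run of non-alphanumeric characters into a single space.
    text = re.sub(r"[^a-z0-9]+", " ", slug.lower())
    return text.strip()


if __name__ == "__main__":
    # example.com URLs are placeholders, not rows from the dataset.
    print(slug_to_pseudo_headline(
        "https://www.example.com/politics/senate-passes-budget-bill-rcna12345"
    ))  # senate passes budget bill
```

Text produced this way is short enough to feed directly into a TF-IDF vectorizer or a transformer tokenizer, with the `0 = NBC` / `1 = Fox` labels as targets.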
Provider:
Evan823



