Jia555/cis519_news_urls
Source: Hugging Face. Updated: 2025-12-16. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Jia555/cis519_news_urls
Description:
This dataset was built for Project B (News Headline Classifier) of the CIS 5190 course. The task is binary classification of political news sources (Fox News vs. NBC News) from URL-derived text, with the URL as the only input. The URLs come from the official Fox News and NBC News websites: 40,858 in total (Fox News: 19,787; NBC News: 21,071), spanning multiple publication years. The dataset was created by large-scale crawling of these websites, followed by exact URL-level deduplication. Each entry contains the original news article URL. The dataset is intended for research and educational purposes only.
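The description mentions two processing steps, exact URL-level deduplication and turning a URL into text for a classifier. The following is a minimal sketch of what those steps might look like, not the dataset author's actual pipeline. The function names, the tokenization rule, the example URLs, and the assumption that labels can be read from the `foxnews.com` / `nbcnews.com` hostnames are all illustrative.

```python
import re
from urllib.parse import urlparse


def dedupe_urls(urls):
    """Exact URL-level deduplication: keep the first occurrence of each string."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def url_to_text(url):
    """Derive classifier input from the URL path only (hypothetical rule).

    The hostname is deliberately left out: it would reveal the source label.
    """
    path = urlparse(url).path
    tokens = re.split(r"[^A-Za-z0-9]+", path)
    return " ".join(t.lower() for t in tokens if t)


def source_label(url):
    """Assumed labeling rule: infer the source from the hostname."""
    host = urlparse(url).netloc.lower()
    if host.endswith("foxnews.com"):
        return "fox"
    if host.endswith("nbcnews.com"):
        return "nbc"
    return None


if __name__ == "__main__":
    # Made-up URLs for illustration; they are not taken from the dataset.
    raw = [
        "https://www.foxnews.com/politics/example-headline-slug",
        "https://www.nbcnews.com/politics/another-example-story",
        "https://www.foxnews.com/politics/example-headline-slug",
    ]
    for url in dedupe_urls(raw):
        print(source_label(url), "|", url_to_text(url))
```

Because deduplication here is exact string matching, two URLs that differ only in a trailing slash or query string would both be kept; whether the dataset normalizes such variants is not stated in the description.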
Provider:
Jia555



