siacus/huff2
Source: Hugging Face. Updated 2023-09-11; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/siacus/huff2
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: eval
    path: data/eval-*
dataset_info:
  features:
  - name: link
    dtype: string
  - name: headline
    dtype: string
  - name: category
    dtype: string
  - name: short_description
    dtype: string
  - name: authors
    dtype: string
  - name: date
    dtype: string
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': U.S. NEWS
          '1': COMEDY
          '2': PARENTING
          '3': WORLD NEWS
          '4': CULTURE & ARTS
          '5': TECH
          '6': SPORTS
          '7': ENTERTAINMENT
          '8': POLITICS
          '9': WEIRD NEWS
          '10': ENVIRONMENT
          '11': EDUCATION
          '12': CRIME
          '13': SCIENCE
          '14': WELLNESS
          '15': BUSINESS
          '16': STYLE & BEAUTY
          '17': FOOD & DRINK
          '18': MEDIA
          '19': QUEER VOICES
          '20': HOME & LIVING
          '21': WOMEN
          '22': BLACK VOICES
          '23': TRAVEL
          '24': MONEY
          '25': RELIGION
          '26': LATINO VOICES
          '27': IMPACT
          '28': WEDDINGS
          '29': COLLEGE
          '30': PARENTS
          '31': ARTS & CULTURE
          '32': STYLE
          '33': GREEN
          '34': TASTE
          '35': HEALTHY LIVING
          '36': THE WORLDPOST
          '37': GOOD NEWS
          '38': WORLDPOST
          '39': FIFTY
          '40': ARTS
          '41': DIVORCE
  splits:
  - name: train
    num_bytes: 2184054
    num_examples: 2100
  - name: test
    num_bytes: 2196326
    num_examples: 2100
  - name: eval
    num_bytes: 2196326
    num_examples: 2100
  download_size: 1979356
  dataset_size: 6576706
---
# Dataset Card for "huff2"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
---
## Configuration
- Configuration name: default
- Data files:
  - Training split (train): `data/train-*`
  - Test split (test): `data/test-*`
  - Validation split (eval): `data/eval-*`
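As a rough illustration of how the default configuration might be consumed, the sketch below loads all three splits with the Hugging Face `datasets` library and compares their row counts with the numbers on the card. It assumes the repository `siacus/huff2` is still reachable on the Hub and that `datasets` is installed; the helper names `load_splits` and `row_count_mismatches` are hypothetical and not part of the dataset.

```python
from typing import Dict, Mapping, Optional, Tuple

REPO_ID = "siacus/huff2"  # repository id taken from the page title

# Row counts per split, as stated on the card.
EXPECTED_ROWS: Dict[str, int] = {"train": 2100, "test": 2100, "eval": 2100}


def load_splits(repo_id: str = REPO_ID):
    """Download every split declared in the default config (needs network).

    The import is lazy so the rest of this sketch runs without `datasets`.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id)  # a DatasetDict keyed by split name


def row_count_mismatches(
    actual: Mapping[str, int],
    expected: Mapping[str, int] = EXPECTED_ROWS,
) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {split: (actual, expected)} for splits that disagree with the card."""
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

With network access, `row_count_mismatches({name: split.num_rows for name, split in load_splits().items()})` would be expected to return an empty dict if the hosted files still match the card.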
## Dataset information
### Feature fields
Each field's definition and data type:
1. `link`: string; the link to the news source
2. `headline`: string; the news headline
3. `category`: string; the human-annotated news category
4. `short_description`: string; a short summary of the article
5. `authors`: string; the article's author information
6. `date`: string; the publication date
7. `id`: string; the unique identifier of the record
8. `text`: string; the body text of the article
9. `label`: class label (class_label); label ids map to category names as follows:
- 0: U.S. NEWS
- 1: COMEDY
- 2: PARENTING
- 3: WORLD NEWS
- 4: CULTURE & ARTS
- 5: TECH
- 6: SPORTS
- 7: ENTERTAINMENT
- 8: POLITICS
- 9: WEIRD NEWS
- 10: ENVIRONMENT
- 11: EDUCATION
- 12: CRIME
- 13: SCIENCE
- 14: WELLNESS
- 15: BUSINESS
- 16: STYLE & BEAUTY
- 17: FOOD & DRINK
- 18: MEDIA
- 19: QUEER VOICES
- 20: HOME & LIVING
- 21: WOMEN
- 22: BLACK VOICES
- 23: TRAVEL
- 24: MONEY
- 25: RELIGION
- 26: LATINO VOICES
- 27: IMPACT
- 28: WEDDINGS
- 29: COLLEGE
- 30: PARENTS
- 31: ARTS & CULTURE
- 32: STYLE
- 33: GREEN
- 34: TASTE
- 35: HEALTHY LIVING
- 36: THE WORLDPOST
- 37: GOOD NEWS
- 38: WORLDPOST
- 39: FIFTY (aimed at readers aged 50 and over)
- 40: ARTS
- 41: DIVORCE
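The id-to-name mapping above can be reproduced without downloading anything. The following is a minimal standard-library sketch; the helper names `int2str` and `str2int` are borrowed from the `ClassLabel` feature in the `datasets` library for familiarity, but the functions here are plain illustrative stand-ins, not that library's implementation.

```python
# Class names copied from the card's `label` feature, in id order (0..41).
LABEL_NAMES = [
    "U.S. NEWS", "COMEDY", "PARENTING", "WORLD NEWS", "CULTURE & ARTS",
    "TECH", "SPORTS", "ENTERTAINMENT", "POLITICS", "WEIRD NEWS",
    "ENVIRONMENT", "EDUCATION", "CRIME", "SCIENCE", "WELLNESS",
    "BUSINESS", "STYLE & BEAUTY", "FOOD & DRINK", "MEDIA", "QUEER VOICES",
    "HOME & LIVING", "WOMEN", "BLACK VOICES", "TRAVEL", "MONEY",
    "RELIGION", "LATINO VOICES", "IMPACT", "WEDDINGS", "COLLEGE",
    "PARENTS", "ARTS & CULTURE", "STYLE", "GREEN", "TASTE",
    "HEALTHY LIVING", "THE WORLDPOST", "GOOD NEWS", "WORLDPOST", "FIFTY",
    "ARTS", "DIVORCE",
]

# Reverse lookup built once so name-to-id conversion is a dict access.
NAME_TO_ID = {name: idx for idx, name in enumerate(LABEL_NAMES)}


def int2str(label_id: int) -> str:
    """Map an integer label id to its category name."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"label id out of range: {label_id}")
    return LABEL_NAMES[label_id]


def str2int(name: str) -> int:
    """Map a category name to its integer label id (exact match only)."""
    return NAME_TO_ID[name]
```

Note that several names are near-duplicates that stay distinct classes in this mapping, for example `THE WORLDPOST` (36) and `WORLDPOST` (38), or `CULTURE & ARTS` (4) and `ARTS & CULTURE` (31); the card does not say whether they are meant to be merged.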
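### Split statistics
Size in bytes and number of examples for each split:
- Training split (train): 2184054 bytes, 2100 examples
- Test split (test): 2196326 bytes, 2100 examples
- Validation split (eval): 2196326 bytes, 2100 examples
### Overall dataset size
Download size: 1979356 bytes
Total dataset size: 6576706 bytes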
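The size figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated on the card; the derived quantities (average bytes per example, download-to-dataset ratio) are illustrative and do not appear in the original metadata.

```python
# Per-split figures copied from the card.
SPLITS = {
    "train": {"num_bytes": 2184054, "num_examples": 2100},
    "test": {"num_bytes": 2196326, "num_examples": 2100},
    "eval": {"num_bytes": 2196326, "num_examples": 2100},
}
DOWNLOAD_SIZE = 1979356  # bytes, as stated on the card
DATASET_SIZE = 6576706   # bytes, as stated on the card

# Sum the per-split figures to compare against the stated total.
total_bytes = sum(split["num_bytes"] for split in SPLITS.values())
total_examples = sum(split["num_examples"] for split in SPLITS.values())

# Derived, illustrative quantities.
avg_bytes_per_example = total_bytes / total_examples
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"sum of split bytes : {total_bytes}")
print(f"matches stated size: {total_bytes == DATASET_SIZE}")
print(f"total examples     : {total_examples}")
print(f"avg bytes/example  : {avg_bytes_per_example:.1f}")
print(f"download / dataset : {download_ratio:.3f}")
```

The per-split byte counts sum exactly to the stated dataset size. The test and eval splits report identical byte counts; the card does not say whether their contents are also identical, so that would need to be checked against the data itself.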
---
Provider: siacus
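Original information summary
Dataset overview
Configuration
- Default configuration:
  - Data files:
    - Training set: data/train-*
    - Test set: data/test-*
    - Evaluation set: data/eval-*
Data features
- Feature list:
  - link: string
  - headline: string
  - category: string
  - short_description: string
  - authors: string
  - date: string
  - id: string
  - text: string
  - label: class label
- Class names: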
- 0: U.S. NEWS
- 1: COMEDY
- 2: PARENTING
- 3: WORLD NEWS
- 4: CULTURE & ARTS
- 5: TECH
- 6: SPORTS
- 7: ENTERTAINMENT
- 8: POLITICS
- 9: WEIRD NEWS
- 10: ENVIRONMENT
- 11: EDUCATION
- 12: CRIME
- 13: SCIENCE
- 14: WELLNESS
- 15: BUSINESS
- 16: STYLE & BEAUTY
- 17: FOOD & DRINK
- 18: MEDIA
- 19: QUEER VOICES
- 20: HOME & LIVING
- 21: WOMEN
- 22: BLACK VOICES
- 23: TRAVEL
- 24: MONEY
- 25: RELIGION
- 26: LATINO VOICES
- 27: IMPACT
- 28: WEDDINGS
- 29: COLLEGE
- 30: PARENTS
- 31: ARTS & CULTURE
- 32: STYLE
- 33: GREEN
- 34: TASTE
- 35: HEALTHY LIVING
- 36: THE WORLDPOST
- 37: GOOD NEWS
- 38: WORLDPOST
- 39: FIFTY
- 40: ARTS
- 41: DIVORCE
Dataset splits
- Training set:
  - Bytes: 2184054
  - Examples: 2100
- Test set:
  - Bytes: 2196326
  - Examples: 2100
- Evaluation set:
  - Bytes: 2196326
  - Examples: 2100
Dataset size
- Download size: 1979356 bytes
- Dataset size: 6576706 bytes



