oyxy2019/THUCNewsText
Source: Hugging Face. Updated: 2023-05-10. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/oyxy2019/THUCNewsText
Description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': education
          '1': entertainment
          '2': fashion
          '3': finance
          '4': game
          '5': politic
          '6': society
          '7': sport
          '8': stock
          '9': technology
  splits:
  - name: train
    num_bytes: 126435258
    num_examples: 50000
  - name: validation
    num_bytes: 12851939
    num_examples: 5000
  - name: test
    num_bytes: 25321290
    num_examples: 9890
  download_size: 110495565
  dataset_size: 164608487
---
# Dataset Card for "THUCNewsText"
This is a clone of [seamew/THUCNewsText](https://huggingface.co/datasets/seamew/THUCNewsText), made to work around Google Drive being unreachable from mainland China (443).
```python
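from datasets import load_dataset

# Load every split of the upstream dataset, then re-upload it under this repo.
# push_to_hub requires prior authentication (e.g. `huggingface-cli login`).
dataset_dict = load_dataset("seamew/THUCNewsText")
dataset_dict.push_to_hub("oyxy2019/THUCNewsText")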
```
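The snippet above shows how the clone was created. To consume it, something like the following should work. This is a minimal sketch, not the author's documented usage: the helper name `load_thucnews` is hypothetical, and routing requests through the mirror via the `HF_ENDPOINT` environment variable is an assumption based on the download link listed on this page.

```python
import os

# Assumption: use the mirror host from this page's download link.
# HF_ENDPOINT must be set before the Hugging Face libraries are imported;
# remove this line to talk to the default Hub endpoint instead.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "oyxy2019/THUCNewsText"


def load_thucnews(split="train"):
    """Hypothetical helper: load one split ("train", "validation" or "test")."""
    # Imported lazily so that HF_ENDPOINT is already in place.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


# Example (requires network access and the `datasets` package):
# train = load_thucnews("train")
# print(train.features["label"].names)
```

Loading a single split keeps memory use lower than fetching the whole `DatasetDict`; call the helper once per split if all three are needed.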
Provider:
oyxy2019
Summary of original information
Dataset overview
Dataset name
- THUCNewsText
Dataset features
- text: string
- label: class label, with the following categories:
  - 0: education
  - 1: entertainment
  - 2: fashion
  - 3: finance
  - 4: game
  - 5: politic
  - 6: society
  - 7: sport
  - 8: stock
  - 9: technology
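The label ids listed above can be turned into lookup tables for training a classifier or decoding its predictions. A minimal sketch in plain Python; the names `LABEL_NAMES`, `ID2LABEL`, and `LABEL2ID` are illustrative and are not defined by the dataset itself.

```python
# Class names in id order, copied from the class_label list on the card.
LABEL_NAMES = [
    "education", "entertainment", "fashion", "finance", "game",
    "politic", "society", "sport", "stock", "technology",
]

# Hypothetical helper tables (illustrative names, not part of the dataset).
ID2LABEL = dict(enumerate(LABEL_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(ID2LABEL[7])
print(LABEL2ID["stock"])
```

Note that the class is spelled `politic` on the card, so the tables keep that spelling rather than correcting it.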
Dataset splits
- Train: 50000 examples, 126435258 bytes
- Validation: 5000 examples, 12851939 bytes
- Test: 9890 examples, 25321290 bytes
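To get a feel for how the splits relate to each other, the figures above can be reduced to each split's share of all examples and its mean size per example. This is a rough sketch using only the numbers reported on the card; `SPLITS` and `split_stats` are illustrative names.

```python
# (num_examples, num_bytes) per split, copied from the dataset card.
SPLITS = {
    "train": (50000, 126435258),
    "validation": (5000, 12851939),
    "test": (9890, 25321290),
}

total_examples = sum(n for n, _ in SPLITS.values())


def split_stats(name):
    """Return (share of all examples, mean bytes per example) for one split."""
    n, size = SPLITS[name]
    return n / total_examples, size / n


for name in SPLITS:
    share, mean_bytes = split_stats(name)
    print(f"{name:<10} {share:6.2%}  ~{mean_bytes:,.0f} bytes/example")
```

The mean bytes per example is only a proxy for document length, since the card does not say how the byte counts were measured.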
Dataset size
- Download size: 110495565 bytes
- Total dataset size: 164608487 bytes
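The size figures can be cross-checked against the per-split byte counts reported on the card. The sketch below verifies that the three splits add up to the total dataset size and converts both sizes to MiB; all variable names are illustrative.

```python
# Byte counts copied from the dataset card.
SPLIT_BYTES = {"train": 126435258, "validation": 12851939, "test": 25321290}
DOWNLOAD_SIZE = 110495565
DATASET_SIZE = 164608487

MIB = 1024 ** 2

# The per-split sizes should sum to the reported total.
splits_total = sum(SPLIT_BYTES.values())
assert splits_total == DATASET_SIZE

# Ratio of the download size to the total dataset size.
ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"download: {DOWNLOAD_SIZE / MIB:.1f} MiB")
print(f"dataset:  {DATASET_SIZE / MIB:.1f} MiB")
print(f"download/dataset ratio: {ratio:.2f}")
```

A ratio below 1 is expected when the downloaded files are stored in a more compact form than the prepared dataset, though the card does not state the storage format.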



