LCA-PORVID/portuguese_vid
Source: Hugging Face. Updated 2024-02-12; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/LCA-PORVID/portuguese_vid
Resource description:
---
dataset_info:
- config_name: journalistic
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 1181293029
    num_examples: 1662328
  - name: test
    num_bytes: 6152239
    num_examples: 10000
  download_size: 768345947
  dataset_size: 1187445268
- config_name: legal
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 921824075
    num_examples: 2968772
  - name: test
    num_bytes: 307020
    num_examples: 1000
  download_size: 538088874
  dataset_size: 922131095
- config_name: literature
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 27243374
    num_examples: 69026
  - name: test
    num_bytes: 2012791
    num_examples: 5000
  download_size: 20355505
  dataset_size: 29256165
- config_name: politics
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 36542306
    num_examples: 27886
  - name: test
    num_bytes: 1307161
    num_examples: 1000
  download_size: 22363181
  dataset_size: 37849467
- config_name: social_media
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 229000888
    num_examples: 1775121
  - name: test
    num_bytes: 120836
    num_examples: 1000
  download_size: 155606292
  dataset_size: 229121724
- config_name: web
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 163835015
    num_examples: 81069
  - name: test
    num_bytes: 20078136
    num_examples: 10000
  download_size: 102505500
  dataset_size: 183913151
configs:
- config_name: journalistic
  data_files:
  - split: train
    path: journalistic/train-*
  - split: test
    path: journalistic/test-*
- config_name: legal
  data_files:
  - split: train
    path: legal/train-*
  - split: test
    path: legal/test-*
- config_name: literature
  data_files:
  - split: train
    path: literature/train-*
  - split: test
    path: literature/test-*
- config_name: politics
  data_files:
  - split: train
    path: politics/train-*
  - split: test
    path: politics/test-*
- config_name: social_media
  data_files:
  - split: train
    path: social_media/train-*
  - split: test
    path: social_media/test-*
- config_name: web
  data_files:
  - split: train
    path: web/train-*
  - split: test
    path: web/test-*
---
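The metadata above lists six configurations, each with a `train` and a `test` split. A minimal sketch of loading one of them follows, assuming the Hugging Face `datasets` library is installed and the repository id is `LCA-PORVID/portuguese_vid` as in the page title. The helper name `load_config` and the `CONFIGS` tuple are illustrative and not part of the dataset.

```python
# Config names copied from the card's `configs` section.
CONFIGS = (
    "journalistic",
    "legal",
    "literature",
    "politics",
    "social_media",
    "web",
)


def load_config(config_name, split="test"):
    """Load one split of one configuration (hypothetical helper).

    Assumes the `datasets` package is available and the Hub is reachable.
    """
    if config_name not in CONFIGS:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {CONFIGS}"
        )
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("LCA-PORVID/portuguese_vid", config_name, split=split)
```

Calling `load_config("politics")` would fetch the 1000-example test split of the politics configuration; each record is expected to carry a `text` string and an integer `label`, per the feature list. The card does not say what the label values mean, so that is left uninterpreted here.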
Provider:
LCA-PORVID
Summary of original information
Dataset overview
Dataset configurations
News (config: journalistic)
- Features:
  - text: string
  - label: int64
- Splits:
  - train:
    - Size in bytes: 1181293029
    - Examples: 1662328
  - test:
    - Size in bytes: 6152239
    - Examples: 10000
- Download size: 768345947 bytes
- Dataset size: 1187445268 bytes
Legal (config: legal)
- Features:
  - text: string
  - label: int64
- Splits:
  - train:
    - Size in bytes: 921824075
    - Examples: 2968772
  - test:
    - Size in bytes: 307020
    - Examples: 1000
- Download size: 538088874 bytes
- Dataset size: 922131095 bytes
Literature (config: literature)
- Features:
  - text: string
  - label: int64
- Splits:
  - train:
    - Size in bytes: 27243374
    - Examples: 69026
  - test:
    - Size in bytes: 2012791
    - Examples: 5000
- Download size: 20355505 bytes
- Dataset size: 29256165 bytes
Politics (config: politics)
- Features:
  - text: string
  - label: int64
- Splits:
  - train:
    - Size in bytes: 36542306
    - Examples: 27886
  - test:
    - Size in bytes: 1307161
    - Examples: 1000
- Download size: 22363181 bytes
- Dataset size: 37849467 bytes
Social media (config: social_media)
- Features:
  - text: string
  - label: int64
- Splits:
  - train:
    - Size in bytes: 229000888
    - Examples: 1775121
  - test:
    - Size in bytes: 120836
    - Examples: 1000
- Download size: 155606292 bytes
- Dataset size: 229121724 bytes
Web (config: web)
- Features:
  - text: string
  - label: int64
- Splits:
  - train:
    - Size in bytes: 163835015
    - Examples: 81069
  - test:
    - Size in bytes: 20078136
    - Examples: 10000
- Download size: 102505500 bytes
- Dataset size: 183913151 bytes
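
The per-configuration figures can be cross-checked with a little arithmetic: in each configuration the reported dataset size appears to equal the train bytes plus the test bytes. The sketch below encodes the numbers from this page and verifies that relationship; the names `ConfigStats`, `STATS`, `check_sizes`, and `total_examples` are illustrative and not part of the dataset.

```python
from typing import NamedTuple


class ConfigStats(NamedTuple):
    train_bytes: int
    train_examples: int
    test_bytes: int
    test_examples: int
    download_size: int
    dataset_size: int


# Figures copied from the configuration summary on this page.
STATS = {
    "journalistic": ConfigStats(1181293029, 1662328, 6152239, 10000, 768345947, 1187445268),
    "legal": ConfigStats(921824075, 2968772, 307020, 1000, 538088874, 922131095),
    "literature": ConfigStats(27243374, 69026, 2012791, 5000, 20355505, 29256165),
    "politics": ConfigStats(36542306, 27886, 1307161, 1000, 22363181, 37849467),
    "social_media": ConfigStats(229000888, 1775121, 120836, 1000, 155606292, 229121724),
    "web": ConfigStats(163835015, 81069, 20078136, 10000, 102505500, 183913151),
}


def check_sizes(stats):
    """Return the configs whose dataset_size differs from train + test bytes."""
    return [
        name
        for name, s in stats.items()
        if s.train_bytes + s.test_bytes != s.dataset_size
    ]


def total_examples(stats):
    """Return (total train examples, total test examples) across configs."""
    train = sum(s.train_examples for s in stats.values())
    test = sum(s.test_examples for s in stats.values())
    return train, test


if __name__ == "__main__":
    print("inconsistent configs:", check_sizes(STATS))
    print("total (train, test) examples:", total_examples(STATS))
```

For the figures listed here, `check_sizes(STATS)` returns an empty list, and the totals come to 6584202 training examples and 28000 test examples across the six configurations.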



