LCA-PORVID/delexicalized_n_grams
Source: Hugging Face | Updated: 2024-02-12 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/LCA-PORVID/delexicalized_n_grams
Resource description:
---
dataset_info:
- config_name: journalistic
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 201756790
    num_examples: 615012
  download_size: 123762232
  dataset_size: 201756790
- config_name: legal
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 1215937
    num_examples: 7440
  download_size: 687763
  dataset_size: 1215937
- config_name: literature
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 8812642
    num_examples: 40878
  download_size: 5541660
  dataset_size: 8812642
- config_name: politics
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 4082098
    num_examples: 5778
  download_size: 2340087
  dataset_size: 4082098
- config_name: social_media
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 564540
    num_examples: 8246
  download_size: 340777
  dataset_size: 564540
- config_name: web
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 85208776
    num_examples: 79318
  download_size: 49328021
  dataset_size: 85208776
configs:
- config_name: journalistic
  data_files:
  - split: train
    path: journalistic/train-*
- config_name: legal
  data_files:
  - split: train
    path: legal/train-*
- config_name: literature
  data_files:
  - split: train
    path: literature/train-*
- config_name: politics
  data_files:
  - split: train
    path: politics/train-*
- config_name: social_media
  data_files:
  - split: train
    path: social_media/train-*
- config_name: web
  data_files:
  - split: train
    path: web/train-*
---
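The metadata above declares six configurations, each with a single `train` split stored under `<config>/train-*`. A minimal sketch of how one configuration might be loaded, assuming the Hugging Face `datasets` library is installed and the dataset is reachable under the repository ID that appears in the download link. The helper names `shard_pattern` and `load_config` are hypothetical and are not part of the dataset or the library.

```python
# Repository ID taken from the download link; assumed to be valid on the Hub.
REPO_ID = "LCA-PORVID/delexicalized_n_grams"

# Configuration names as listed in the dataset metadata.
CONFIG_NAMES = (
    "journalistic",
    "legal",
    "literature",
    "politics",
    "social_media",
    "web",
)


def shard_pattern(config_name: str) -> str:
    """Return the data_files glob declared for a config's train split."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    return f"{config_name}/train-*"


def load_config(config_name: str):
    """Load the train split of one config (hypothetical helper; needs network access)."""
    shard_pattern(config_name)  # validates the name before any import or download
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")
```

Calling `load_config("legal")` is a reasonable first try, since `legal` is one of the smaller subsets. According to the metadata, each returned row should carry a `text` string and an integer `label`.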
Provider:
LCA-PORVID
Summary of original information
Dataset overview
Dataset configurations
News (journalistic)
- Features:
  - text: string
  - label: 64-bit integer
- Splits:
  - train:
    - Size in bytes: 201756790
    - Number of examples: 615012
- Download size in bytes: 123762232
- Dataset size in bytes: 201756790
Legal (legal)
- Features:
  - text: string
  - label: 64-bit integer
- Splits:
  - train:
    - Size in bytes: 1215937
    - Number of examples: 7440
- Download size in bytes: 687763
- Dataset size in bytes: 1215937
Literature (literature)
- Features:
  - text: string
  - label: 64-bit integer
- Splits:
  - train:
    - Size in bytes: 8812642
    - Number of examples: 40878
- Download size in bytes: 5541660
- Dataset size in bytes: 8812642
Politics (politics)
- Features:
  - text: string
  - label: 64-bit integer
- Splits:
  - train:
    - Size in bytes: 4082098
    - Number of examples: 5778
- Download size in bytes: 2340087
- Dataset size in bytes: 4082098
Social media (social_media)
- Features:
  - text: string
  - label: 64-bit integer
- Splits:
  - train:
    - Size in bytes: 564540
    - Number of examples: 8246
- Download size in bytes: 340777
- Dataset size in bytes: 564540
Web (web)
- Features:
  - text: string
  - label: 64-bit integer
- Splits:
  - train:
    - Size in bytes: 85208776
    - Number of examples: 79318
- Download size in bytes: 49328021
- Dataset size in bytes: 85208776
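The per-configuration figures are easier to compare once they are normalized. The following sketch uses only the numbers listed above to compute each subset's share of all examples, its average size per example, and the ratio of download size to dataset size. The names `STATS` and `summarize` are illustrative assumptions.

```python
# (num_examples, dataset_size in bytes, download_size in bytes), copied from the listing.
STATS = {
    "journalistic": (615012, 201756790, 123762232),
    "legal": (7440, 1215937, 687763),
    "literature": (40878, 8812642, 5541660),
    "politics": (5778, 4082098, 2340087),
    "social_media": (8246, 564540, 340777),
    "web": (79318, 85208776, 49328021),
}


def summarize(stats):
    """Return the total example count and per-config derived ratios."""
    total_examples = sum(n for n, _, _ in stats.values())
    rows = {}
    for name, (n, size, download) in stats.items():
        rows[name] = {
            "share": n / total_examples,        # fraction of all examples
            "bytes_per_example": size / n,      # average dataset bytes per row
            "download_ratio": download / size,  # download size relative to dataset size
        }
    return total_examples, rows


total, rows = summarize(STATS)
print(f"total examples: {total}")
for name, row in rows.items():
    print(
        f"{name:13s} share={row['share']:.1%} "
        f"bytes/example={row['bytes_per_example']:.0f} "
        f"download/size={row['download_ratio']:.2f}"
    )
```

The six subsets total 756672 examples, of which `journalistic` contributes roughly four fifths. Anyone pooling the configurations may therefore want to account for that imbalance.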



