tmnam20/Vietnamese-News
Source: Hugging Face | Updated: 2024-01-16 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/tmnam20/Vietnamese-News
Resource description:
---
dataset_info:
- config_name: all
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 23505962013
    num_examples: 2421826
  download_size: 10986340753
  dataset_size: 23505962013
- config_name: baochinhphu
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 733982734
    num_examples: 58400
  download_size: 312699305
  dataset_size: 733982734
- config_name: dantri
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 1265117393
    num_examples: 100396
  download_size: 551235606
  dataset_size: 1265117393
- config_name: laodong
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 2939780592
    num_examples: 392668
  download_size: 0
  dataset_size: 2939780592
- config_name: qdnd
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 2731532774
    num_examples: 259691
  download_size: 0
  dataset_size: 2731532774
- config_name: vietnamnet
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 14103390400
    num_examples: 1444898
  download_size: 6773926864
  dataset_size: 14103390400
- config_name: vnexpress
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 1235989143
    num_examples: 133438
  download_size: 537754843
  dataset_size: 1235989143
- config_name: vtc
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 230258605
    num_examples: 10440
  download_size: 66975140
  dataset_size: 230258605
- config_name: zingnews
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 265910372
    num_examples: 21895
  download_size: 124252870
  dataset_size: 265910372
configs:
- config_name: all
  data_files:
  - split: train
    path: all/train-*
- config_name: baochinhphu
  data_files:
  - split: train
    path: baochinhphu/train-*
- config_name: dantri
  data_files:
  - split: train
    path: dantri/train-*
- config_name: laodong
  data_files:
  - split: train
    path: laodong/train-*
- config_name: qdnd
  data_files:
  - split: train
    path: qdnd/train-*
- config_name: vietnamnet
  data_files:
  - split: train
    path: vietnamnet/train-*
- config_name: vnexpress
  data_files:
  - split: train
    path: vnexpress/train-*
- config_name: vtc
  data_files:
  - split: train
    path: vtc/train-*
- config_name: zingnews
  data_files:
  - split: train
    path: zingnews/train-*
---
# Dataset Card for "VietnameseNewsparquet"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
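
The metadata above declares nine configurations, each with a single `train` split. A minimal sketch of loading one of them with the Hugging Face `datasets` library follows. The repository ID is taken from the page title; the helper name `load_config` and the choice of streaming mode are illustrative assumptions, not something the dataset card prescribes.

```python
from typing import Any

# Repository ID as shown in the page title.
REPO_ID = "tmnam20/Vietnamese-News"

# Configuration names copied from the `configs` section of the metadata.
CONFIG_NAMES = [
    "all", "baochinhphu", "dantri", "laodong", "qdnd",
    "vietnamnet", "vnexpress", "vtc", "zingnews",
]


def load_config(config_name: str = "vtc", streaming: bool = True) -> Any:
    """Load the train split of one configuration (hypothetical helper).

    Streaming is the default here because the `all` configuration is
    listed at roughly 23.5 GB; this is a convenience choice, not a
    requirement of the dataset.
    """
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {config_name!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the helper can be defined without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)
```

With network access and `datasets` installed, `next(iter(load_config("vtc")))` should yield a record with the five string fields listed in the metadata: `title`, `description`, `content`, `text`, and `url`.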
Provider:
tmnam20
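Summary of original information
Dataset overview
Dataset configurations
All data (all)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 23,505,962,013
    - Examples: 2,421,826
- Download size: 10,986,340,753 bytes
- Dataset size: 23,505,962,013 bytes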
Báo Chính phủ (baochinhphu)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 733,982,734
    - Examples: 58,400
- Download size: 312,699,305 bytes
- Dataset size: 733,982,734 bytes
Dân trí (dantri)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 1,265,117,393
    - Examples: 100,396
- Download size: 551,235,606 bytes
- Dataset size: 1,265,117,393 bytes
Lao Động (laodong)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 2,939,780,592
    - Examples: 392,668
- Download size: 0 bytes
- Dataset size: 2,939,780,592 bytes
Quân đội nhân dân (qdnd)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 2,731,532,774
    - Examples: 259,691
- Download size: 0 bytes
- Dataset size: 2,731,532,774 bytes
VietNamNet (vietnamnet)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 14,103,390,400
    - Examples: 1,444,898
- Download size: 6,773,926,864 bytes
- Dataset size: 14,103,390,400 bytes
VnExpress (vnexpress)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 1,235,989,143
    - Examples: 133,438
- Download size: 537,754,843 bytes
- Dataset size: 1,235,989,143 bytes
VTC (vtc)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 230,258,605
    - Examples: 10,440
- Download size: 66,975,140 bytes
- Dataset size: 230,258,605 bytes
Zing News (zingnews)
- Features:
  - title: string
  - description: string
  - content: string
  - text: string
  - url: string
- Splits:
  - train:
    - Bytes: 265,910,372
    - Examples: 21,895
- Download size: 124,252,870 bytes
- Dataset size: 265,910,372 bytes
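
The per-source figures above can be cross-checked against the `all` configuration. The sketch below copies the numbers from the metadata and confirms that the eight per-source example counts and dataset sizes sum exactly to the `all` totals, which suggests (but does not prove) that `all` is the concatenation of the other configurations. The helper names are illustrative. A reported download size of 0 is treated as "not recorded", which is an assumption about the metadata rather than a documented meaning.

```python
from typing import Dict, Optional, Tuple

# (num_examples, download_size, dataset_size) in bytes, copied from the metadata.
CONFIGS: Dict[str, Tuple[int, int, int]] = {
    "all": (2_421_826, 10_986_340_753, 23_505_962_013),
    "baochinhphu": (58_400, 312_699_305, 733_982_734),
    "dantri": (100_396, 551_235_606, 1_265_117_393),
    "laodong": (392_668, 0, 2_939_780_592),
    "qdnd": (259_691, 0, 2_731_532_774),
    "vietnamnet": (1_444_898, 6_773_926_864, 14_103_390_400),
    "vnexpress": (133_438, 537_754_843, 1_235_989_143),
    "vtc": (10_440, 66_975_140, 230_258_605),
    "zingnews": (21_895, 124_252_870, 265_910_372),
}


def per_source_totals(configs: Dict[str, Tuple[int, int, int]]) -> Tuple[int, int]:
    """Sum example counts and dataset sizes over every config except `all`."""
    sources = [v for name, v in configs.items() if name != "all"]
    return sum(v[0] for v in sources), sum(v[2] for v in sources)


def size_ratio(download_size: int, dataset_size: int) -> Optional[float]:
    """Dataset size divided by download size, or None when download size is 0."""
    if download_size == 0:
        return None  # assumed to mean "not recorded" (laodong, qdnd)
    return dataset_size / download_size


examples, size = per_source_totals(CONFIGS)
print(examples == CONFIGS["all"][0], size == CONFIGS["all"][2])
```

Because the `laodong` and `qdnd` download sizes are listed as 0, the per-source download sizes cannot be meaningfully summed and compared with the `all` download size.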
Data file paths
- all: all/train-*
- baochinhphu: baochinhphu/train-*
- dantri: dantri/train-*
- laodong: laodong/train-*
- qdnd: qdnd/train-*
- vietnamnet: vietnamnet/train-*
- vnexpress: vnexpress/train-*
- vtc: vtc/train-*
- zingnews: zingnews/train-*
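
Each path above is a glob pattern that selects the files of one configuration. The sketch below illustrates how such patterns partition file paths, using `fnmatchcase` from the Python standard library. It is only an approximation of pattern matching in general: the Hugging Face Hub's own resolution logic may differ in detail, and the shard file names in the usage note are hypothetical, since the listing does not show any.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Patterns copied from the "Data file paths" listing.
DATA_FILE_PATTERNS: Dict[str, str] = {
    name: f"{name}/train-*"
    for name in [
        "all", "baochinhphu", "dantri", "laodong", "qdnd",
        "vietnamnet", "vnexpress", "vtc", "zingnews",
    ]
}


def configs_for_file(path: str) -> List[str]:
    """Return the configurations whose train pattern matches `path`."""
    return [
        name
        for name, pattern in DATA_FILE_PATTERNS.items()
        if fnmatchcase(path, pattern)  # case-sensitive on every platform
    ]
```

For a hypothetical shard named `vtc/train-00000-of-00001.parquet`, `configs_for_file` returns `["vtc"]`; a path such as `vtc/test-0.parquet` matches no pattern and returns an empty list, consistent with the metadata declaring only a `train` split.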



