
tmnam20/Vietnamese-News

Hugging Face · Updated 2024-01-16 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/tmnam20/Vietnamese-News
Resource description:
---
dataset_info:
- config_name: all
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 23505962013
    num_examples: 2421826
  download_size: 10986340753
  dataset_size: 23505962013
- config_name: baochinhphu
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 733982734
    num_examples: 58400
  download_size: 312699305
  dataset_size: 733982734
- config_name: dantri
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 1265117393
    num_examples: 100396
  download_size: 551235606
  dataset_size: 1265117393
- config_name: laodong
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 2939780592
    num_examples: 392668
  download_size: 0
  dataset_size: 2939780592
- config_name: qdnd
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 2731532774
    num_examples: 259691
  download_size: 0
  dataset_size: 2731532774
- config_name: vietnamnet
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 14103390400
    num_examples: 1444898
  download_size: 6773926864
  dataset_size: 14103390400
- config_name: vnexpress
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 1235989143
    num_examples: 133438
  download_size: 537754843
  dataset_size: 1235989143
- config_name: vtc
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 230258605
    num_examples: 10440
  download_size: 66975140
  dataset_size: 230258605
- config_name: zingnews
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: content
    dtype: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 265910372
    num_examples: 21895
  download_size: 124252870
  dataset_size: 265910372
configs:
- config_name: all
  data_files:
  - split: train
    path: all/train-*
- config_name: baochinhphu
  data_files:
  - split: train
    path: baochinhphu/train-*
- config_name: dantri
  data_files:
  - split: train
    path: dantri/train-*
- config_name: laodong
  data_files:
  - split: train
    path: laodong/train-*
- config_name: qdnd
  data_files:
  - split: train
    path: qdnd/train-*
- config_name: vietnamnet
  data_files:
  - split: train
    path: vietnamnet/train-*
- config_name: vnexpress
  data_files:
  - split: train
    path: vnexpress/train-*
- config_name: vtc
  data_files:
  - split: train
    path: vtc/train-*
- config_name: zingnews
  data_files:
  - split: train
    path: zingnews/train-*
---

# Dataset Card for "VietnameseNewsparquet"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
tmnam20
Summary of original information

Dataset overview

Dataset configurations

All data (all)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 23,505,962,013
      • Examples: 2,421,826
  • Download size: 10,986,340,753 bytes
  • Dataset size: 23,505,962,013 bytes

Báo Chính phủ (baochinhphu)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 733,982,734
      • Examples: 58,400
  • Download size: 312,699,305 bytes
  • Dataset size: 733,982,734 bytes

Dân trí (dantri)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 1,265,117,393
      • Examples: 100,396
  • Download size: 551,235,606 bytes
  • Dataset size: 1,265,117,393 bytes

Lao Động (laodong)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 2,939,780,592
      • Examples: 392,668
  • Download size: 0 bytes
  • Dataset size: 2,939,780,592 bytes

Quân đội nhân dân (qdnd)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 2,731,532,774
      • Examples: 259,691
  • Download size: 0 bytes
  • Dataset size: 2,731,532,774 bytes

VietNamNet (vietnamnet)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 14,103,390,400
      • Examples: 1,444,898
  • Download size: 6,773,926,864 bytes
  • Dataset size: 14,103,390,400 bytes

VnExpress (vnexpress)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 1,235,989,143
      • Examples: 133,438
  • Download size: 537,754,843 bytes
  • Dataset size: 1,235,989,143 bytes

VTC (vtc)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 230,258,605
      • Examples: 10,440
  • Download size: 66,975,140 bytes
  • Dataset size: 230,258,605 bytes

Zing News (zingnews)

  • Features:
    • title: string
    • description: string
    • content: string
    • text: string
    • url: string
  • Splits:
    • train:
      • Bytes: 265,910,372
      • Examples: 21,895
  • Download size: 124,252,870 bytes
  • Dataset size: 265,910,372 bytes
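
The figures listed for the nine configurations can be cross-checked with a little arithmetic. The sketch below copies the example counts and sizes from the lists above and confirms that the eight per-source configurations add up exactly to the `all` configuration. That `all` is a plain concatenation of the other eight is an inference from these sums, not something the dataset card states. The names `SOURCES`, `ALL`, and `summarize` are illustrative only.

```python
# Figures copied from the configuration lists above.
# config name -> (num_examples, dataset_size in bytes, download_size in bytes)
SOURCES = {
    "baochinhphu": (58_400, 733_982_734, 312_699_305),
    "dantri": (100_396, 1_265_117_393, 551_235_606),
    "laodong": (392_668, 2_939_780_592, 0),
    "qdnd": (259_691, 2_731_532_774, 0),
    "vietnamnet": (1_444_898, 14_103_390_400, 6_773_926_864),
    "vnexpress": (133_438, 1_235_989_143, 537_754_843),
    "vtc": (10_440, 230_258_605, 66_975_140),
    "zingnews": (21_895, 265_910_372, 124_252_870),
}
ALL = (2_421_826, 23_505_962_013, 10_986_340_753)

total_examples = sum(n for n, _, _ in SOURCES.values())
total_bytes = sum(size for _, size, _ in SOURCES.values())

# Both totals match the `all` configuration exactly.
assert total_examples == ALL[0]
assert total_bytes == ALL[1]


def summarize(name, num_examples, dataset_size, download_size):
    """Return a one-line summary: average bytes per example and download ratio."""
    avg = dataset_size / num_examples
    # laodong and qdnd report a download size of 0, so no ratio is computed.
    ratio = f"{download_size / dataset_size:.2f}" if download_size else "n/a"
    return f"{name}: {avg:,.0f} bytes/example, download/dataset = {ratio}"


for name, stats in SOURCES.items():
    print(summarize(name, *stats))
```

The download-to-dataset ratio is skipped for `laodong` and `qdnd` because the card reports a download size of 0 for both; the metadata does not say why.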

Data file paths

  • all: all/train-*
  • baochinhphu: baochinhphu/train-*
  • dantri: dantri/train-*
  • laodong: laodong/train-*
  • qdnd: qdnd/train-*
  • vietnamnet: vietnamnet/train-*
  • vnexpress: vnexpress/train-*
  • vtc: vtc/train-*
  • zingnews: zingnews/train-*
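
Each configuration name above maps to one set of `train-*` files, so a single source can be loaded without fetching the whole collection. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository `tmnam20/Vietnamese-News` is reachable. The helper `load_news` and the constant `CONFIG_NAMES` are hypothetical names introduced for illustration; they are not part of the dataset or of the library.

```python
# Configuration names taken from the data file path list above.
CONFIG_NAMES = (
    "all", "baochinhphu", "dantri", "laodong", "qdnd",
    "vietnamnet", "vnexpress", "vtc", "zingnews",
)


def load_news(config="vtc", streaming=True):
    """Load the train split of one configuration (hypothetical helper).

    Streaming is on by default so that nothing is downloaded up front;
    the `all` configuration is roughly 11 GB to download.
    """
    if config not in CONFIG_NAMES:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the name check works without the library installed.
    from datasets import load_dataset

    return load_dataset(
        "tmnam20/Vietnamese-News",
        config,
        split="train",
        streaming=streaming,
    )
```

With network access, `next(iter(load_news("vtc")))` should yield a dictionary with the five string fields listed in the card: `title`, `description`, `content`, `text`, and `url`. The smallest configuration, `vtc`, is a reasonable choice for a first test.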