RealTimeData/News_Seq_2021
Source: Hugging Face | Updated: 2023-08-10 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/RealTimeData/News_Seq_2021
Resource description:
---
dataset_info:
  features:
  - name: authors
    sequence: string
  - name: date_download
    dtype: string
  - name: date_modify
    dtype: string
  - name: date_publish
    dtype: string
  - name: description
    dtype: string
  - name: filename
    dtype: string
  - name: image_url
    dtype: string
  - name: language
    dtype: string
  - name: localpath
    dtype: string
  - name: maintext
    dtype: string
  - name: source_domain
    dtype: string
  - name: title
    dtype: string
  - name: title_page
    dtype: string
  - name: title_rss
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 16944979
    num_examples: 4252
  download_size: 8112201
  dataset_size: 16944979
---
# Dataset Card for "News_Seq_2021"
This dataset was constructed on 1 Sep 2021 and contains news articles published from 10 June 2021 to 21 Aug 2021, collected from various sources.
All news articles in this dataset are in English.
Created from `commoncrawl`.
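
A minimal sketch of how the train split might be loaded and checked against the publication window stated above. It assumes the Hugging Face `datasets` library is installed and that `date_publish` holds ISO-like strings such as `2021-07-01 08:30:00`; the card does not document the date format, so that part is a guess. The helper names `in_publish_window` and `load_news` are hypothetical and not part of the dataset.

```python
from datetime import datetime

# Column names copied from the dataset_info block of the card.
EXPECTED_COLUMNS = [
    "authors", "date_download", "date_modify", "date_publish", "description",
    "filename", "image_url", "language", "localpath", "maintext",
    "source_domain", "title", "title_page", "title_rss", "url",
]

# Publication window stated in the card (10 June 2021 to 21 Aug 2021, inclusive).
WINDOW_START = datetime(2021, 6, 10)
WINDOW_END = datetime(2021, 8, 21, 23, 59, 59)


def in_publish_window(record):
    """Return True if the record's date_publish lies inside the stated window.

    Assumption: date_publish is an ISO-like string. Missing or unparseable
    values are treated as outside the window.
    """
    raw = record.get("date_publish")
    if not raw:
        return False
    try:
        published = datetime.fromisoformat(raw)
    except ValueError:
        return False
    # Drop any timezone info so the comparison with naive bounds is valid.
    published = published.replace(tzinfo=None)
    return WINDOW_START <= published <= WINDOW_END


def load_news():
    """Load the train split from the Hugging Face Hub (needs network access)."""
    from datasets import load_dataset  # pip install datasets

    return load_dataset("RealTimeData/News_Seq_2021", split="train")
```

With network access, `news = load_news()` followed by `news.filter(in_publish_window)` would keep only the articles whose parsed date falls inside the window, and `news.column_names` can be compared against `EXPECTED_COLUMNS`.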
Provider:
RealTimeData
Summary of original information
Dataset overview
Dataset name
- Name: News_Seq_2021
Dataset creation details
- Creation date: 1 Sep 2021
- Contents: news articles published from 10 June 2021 to 21 August 2021
- Language: English
- Source: commoncrawl
Dataset features
- Feature list:
  - authors: sequence of strings
  - date_download: string
  - date_modify: string
  - date_publish: string
  - description: string
  - filename: string
  - image_url: string
  - language: string
  - localpath: string
  - maintext: string
  - source_domain: string
  - title: string
  - title_page: string
  - title_rss: string
  - url: string
Dataset splits
- Splits:
  - train:
    - Size: 16944979 bytes
    - Number of examples: 4252
Dataset size
- Download size: 8112201 bytes
- Dataset size: 16944979 bytes
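
The size figures above can be turned into two quick ratios. This is only an arithmetic sketch on the published numbers; the function name `size_summary` is hypothetical.

```python
# Figures copied from the dataset card.
NUM_BYTES = 16_944_979      # size of the train split (equals dataset_size)
NUM_EXAMPLES = 4_252        # number of articles in the train split
DOWNLOAD_SIZE = 8_112_201   # size of the files as downloaded


def size_summary(num_bytes, num_examples, download_size):
    """Return average bytes per example and the download-to-dataset size ratio."""
    return {
        "avg_bytes_per_example": num_bytes / num_examples,
        "download_ratio": download_size / num_bytes,
    }


summary = size_summary(NUM_BYTES, NUM_EXAMPLES, DOWNLOAD_SIZE)
print(f"{summary['avg_bytes_per_example']:.0f} bytes per article on average")
print(f"download is {summary['download_ratio']:.1%} of the dataset size")
```

On these numbers, an article averages about 3,985 bytes, and the download is a little under half the size of the loaded dataset.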



