allenai/scrapinghub-article-extraction-benchmark

Name: allenai/scrapinghub-article-extraction-benchmark
Creator: allenai
Published: 2023-08-30 22:05:48
License: MIT

Source: Hugging Face. Updated 2023-08-30. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/allenai/scrapinghub-article-extraction-benchmark

Resource description:

---
dataset_info:
  features:
  - name: html
    dtype: string
  - name: articleBody
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 32354376
    num_examples: 181
  download_size: 10374590
  dataset_size: 32354376
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: mit
task_categories:
- text2text-generation
pretty_name: Scrapinghub Article Extraction Benchmark
size_categories:
- n<1K
---

# Scrapinghub Article Extraction Benchmark

This dataset was originally created and distributed under the MIT License by Scrapinghub on GitHub: [github.com/scrapinghub/article-extraction-benchmark](https://github.com/scrapinghub/article-extraction-benchmark)

It is mirrored on the Hugging Face Hub as a convenience.

Provider:

allenai

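Summary of original information

Dataset overview

Dataset information

Features:
- html: string
- articleBody: string
- url: string
Splits:
- train: 32,354,376 bytes, 181 examples
Download size: 10,374,590 bytes
Dataset size: 32,354,376 bytes
Configs:
- default: data files at data/train-*
License: MIT
Task category: text2text-generation
Name: Scrapinghub Article Extraction Benchmark
Size category: fewer than 1K examples (n<1K)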
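
The metadata above describes a single `train` split with three string features. As a minimal sketch, the snippet below loads that split with the Hugging Face `datasets` library and checks each record against the listed schema. The helper names (`missing_or_invalid_fields`, `load_benchmark`, `main`) are hypothetical and not part of the dataset card, and the loader assumes that `datasets` is installed and the Hub is reachable.

```python
from typing import List, Mapping

# Values taken from the dataset card metadata above.
DATASET_ID = "allenai/scrapinghub-article-extraction-benchmark"
EXPECTED_FEATURES = ("html", "articleBody", "url")
EXPECTED_NUM_EXAMPLES = 181


def missing_or_invalid_fields(record: Mapping[str, object]) -> List[str]:
    """Return the expected feature names that are absent or not strings."""
    return [
        name
        for name in EXPECTED_FEATURES
        if not isinstance(record.get(name), str)
    ]


def load_benchmark():
    """Load the train split (assumes `datasets` is installed and online)."""
    # Imported lazily so the schema helper works without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")


def main() -> None:
    """Hypothetical usage: download the split and run the schema check."""
    train = load_benchmark()
    print(len(train), "examples loaded; the card lists", EXPECTED_NUM_EXAMPLES)
    bad_rows = [i for i, row in enumerate(train) if missing_or_invalid_fields(row)]
    print("rows failing the schema check:", bad_rows)
```

Calling `main()` downloads roughly 10 MB according to the listed download size. The schema helper can also be used on its own with any dictionary-like record.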
