Fuse

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/records/581678
Description:
The contributors have provided two related datasets, which together constitute the FUSE spreadsheet corpus.

+ A Web Analysis dataset of 2,127,284 URLs that return spreadsheet content, along with the full HTTP web server response, formatted as JSON records. This dataset was obtained by filtering 26.83 billion HTTP responses within the Common Crawl archive.
+ A Binary Analysis dataset of 249,376 spreadsheets, extracted from the 1.9 PB of raw data within the Common Crawl archive. For each spreadsheet, the authors provide JSON metadata containing their analysis, which includes NLP token extraction and spreadsheet metrics.
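
Since both datasets are described as JSON records, a reader might iterate over them roughly as sketched below. This is only an illustration under stated assumptions: it assumes one JSON object per line (the listing does not say how records are delimited), and the field names `url` and `status` in the sample are hypothetical, not taken from the corpus schema.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_records(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line, ASSUMING newline-delimited JSON."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)


# Hypothetical sample data: the "url" and "status" fields are invented
# for illustration and are not the documented FUSE record layout.
sample = io.StringIO(
    '{"url": "http://example.com/a.xls", "status": 200}\n'
    "\n"
    '{"url": "http://example.com/b.xlsx", "status": 200}\n'
)
records = list(iter_json_records(sample))
print(len(records), "records parsed")
```

Streaming line by line keeps memory use flat, which matters for a file holding millions of records; if the actual files use a different layout (for example a single JSON array), the reader function would need to change accordingly.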
Created:
2020-01-21