andreaparker/wiki-screenshot-corpus_subsampled-5k
Source: Hugging Face. Updated 2024-12-10; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/andreaparker/wiki-screenshot-corpus_subsampled-5k
Description:
This dataset has four features: image, docid (document ID), text, and title. It consists of a single train split of 4,949 examples, with a reported dataset size of 1,599,524,363.63 bytes (about 1.60 GB) and a download size of 1,594,573,423 bytes (about 1.59 GB). The train split's data files match the path pattern data/train-*.
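As a rough sketch, assuming the Hugging Face datasets library is installed and the repository is reachable, the train split could be loaded as below. The block also converts the reported byte counts into GB/GiB and a per-example average. The helper names (summarize_sizes, missing_features, load_train_split) are hypothetical, and routing downloads through the mirror with the HF_ENDPOINT environment variable is an assumption about how huggingface_hub picks its endpoint, not something this listing states.

```python
import os

# Figures as reported in the dataset description above.
REPO_ID = "andreaparker/wiki-screenshot-corpus_subsampled-5k"
FEATURES = ("image", "docid", "text", "title")
NUM_TRAIN_EXAMPLES = 4949
DATASET_SIZE_BYTES = 1_599_524_363.63
DOWNLOAD_SIZE_BYTES = 1_594_573_423


def summarize_sizes() -> dict:
    """Convert the reported byte counts into more readable units."""
    return {
        "dataset_gb": DATASET_SIZE_BYTES / 1e9,     # decimal gigabytes
        "dataset_gib": DATASET_SIZE_BYTES / 2**30,  # binary gibibytes
        "download_gb": DOWNLOAD_SIZE_BYTES / 1e9,
        # Average stored size per example, in decimal megabytes.
        "avg_mb_per_example": DATASET_SIZE_BYTES / NUM_TRAIN_EXAMPLES / 1e6,
    }


def missing_features(record: dict) -> list:
    """Hypothetical helper: list expected feature names absent from a record."""
    return [name for name in FEATURES if name not in record]


def load_train_split(use_mirror: bool = True):
    """Hypothetical helper: download the train split (needs network access).

    Assumption: huggingface_hub reads HF_ENDPOINT when it is first imported,
    so the variable is set before the lazy import of `datasets` below.
    """
    if use_mirror:
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(REPO_ID, split="train")


if __name__ == "__main__":
    for key, value in summarize_sizes().items():
        print(f"{key}: {value:.4f}")
```

If the records match the four features listed above, calling missing_features(load_train_split()[0]) should return an empty list. The average works out to roughly 0.32 MB per example.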
Provided by:
andreaparker



