ProQuest Vogue Text-as-Data Collection
Source: DataCite Commons. Updated: 2024-06-26. Indexed: 2024-07-13.
Download link:
https://ultraviolet.library.nyu.edu/doi/10.58153/p4gvb-msg85
Description:
The collection consists of machine-readable text extracted from the print magazine Vogue, covering issues from 1892 to 2020. In total there are 450,921 .xml files, with a combined size of 1.67 GB. There is one .xml file for each item, advertisement, article, or subsection of a magazine issue; each file includes metadata about that item and its full text as extracted from the digitized print using optical character recognition (OCR). The collection also includes 651,896 .jpeg files totaling 579 GB, one for each page of the original print. This collection is static and is not updated with more current issues. It is available to NYU faculty and students only. Instructions for how to access this collection are available at https://guides.nyu.edu/tdm/proquest-vogue-magazine
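As a sanity check after obtaining a local copy, the file counts and sizes quoted in the description can be compared against what is actually on disk. The following is a minimal sketch only: the function names `summarize_by_extension` and `mean_size` are hypothetical, the directory layout of the collection is not documented here (so the walk is simply recursive), and 1 GB is assumed to mean 10**9 bytes.

```python
from collections import defaultdict
from pathlib import Path


def summarize_by_extension(root):
    """Return {extension: (file_count, total_bytes)} for every file under root."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()  # e.g. ".xml" or ".jpeg"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


def mean_size(total_bytes, file_count):
    """Average file size in bytes; 0.0 when there are no files."""
    return total_bytes / file_count if file_count else 0.0


if __name__ == "__main__":
    # Figures quoted in the description, assuming 1 GB = 10**9 bytes.
    print(f"mean .xml size:  {mean_size(1.67e9, 450_921) / 1e3:.1f} kB")
    print(f"mean .jpeg size: {mean_size(579e9, 651_896) / 1e6:.2f} MB")
```

Under that assumption, the quoted totals work out to roughly 3.7 kB per .xml file and 0.89 MB per .jpeg page. Calling `summarize_by_extension` on the local copy and comparing the `.xml` and `.jpeg` entries with 450,921 and 651,896 is one way to spot an incomplete download.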
Provider:
Condé Nast
Created:
2023-03-27



