3,832 annotated images of scanned Hollywood pressbooks for object detection Model
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.v41ns1s8s
Description:
From the 1910s through the 1980s, Hollywood studios promoted their movies through the creation and dissemination of pressbooks—bound pamphlets containing publicity materials, advertising layouts, accessories for sale, and other promotional tactics. These promotional booklets were sent to exhibitors and press outlets, making them vital nodes within the wider networks of film circulation and culture. This study investigates the reach of these publications by running similarity analyses between Hollywood pressbooks and Chronicling America, the Library of Congress’ expansive collection of local newspapers. High-throughput computing infrastructure and sophisticated machine vision workflows enable us to ask questions about Hollywood pressbooks, including who used them, how, and whether the publicity text, promotional photos, and ads from the pressbooks permeated American newspapers and magazines as the studios had intended.
We used machine learning techniques to identify and classify separate articles, images, and other elements from each page. A team of three data annotators spent two months reviewing 3,832 page scans to identify and classify image segments. The annotation data was augmented using standard techniques, including adding rotated and sheared copies of the existing images together with correspondingly rotated bounding boxes, and was then used to train a YOLOv11 object detection model. The trained model was then applied to the entire corpus of digitized pressbooks (25,854 pages). By separating the unique elements from each page, we were able to perform much more granular comparisons between pressbook contents and American newspapers. This dataset contains the original images, annotation data, and model weights.
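The description does not say which tooling performed the augmentation. As a rough illustration of what rotating a bounding box along with its image involves, the sketch below rotates the four corners of an axis-aligned box about the image center and then takes the smallest axis-aligned box that encloses them. The function names, the pixel-coordinate box format `(x_min, y_min, x_max, y_max)`, and the clamping step are assumptions made for this example and are not taken from the dataset.

```python
import math


def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate the point (x, y) about (cx, cy) by angle_deg degrees."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dx, dy = x - cx, y - cy
    return (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)


def rotate_box_corners(box, image_size, angle_deg):
    """Return the four corners of `box` after rotating about the image center.

    box is (x_min, y_min, x_max, y_max) in pixels (a hypothetical format);
    image_size is (width, height).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    return [rotate_point(x, y, cx, cy, angle_deg) for x, y in corners]


def enclosing_box(corners, image_size):
    """Smallest axis-aligned box containing `corners`, clamped to the image."""
    width, height = image_size
    xs = [min(max(x, 0.0), float(width)) for x, _ in corners]
    ys = [min(max(y, 0.0), float(height)) for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def rotate_box(box, image_size, angle_deg):
    """Rotate an axis-aligned box and return its axis-aligned enclosure."""
    return enclosing_box(rotate_box_corners(box, image_size, angle_deg), image_size)


if __name__ == "__main__":
    # A made-up box on a made-up 100 x 100 pixel page, rotated by 90 degrees.
    print(rotate_box((10, 20, 30, 40), (100, 100), 90))
```

For angles that are not multiples of 90 degrees, the enclosing box is larger than the original region. Keeping the four rotated corners from `rotate_box_corners` instead preserves the tighter, oriented shape, at the cost of a label format that supports rotated boxes.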
Created:
2025-10-02



