3,832 annotated images of scanned Hollywood pressbooks for object detection model
Source: DataCite Commons. Updated: 2026-01-29. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.v41ns1s8s
Description:
From the 1910s through the 1980s, Hollywood studios promoted their movies
through the creation and dissemination of pressbooks—bound pamphlets
containing publicity materials, advertising layouts, accessories for sale,
and other promotional tactics. These promotional booklets were sent to
exhibitors and press outlets, making them vital nodes within the wider
networks of film circulation and culture. This study investigates the
reach of these publications by running similarity analyses between
Hollywood pressbooks and Chronicling America, the Library of Congress’
expansive collection of local newspapers. High-throughput computing
infrastructure and sophisticated machine vision workflows enable us to ask
questions about Hollywood pressbooks, including who used them, how, and
whether the publicity text, promotional photos, and ads from the
pressbooks permeated American newspapers and magazines as the studios had
intended. We used machine learning techniques to identify and classify
separate articles, images, and other elements from each page. A team of
three data annotators spent two months reviewing 3,832 page scans to
identify and classify image segments. We augmented this annotation data
using standard techniques, adding rotated and sheared copies of the
existing images along with rotated bounding boxes, and used the result
to train a YOLOv11 object detection model. The machine vision model was
then applied to the entire corpus of digitized pressbooks (25,854 pages).
By separating the unique elements from each page, we were able to perform
much more granular comparisons between pressbook contents and American
newspapers. This dataset contains the original images, annotation data,
and model weights.
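
The rotation and shear augmentation described above can be illustrated with a minimal sketch. It assumes each annotation is stored as four (x, y) corner points in pixel coordinates and that boxes are transformed with the same geometry as the image. The dataset's actual annotation format and augmentation code are not described here, so the function names (`rotate_box`, `shear_box`, `axis_aligned`) and the sample values are hypothetical.

```python
import math

def rotate_box(corners, angle_deg, center):
    """Rotate the (x, y) corners of a box about `center`.

    In image coordinates (y pointing down) a positive angle appears
    clockwise on screen. Assumption: corners are pixel coordinates.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated

def shear_box(corners, shear_x):
    """Apply a horizontal shear x' = x + shear_x * y to each corner."""
    return [(x + shear_x * y, y) for x, y in corners]

def axis_aligned(corners):
    """Return the tightest axis-aligned box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical annotation: a 100 x 50 pixel box on a page scan.
box = [(10, 20), (110, 20), (110, 70), (10, 70)]
rotated = rotate_box(box, 15, center=(400, 600))
sheared = shear_box(box, 0.5)
enclosing = axis_aligned(sheared)
```

Keeping the four transformed corners preserves a rotated box, while `axis_aligned` collapses them to an upright box for detectors that expect that format. Which of the two the authors used is not stated.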
Provider:
Dryad
Created:
2025-10-02



