Data and scripts from: Rubenstein Library card catalog
Source: DataCite Commons | Updated: 2022-11-04 | Indexed: 2025-04-10
Download link:
https://idn.duke.edu/ark:/87924/r4br8v905
Description:
This deposit includes the dataset, code, and files used and created by the Duke University Data+ 2021 Rubenstein Library Card Catalog Team. Working with the digitized cards from the David M. Rubenstein Rare Book and Manuscript Library's physical card catalogs, our team explored the files to further the library's initiative of finding and describing historically marginalized voices in its collections.
We created a structured dataset from the OCRed text of the scanned cards, using natural language processing and some manual editing. The dataset is sorted by collection of items within the catalog and contains important metadata such as author, location, and date written. With this dataset, we analyzed what and who is present in the cards and displayed the findings in Jupyter Notebook files. We explored the demographics of the authors and items cataloged, analyzed how the information in the cards relates to the history of Duke University, and examined the common topics in the data. We completed spatial frequency mapping at the level of the United States and of North Carolina counties, and visualized the countries outside the United States that appear in the cards. The files contain a wealth of information, and our Data+ project is just the tip of the iceberg. We hope that future research teams will continue to dissect the card files to gain insights into Duke's history and the contents of the library's collections.
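As a rough illustration of the frequency counting that underlies this kind of spatial mapping, the sketch below tallies how often each location value appears across card records. It is not the team's code: the field names (`collection`, `location`), the sample rows, and the helper `location_frequencies` are all hypothetical, and the deposited dataset may be structured differently.

```python
from collections import Counter

# Made-up sample rows for illustration only; the real dataset's
# column names and values may differ.
cards = [
    {"collection": "Example Papers A", "location": "Durham County, NC"},
    {"collection": "Example Papers A", "location": "Wake County, NC"},
    {"collection": "Example Papers B", "location": "Durham County, NC"},
    {"collection": "Example Papers C", "location": ""},  # no location extracted
]


def location_frequencies(records, field="location"):
    """Count how many records carry each non-empty value of `field`."""
    counts = Counter()
    for record in records:
        value = (record.get(field) or "").strip()
        if value:  # skip cards where no location was extracted
            counts[value] += 1
    return counts


freq = location_frequencies(cards)
for place, n in freq.most_common():
    print(f"{place}: {n}")
```

Counts like these could then be joined to county or state boundaries in a mapping library to produce a frequency map; that step is omitted here because it depends on the geographic data used.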
Provider:
Duke Research Data Repository
Created:
2022-11-03



