Data for "Modular Bibliographical Profiling of Historic Book Reviews"
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10092557
Description:
This dataset supports the research paper "Modular Bibliographical Profiling of Historic Book Reviews." The paper examines different methods of predicting bibliographical details (e.g. author, title, and publisher) of books under review in a corpus of approximately 1,100 historical book reviews. The dataset consists of book reviews from ProQuest's American Periodicals Series (APS). This kind of bibliographical profiling is often characterized as a Natural Language Processing (NLP) or Named Entity Recognition (NER) task, but it can be described more specifically as a two-part Named Entity Linking (NEL) task: a feature extraction stage followed by one of several available matching or classification methods. The paper attempts to formalize constraints for modular bibliographical profiling (MBP) and to shed light on some important choices that digital humanities (DH) practitioners often gloss over or obscure. Applying these constraints, the paper evaluates combinations of feature selection (naive bag-of-words [BOW], rule-based feature extraction, and NER using a pre-trained model) with a standardized similarity-based matching strategy (cosine similarity). All tasks are performed on derived text data (term frequency tables), so that the data can be shared and all methods can be used on materials available only in non-consumptive formats. These comparisons suggest that naive BOW can perform quite robustly, and that using even a basic pre-trained NER model in conjunction with a BOW approach may reduce false positives.
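The matching stage described above (term frequency tables compared by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the tokenizer, and the two toy catalogue records are assumptions made for illustration, and none of the text comes from the dataset itself.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Build a naive bag-of-words table: lowercase alphabetic tokens and their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term frequency tables (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(review_tf, candidates):
    """Return (score, key) for the candidate record most similar to the review."""
    return max((cosine_similarity(review_tf, tf), key) for key, tf in candidates.items())


# Hypothetical candidate records (author, title, publisher), invented for this sketch.
candidates = {
    "moby_dick": term_frequencies("Herman Melville Moby Dick Harper Brothers"),
    "scarlet_letter": term_frequencies(
        "Nathaniel Hawthorne The Scarlet Letter Ticknor Reed Fields"
    ),
}

# Hypothetical review text, reduced to a term frequency table before matching.
review_tf = term_frequencies(
    "Mr. Melville's new book, Moby Dick, published by Harper, "
    "is a strange tale of the whale."
)

score, key = best_match(review_tf, candidates)
print(key, round(score, 3))
```

Because the functions only ever see token counts, the same comparison would work on derived term frequency tables without access to the full review text, which is the property the description emphasizes. Swapping `term_frequencies` for a rule-based or NER-based extractor would change the feature stage while leaving the matching stage untouched.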
Created:
2023-11-10



