AI-assisted classification of academic and non-academic monograph titles using DeepSeek large language models.
Source: Mendeley Data. Indexed: 2026-04-18
Download link:
https://data.mendeley.com/datasets/8fyz5fkrv5
Description:
This dataset contains 765 monograph titles used to evaluate a large language model for evidence-based withdrawal decisions in an academic library context. It combines 384 scholarly monograph titles sampled from a Scopus title list (Dataset A) and 381 popular book titles sampled from the “Best Books Ever” Goodreads-derived dataset on Zenodo (Dataset B).
For each record, the dataset includes bibliographic metadata such as title, publication year, publisher, identifiers (e.g. ISBN where available), language/genre (for the Goodreads subset), and a binary label indicating whether the title is academically oriented (academic vs. non-academic). These labels are the ground truth used in the study: Scopus-derived titles are labeled academic, and Goodreads-derived titles are labeled non-academic.
The data were originally retrieved from Scopus (for scholarly monographs) and from the publicly available “Best Books Ever” dataset on Zenodo (Goodreads-derived records), and then processed to create a roughly balanced sample suitable for classification experiments. Simple random sampling was applied to each source population to obtain 384 academic and 381 non-academic titles, sample sizes that correspond to approximately 95% confidence and a 5% margin of error for proportion-based analyses.
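The description does not say which formula produced the sample sizes. As a hedged illustration, the sketch below assumes Cochran's formula for proportions with maximum variability (p = 0.5), plus the usual finite population correction. The function names and the population size in the usage lines are hypothetical and are not taken from the dataset.

```python
from statistics import NormalDist


def cochran_n0(confidence=0.95, margin=0.05, p=0.5):
    """Cochran's sample size for a proportion, assuming a very large population."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z ** 2) * p * (1 - p) / (margin ** 2)


def finite_population_correction(n0, population_size):
    """Shrink n0 when the source population is finite."""
    return n0 / (1 + (n0 - 1) / population_size)


n0 = cochran_n0()  # about 384.15 before rounding

# Hypothetical population size, used only to show the effect of the correction.
ASSUMED_POPULATION = 50_000
n_corrected = finite_population_correction(n0, ASSUMED_POPULATION)

print(f"uncorrected: {n0:.2f}")
print(f"corrected for N={ASSUMED_POPULATION}: {n_corrected:.2f}")
```

Under these assumptions, the uncorrected value matches the 384 titles reported for the academic subset. A slightly smaller figure such as 381 is what a finite population correction would produce for a source of a few tens of thousands of records, though the actual population sizes are not stated here.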
This dataset supports replication of the title-based classification experiments reported in the article, including calculation of precision, recall, F1-score, and confusion matrices for different DeepSeek models. It can also be reused for developing or benchmarking other text classification models in collection management, academic vs popular book identification, and related library analytics tasks.
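A minimal sketch of how the reported metrics could be recomputed from ground-truth and predicted labels, treating "academic" as the positive class. The label strings and the toy lists are assumptions for illustration; the actual column names and values in the dataset files may differ, and the predictions here are invented, not model output.

```python
def confusion_counts(y_true, y_pred, positive="academic"):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only (not drawn from the dataset).
y_true = ["academic", "academic", "academic", "non-academic", "non-academic"]
y_pred = ["academic", "academic", "non-academic", "academic", "non-academic"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print("confusion matrix (rows = truth, cols = prediction):")
print(f"  academic      {tp:3d} {fn:3d}")
print(f"  non-academic  {fp:3d} {tn:3d}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

To compare several models, the same two functions can be run once per model's prediction column against the shared ground-truth labels.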
Created:
2026-01-21



