AI-assisted classification of academic and non-academic monograph titles using DeepSeek large language models.

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/8fyz5fkrv5
Description:
This dataset contains 765 monograph titles used to evaluate a large language model for evidence-based withdrawal decisions in an academic library context. It combines 384 scholarly monograph titles sampled from a Scopus title list (Dataset A) and 381 popular book titles sampled from the “Best Books Ever” Goodreads-derived dataset on Zenodo (Dataset B).

For each record, the dataset includes bibliographic metadata such as title, publication year, publisher, identifiers (e.g. ISBN where available), language and genre (for the Goodreads subset), and a binary label indicating whether the title is academically oriented (academic vs non-academic). The labels are the ground truth used in the study and follow directly from the source: Scopus-derived titles are labeled academic, and Goodreads-derived titles are labeled non-academic.

The data were originally retrieved from Scopus (for scholarly monographs) and from the publicly available “Best Books Ever” dataset on Zenodo (Goodreads-derived records), and then processed to create a nearly balanced sample suitable for classification experiments. Simple random sampling was applied to each source population to obtain 384 academic and 381 non-academic titles, sample sizes that give estimates with approximately 95% confidence and a 5% margin of error for proportion-based analyses.

This dataset supports replication of the title-based classification experiments reported in the article, including calculation of precision, recall, F1-score, and confusion matrices for different DeepSeek models. It can also be reused for developing or benchmarking other text classification models in collection management, academic vs popular book identification, and related library analytics tasks.
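The sampling and evaluation arithmetic mentioned in the description can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names, the label strings "academic" and "non-academic", and the choice of Cochran's formula with p = 0.5 are assumptions. With z = 1.96 and a 5% margin the formula gives about 384.16, which is consistent with the 384 academic titles; the description does not say how the figure of 381 was derived, so the optional finite population correction below is included only as a possibility.

```python
def cochran_sample_size(z=1.96, p=0.5, margin=0.05, population=None):
    """Sample size for estimating a proportion (Cochran's formula).

    Assumption: p = 0.5 (the most conservative choice) and z = 1.96
    for roughly 95% confidence. These defaults are not from the source.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    if population is not None:
        # Optional finite population correction. Hypothetical: the source
        # does not state whether a correction was applied.
        n0 = n0 / (1 + (n0 - 1) / population)
    return n0


def binary_metrics(y_true, y_pred, positive="academic"):
    """Confusion counts, precision, recall and F1 for one positive class.

    The label strings are assumed; adapt them to the actual column values.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(round(cochran_sample_size()))  # 384

    # Toy labels for illustration only; not taken from the dataset.
    truth = ["academic", "academic", "non-academic", "non-academic"]
    preds = ["academic", "non-academic", "non-academic", "academic"]
    print(binary_metrics(truth, preds))
```

To replicate the reported experiments, `y_true` would come from the dataset's academic/non-academic label column and `y_pred` from a model's output for each title.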
Created:
2026-01-21