
A dataset containing the table of contents of 56K ebook titles extracted from Springer

DataCite Commons | Updated 2020-11-14 | Indexed 2025-04-16
Download link:
https://ieee-dataport.org/open-access/dataset-containing-table-contents-56k-ebook-titles-extracted-springer
Description:
This dataset was created from a collection of 56403 multidisciplinary book titles from Springer, available through the Hellenic Academic Libraries Link (https://www.heal-link.gr/en/home-2/) subscription. To build it, a parser was written to extract the relevant information, such as the title, subtitle and table of contents (ToC), from each book. The extracted information was stored in a database for further processing. Each book record in the database includes the bookid, title, and ToC. As a next step, a team of librarians working in the NTUA Digital Library manually added the subject field information. The dataset uses the primary subject field as each book's label. The 5-category subset also includes a second field holding the secondary labels for each book in the collection.

This dataset can serve as a basis for multiclass classification problems and/or content recommendation. The 5-category subset can also be used for multilabel classification tasks. Information from the ToC gives a more detailed picture of the topics in each book than the title alone, which makes it easier to identify similar books.

The dataset contains 2 subsets: (a) 26 categories, and (b) 5 general categories, as detailed below.

26 categories: Anthropology, Art, Computer Science, Culture, Economics, Education, Engineering, Environment, Food, History, Humanities, Law, Life Sciences, Linguistics, Literature, Management, Mathematics, Medicine, Music, Organization, Physical Sciences, Popular works, Religion, Social Sciences, Science, Transportation

5 categories: Computer Science, Engineering, Mathematics, Medicine, Physics
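
As a rough illustration of how ToC text might help identify similar books, the sketch below compares books by the cosine similarity of simple word-count vectors built from the title and ToC. It is not the dataset authors' method. The field names bookid, title and toc follow the description above, but the actual file layout is an assumption, and the three records are toy examples invented for illustration, not rows from the dataset.

```python
import math
import re
from collections import Counter

# Toy records invented for illustration; they are NOT rows from the dataset.
# The keys mirror the fields named in the description (bookid, title, ToC);
# the real storage format is an assumption.
BOOKS = [
    {"bookid": 1, "title": "Intro to Graph Algorithms",
     "toc": "Graphs; Shortest paths; Spanning trees; Network flows"},
    {"bookid": 2, "title": "Network Optimization",
     "toc": "Network flows; Shortest paths; Linear programming"},
    {"bookid": 3, "title": "Clinical Cardiology",
     "toc": "Heart anatomy; Arrhythmia; Heart failure; Imaging"},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(book):
    """Build a bag-of-words count vector from a book's title and ToC."""
    return Counter(tokenize(book["title"] + " " + book["toc"]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(query_id, books):
    """Return the bookid of the book most similar to query_id."""
    vectors = {b["bookid"]: vectorize(b) for b in books}
    query = vectors[query_id]
    scores = [(cosine(query, vec), bid)
              for bid, vec in vectors.items() if bid != query_id]
    return max(scores)[1]


if __name__ == "__main__":
    print(most_similar(1, BOOKS))
```

On real data, raw word counts would likely be replaced by a weighting scheme such as TF-IDF, and the same vectors could feed a multiclass classifier trained on the primary subject labels.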
Provider:
IEEE DataPort
Created:
2020-11-14