
zbMATHOpenRec: A Gold Standard Dataset for Recommending Scientific Documents with Mathematical Content

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10432916
Description:
This record provides the first gold standard dataset for recommending scientific documents with mathematical content.

Contents:

As of Feb-2023, there are 421 recommendation pairs with 80 seed documents.

- All recommendation pairs are available in: recommendationPairs.csv
- Each document's contents, such as title, abstract/review/summary, authors, MSC codes, full-text link, references, etc., are available in: documentContents.csv

Dataset construction process:

This is the first gold standard content-based RS dataset, consisting of 421 scientific research entry recommendation pairs with mathematical content. The aim is to make the mathematical content of scientific documents usable for document recommendation: if two documents have similar math content, one could be recommended for the other.

To create this dataset, we analyzed 4.5 million research entries from zbMATH Open (https://zbmath.org/) and performed the following steps to obtain the final dataset:

1. We selected 80 seeds that capture the most word and math tokens in zbMATH Open using statistical measures.
2. Three experts, one with several years of experience reviewing research entries in mathematics, curated the recommendations for the 80 seeds.

Using this dataset, researchers can accelerate the development and testing of recommendation approaches for scientific literature with mathematical content, improving recommendations for the STEM fields where mathematical content is currently being ignored.

## License

Legal restrictions and copyright: The zbMATH Open data is subject to the Terms and Conditions for the zbMATH Open API Service of FIZ Karlsruhe – Leibniz-Institut für Informationsinfrastruktur GmbH. Content generated by zbMATH Open, such as reviews, classifications, software, or author disambiguation data, is distributed under CC-BY-SA 4.0. This defines the license for the whole dataset, which also contains non-copyrighted bibliographic metadata and reference data derived from I4OSC (CC0).
Created:
2024-04-19
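
Usage sketch: the description names recommendationPairs.csv as the file holding the gold recommendation pairs, but it does not state the column layout. The Python sketch below therefore assumes two hypothetical columns, `seed` and `recommendation`, and runs on a made-up in-memory sample instead of the real file. It shows one possible way to group the pairs by seed document and to score a ranked list against them with recall@k; it is an illustration under these assumptions, not the dataset authors' evaluation protocol.

```python
import csv
import io
from collections import defaultdict


def load_pairs(csv_file, seed_col="seed", rec_col="recommendation"):
    """Group recommended document IDs by seed document ID.

    The column names are assumptions; adjust them to the real header.
    """
    pairs = defaultdict(list)
    for row in csv.DictReader(csv_file):
        pairs[row[seed_col]].append(row[rec_col])
    return dict(pairs)


def recall_at_k(ranked, gold, k):
    """Fraction of gold recommendations that appear in the top-k ranked IDs."""
    if not gold:
        return 0.0
    hits = len(set(ranked[:k]) & set(gold))
    return hits / len(gold)


# Made-up stand-in for recommendationPairs.csv (IDs are placeholders, not real data).
sample = io.StringIO("seed,recommendation\nA,B\nA,C\nD,E\n")
gold = load_pairs(sample)

# A hypothetical recommender's ranked output for seed "A".
score = recall_at_k(["C", "X", "B"], gold["A"], k=2)
print(score)
```

To run this against the downloaded file, open it with `open("recommendationPairs.csv", newline="", encoding="utf-8")` and pass the actual column names found in its header as `seed_col` and `rec_col`.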