Symlink
Source: arXiv | Updated: 2022-04-26 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2204.12070v1
Resource overview:
The Symlink dataset was created by the Department of Computer and Information Science at the University of Oregon. It is a large-scale dataset for extracting symbols and their descriptions from scientific documents. It covers five domains (computer science, biology, physics, mathematics, and economics) and comprises 101 papers, totaling 5,719 sentences and 330K tokens. The dataset was built by obtaining the LaTeX versions of scientific articles from arXiv.org and labeling them through a carefully designed annotation process. Its main application is linking symbols to their descriptions in the automatic reading comprehension of scientific literature, with the goal of improving machines' understanding of scientific concepts and their mathematical expressions.
Provided by:
Department of Computer and Information Science, University of Oregon
Created:
2022-04-26



