SymDef
Source: arXiv. Updated: 2023-05-24. Indexed: 2024-06-21.
Download link:
https://github.com/minnesotanlp/taddex
Description:
SymDef is an English-language dataset created at the University of Minnesota. It contains 5,927 sentences drawn from the full text of scientific papers, each annotated with every mathematical symbol it contains and the corresponding definition. The dataset focuses on complex coordination structures, such as "respectively" constructions, which often involve overlapping definition spans. SymDef is intended as a resource for training intelligent reading interfaces that identify symbol definitions in scholarly documents with high precision, making scientific writing easier to read. Data quality was ensured through manual inspection and expert annotation. Application areas include improving scholarly reading interfaces and scholarly information extraction, particularly the extraction of mathematical symbol definitions.
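
To make the idea of overlapping definition spans concrete, here is a minimal Python sketch of how a "respectively" sentence could be annotated. The example sentence, the `SymbolDefinition` class, and the token-index representation are all illustrative assumptions; they are not taken from SymDef's actual file format, which should be checked in the repository linked above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SymbolDefinition:
    """Hypothetical annotation: one symbol token and the tokens defining it."""
    symbol_index: int                    # position of the symbol in the token list
    definition_indices: Tuple[int, ...]  # definition tokens; may be discontiguous


def definition_text(tokens: List[str], ann: SymbolDefinition) -> str:
    """Join the (possibly discontiguous) definition tokens into a phrase."""
    return " ".join(tokens[i] for i in ann.definition_indices)


def shared_tokens(a: SymbolDefinition, b: SymbolDefinition) -> List[int]:
    """Token indices that belong to both definitions, i.e. the overlap."""
    return sorted(set(a.definition_indices) & set(b.definition_indices))


# Made-up example sentence, split on whitespace for simplicity.
tokens = "Let x and y denote the width and height of the image , respectively .".split()

# "x" is defined by "width ... of the image", "y" by "height of the image".
x_def = SymbolDefinition(symbol_index=1, definition_indices=(6, 9, 10, 11))
y_def = SymbolDefinition(symbol_index=3, definition_indices=(8, 9, 10, 11))

for ann in (x_def, y_def):
    print(tokens[ann.symbol_index], "->", definition_text(tokens, ann))

# Both definitions reuse the tokens "of the image".
print("shared:", [tokens[i] for i in shared_tokens(x_def, y_def)])
```

In this sketch the two definitions share the tokens "of the image", and the definition of x skips over "and height". A representation that assigns each token a single label cannot express either property, which is why coordination structures of this kind are harder to annotate and extract than simple one-symbol sentences.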
Provided by:
University of Minnesota
Created:
2023-05-24



