mediabiasgroup/DefSim
Hugging Face, updated 2026-02-06, indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/mediabiasgroup/DefSim
Description:
DefSim contains 60 definition pairs, each labeled for definition similarity by three annotators and tagged with source paper identifiers. Each row includes a left and a right definition, minimized context excerpts, and a similarity label. The dataset is introduced and described in our paper (*under review*). Unlike DefExtra, which is released without paper excerpts and relies on hydration scripts, DefSim includes short excerpts: about half of the pairs are model-generated outputs and cannot be reconstructed from PDFs alone. We keep context spans minimal to reduce redistribution of copyrighted text and avoid requiring user-supplied PDFs.
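To make the row layout described above concrete, here is a minimal Python sketch of one record and of one common way to collapse three annotator votes into a single label. Everything specific in it is an assumption made for illustration: the field names, the label values (`"similar"`, `"different"`), the sample row, and the majority-vote rule are hypothetical. The description does not state the dataset's actual column names or how its similarity label was derived.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DefinitionPair:
    # All field names are hypothetical; the real column names may differ.
    paper_id: str                     # source paper identifier
    left_definition: str
    right_definition: str
    left_context: str                 # minimized context excerpt
    right_context: str
    annotator_labels: Sequence[str]   # one similarity label per annotator


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more than half of the annotators, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Made-up row, not taken from the dataset.
example = DefinitionPair(
    paper_id="paper-0001",
    left_definition="Bias is a systematic deviation from neutrality.",
    right_definition="Bias is a consistent slant away from a neutral account.",
    left_context="(short excerpt)",
    right_context="(short excerpt)",
    annotator_labels=("similar", "similar", "different"),
)

print(majority_label(example.annotator_labels))
```

With three annotators, a strict majority exists whenever at least two agree; the function returns `None` when all three disagree, so such rows can be reviewed separately instead of being assigned an arbitrary label.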
Provider:
mediabiasgroup



