billli/QuRe
Dataset Overview
- Name: QuRe
- License: Apache 2.0
- Language: English
- Tags:
  - Natural language processing
  - Generalized quantifiers
  - Quantifier reasoning
- Size: n<1K
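Dataset Introduction
QuRe is a dataset for quantifier reasoning, introduced in the paper "Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models". It contains real-world sentences from Wikipedia together with human annotations of generalized quantifiers provided by English speakers.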
Data Sample
```json
{
  "orig_sentence": "In order for a steel to be considered stainless it must have a Chromium content of at least 10.5%.",
  "percentage": "10.50%",
  "percentage_index": 0,
  "math_expr": ">=0.105",
  "quant_sent": "In order for a steel to be considered stainless it must have some Chromium content.",
  "quantifier": "some",
  "quantifier_position": 12,
  "specificity": "unable",
  "wiki_entity": "List of blade materials",
  "topics": "metallurgy; steel; composition"
}
```
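- orig_sentence: The original sentence as it appears in Wikipedia.
- percentage: The percentage mentioned in the original sentence.
- percentage_index: The index of the percentage within the original sentence.
- math_expr: The mathematical expression generated for the percentage.
- quant_sent: The annotated sentence, with the percentage expressed as a quantifier.
- quantifier_position: The position of the quantifier in the sentence.
- specificity: The difficulty of determining the quantifier's percentage range from the sentence once the quantifier is removed.
- wiki_entity: The Wikipedia entity that contains the original sentence.
- topics: The topics of the sentence.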
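To illustrate how the percentage and math_expr fields relate, here is a minimal sketch that checks whether a percentage string satisfies a math_expr. It assumes math_expr is always a single comparison operator followed by a decimal fraction, as in the one sample above (">=0.105"); other forms may occur in the data. The helper names parse_percentage and satisfies are hypothetical and not part of the dataset or any library.

```python
import re

# Assumed grammar: one comparison operator followed by a decimal fraction,
# e.g. ">=0.105". This is inferred from a single sample, not from a spec.
_EXPR = re.compile(r"^\s*(>=|<=|==|>|<|=)\s*([0-9]*\.?[0-9]+)\s*$")

# Each comparison takes (value, bound, tolerance) to absorb float rounding.
_OPS = {
    ">=": lambda x, b, t: x >= b - t,
    "<=": lambda x, b, t: x <= b + t,
    ">": lambda x, b, t: x > b + t,
    "<": lambda x, b, t: x < b - t,
    "==": lambda x, b, t: abs(x - b) <= t,
    "=": lambda x, b, t: abs(x - b) <= t,
}


def parse_percentage(text):
    """Convert a string such as '10.50%' into a fraction such as 0.105."""
    return float(text.strip().rstrip("%")) / 100.0


def satisfies(math_expr, fraction, tol=1e-9):
    """Return True if `fraction` meets the constraint written in `math_expr`."""
    match = _EXPR.match(math_expr)
    if match is None:
        raise ValueError(f"unsupported math_expr: {math_expr!r}")
    symbol, bound = match.group(1), float(match.group(2))
    return _OPS[symbol](fraction, bound, tol)


# Values taken from the sample record above.
print(satisfies(">=0.105", parse_percentage("10.50%")))
```

Records whose math_expr does not match the assumed pattern raise a ValueError, which makes unexpected formats visible instead of silently skipping them.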
Loading the Dataset
```python
from datasets import load_dataset

ds = load_dataset("billli/QuRe")
```
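Once loaded, the records can be summarized with ordinary Python. The sketch below counts how often each quantifier occurs. It runs on a few stand-in records so that it works offline; the value "most" is made up for illustration, and quantifier_counts is a hypothetical helper. With the real data, pass the rows of one split of ds instead. The split names are not listed on this card, so inspect ds first.

```python
from collections import Counter


def quantifier_counts(records):
    """Count how often each value of the 'quantifier' field occurs."""
    return Counter(record["quantifier"] for record in records)


# Stand-in records for illustration only; real rows carry all the
# fields described in the data sample.
sample = [
    {"quantifier": "some"},
    {"quantifier": "most"},
    {"quantifier": "some"},
]

counts = quantifier_counts(sample)
print(counts.most_common())
```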
Reference
@inproceedings{li-etal-2023-pragmatic,
    title = "Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models",
    author = "Li, Yiyuan and Menon, Rakesh and Ghosh, Sayan and Srivastava, Shashank",
    editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.38",
    pages = "573--591",
}



