pkavumba/balanced-copa
收藏数据集卡片:Balanced COPA
数据集描述
数据集摘要
Bala-COPA:一个用于训练鲁棒常识因果推理模型的英语数据集。
Balanced Choice of Plausible Alternatives数据集是一个用于训练机器学习模型的基准,这些模型对表面线索/虚假相关性具有鲁棒性。该数据集扩展了COPA数据集(Roemmele et al. 2011),通过添加镜像实例来缓解原始COPA答案中的令牌级表面线索。原始COPA数据集中的表面线索是由于正确和错误答案选择之间的令牌分布不平衡,即某些令牌在正确选择中比错误选择中出现得更频繁。Balanced COPA通过添加具有相同答案选择但不同标签的镜像实例来均衡令牌分布。
支持的任务和排行榜
语言
- 英语
数据集结构
数据实例
一个validation示例如下: json { "id": 1, "premise": "My body cast a shadow over the grass.", "choice1": "The sun was rising.", "choice2": "The grass was cut.", "question": "cause", "label": 1, "mirrored": false, }
数据字段
所有拆分的数据字段相同:
premise:一个string特征。choice1:一个string特征。choice2:一个string特征。question:一个string特征。label:一个int32特征。id:一个int32特征。mirrored:一个bool特征。
数据拆分
| validation | test |
|---|---|
| 1,000 | 500 |
数据集创建
策划理由
源数据
初始数据收集和规范化
源语言生产者是谁?
注释
注释过程
注释者是谁?
个人和敏感信息
使用数据的注意事项
数据集的社会影响
偏见的讨论
其他已知限制
附加信息
数据集策展人
许可信息
Creative Commons Attribution 4.0 International (CC BY 4.0)
引用信息
bibtex @inproceedings{kavumba-etal-2019-choosing, title = "When Choosing Plausible Alternatives, Clever Hans can be Clever", author = "Kavumba, Pride and Inoue, Naoya and Heinzerling, Benjamin and Singh, Keshav and Reisert, Paul and Inui, Kentaro", booktitle = "Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing", month = nov, year = "2019", address = "Hong Kong, China", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/D19-6004", doi = "10.18653/v1/D19-6004", pages = "33--42", abstract = "Pretrained language models, such as BERT and RoBERTa, have shown large improvements in the commonsense reasoning benchmark COPA. However, recent work found that many improvements in benchmarks of natural language understanding are not due to models learning the task, but due to their increasing ability to exploit superficial cues, such as tokens that occur more often in the correct answer than the wrong one. Are BERT{}s and RoBERTa{}s good performance on COPA also caused by this? We find superficial cues in COPA, as well as evidence that BERT exploits these cues.To remedy this problem, we introduce Balanced COPA, an extension of COPA that does not suffer from easy-to-exploit single token cues. We analyze BERT{}s and RoBERTa{}s performance on original and Balanced COPA, finding that BERT relies on superficial cues when they are present, but still achieves comparable performance once they are made ineffective, suggesting that BERT learns the task to a certain degree when forced to. In contrast, RoBERTa does not appear to rely on superficial cues.", }
@inproceedings{roemmele2011choice, title={Choice of plausible alternatives: An evaluation of commonsense causal reasoning}, author={Roemmele, Melissa and Bejan, Cosmin Adrian and Gordon, Andrew S}, booktitle={2011 AAAI Spring Symposium Series}, year={2011}, url={https://people.ict.usc.edu/~gordon/publications/AAAI-SPRING11A.PDF}, }
贡献
感谢@pkavumba添加此数据集。




