Datasets with typos for testing LEA (https://arxiv.org/abs/2307.02912)
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10401845
Description:
Datasets for testing the generalization capacity of LEA in the presence of typos. Link to the paper: https://arxiv.org/abs/2307.02912
Below is a list of the raw public datasets, together with versions of their test splits to which synthetic typos have been added automatically by deleting and replacing characters.
Abt-Buy
Amazon-Google
WDC-Computers (small, medium, large and xlarge)
WDC-All (xlarge)
RTE
MRPC
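
To illustrate the kind of perturbation described above, here is a minimal sketch of character-level typo injection by deletion and replacement. It is not the authors' generation script: the function name `add_typos`, the `rate` and `seed` parameters, the even split between deletions and replacements, and the restriction to alphabetic characters are all assumptions made for illustration.

```python
import random
import string


def add_typos(text, rate=0.1, seed=None):
    """Return a copy of `text` with synthetic typos (hypothetical helper).

    Each alphabetic character is perturbed with probability `rate`:
    it is either deleted or replaced by a different lowercase ASCII letter.
    Non-alphabetic characters (digits, spaces, punctuation) are left alone.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            if rng.random() < 0.5:
                continue  # deletion: drop the character
            # replacement: pick any lowercase letter other than the original
            candidates = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(candidates))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "sony wireless noise cancelling headphones"
    print(add_typos(sample, rate=0.2, seed=0))
```

Varying `rate` in such a sketch would yield test splits of different noise levels; the actual settings used to build the released splits are documented in the paper, not here.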
Reference:
Almagro, M., Almazán, E., Ortego, D., & Jiménez, D. (2023, August). LEA: Improving Sentence Similarity Robustness to Typos Using Lexical Attention Bias. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 36-46).
Created:
2023-12-18



