Twitter Misspellings Dataset
Source: arXiv (indexed 2025-09-30)
Download link:
https://noisy-text.github.io/norm-shared-task.html
Overview:
This dataset contains the 1,051 most frequently occurring words on Twitter, each paired with its observed misspellings. It was first used in a data normalization challenge hosted by IBM in 2015. The core task is string similarity measurement: scoring how close a misspelled form is to its intended word.
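
As an illustration of the string similarity task, here is a minimal sketch using Levenshtein edit distance. The listing does not say which similarity measure the original challenge used, so the metric is an assumption. The function names (`levenshtein`, `similarity`, `closest_word`) and the sample words are hypothetical and are not taken from the dataset files.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def closest_word(misspelling: str, vocabulary: list[str]) -> str:
    """Pick the vocabulary word most similar to the misspelling."""
    return max(vocabulary, key=lambda word: similarity(misspelling, word))


if __name__ == "__main__":
    # Hypothetical sample, not drawn from the dataset itself.
    vocab = ["tomorrow", "today", "tonight"]
    print(closest_word("tomorow", vocab))
```

Edit distance is only one possible choice. Character n-gram overlap or phonetic encodings are common alternatives for noisy social media text, and may rank candidates differently.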
Provider:
IBM



