LMoffett/ad-word
Source: Hugging Face. Updated 2025-01-26; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/LMoffett/ad-word
Description:
The Ad-Word dataset contains adversarial word perturbations created with 9 different attack strategies, grouped into three classes: phonetic, typo, and visual attacks. Its base vocabulary is built from the 10,000 most frequent words in the Trillion Word Corpus, excluding words shorter than 4 characters. This vocabulary is augmented with uncommon English words and with common English loanwords that are stylized with diacritics. The dataset is split into training, validation, and test sets, and each set contains unique clean-perturbed word pairs drawn from all attack strategies.
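The vocabulary construction and splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset authors' pipeline. The frequency-ranked word list (`ranked_words`) and the augmentation list (`extra_words`) are hypothetical inputs, since the corpus files are not part of this card. The sketch assumes the length filter is applied after the top-10,000 cut. The helper names `build_vocab` and `split_pairs`, the 80/10/10 split ratios, and the split names are also hypothetical. The card states neither the ratios nor how pairs are assigned to splits. Stratifying by attack strategy is one simple way to make every split contain pairs from all strategies.

```python
import random
from collections import defaultdict


def build_vocab(ranked_words, extra_words=(), top_n=10_000, min_len=4):
    """Hypothetical helper: top-N frequent words, minus short words, plus augmentation.

    Assumes ranked_words is sorted by descending corpus frequency and that the
    length filter is applied after taking the top_n words.
    """
    vocab = [w for w in ranked_words[:top_n] if len(w) >= min_len]
    seen = set(vocab)
    # Append augmentation words (e.g. uncommon words, loanwords with diacritics),
    # skipping any that are already in the base vocabulary.
    for w in extra_words:
        if w not in seen:
            vocab.append(w)
            seen.add(w)
    return vocab


def split_pairs(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Hypothetical helper: split (clean, perturbed, strategy) records.

    Pairs are deduplicated within each strategy and then split per strategy,
    so each split draws from every strategy (given enough pairs per strategy).
    The ratios are placeholders; the card does not state them.
    """
    by_strategy = defaultdict(set)
    for clean, perturbed, strategy in records:
        by_strategy[strategy].add((clean, perturbed))

    rng = random.Random(seed)
    splits = {"train": [], "validation": [], "test": []}
    for strategy in sorted(by_strategy):
        pairs = sorted(by_strategy[strategy])  # sort first for reproducible shuffling
        rng.shuffle(pairs)
        n_train = int(len(pairs) * ratios[0])
        n_val = int(len(pairs) * ratios[1])
        chunks = {
            "train": pairs[:n_train],
            "validation": pairs[n_train:n_train + n_val],
            "test": pairs[n_train + n_val:],
        }
        for name, chunk in chunks.items():
            splits[name].extend((c, p, strategy) for c, p in chunk)
    return splits


if __name__ == "__main__":
    ranked = ["the", "of", "and", "information", "would", "about", "search"]
    print(build_vocab(ranked, extra_words=["café", "naïve"], top_n=6))
```

With `top_n=6`, the words "the", "of", and "and" are dropped for being shorter than 4 characters, and "search" falls outside the cut. The printed vocabulary is therefore `['information', 'would', 'about', 'café', 'naïve']`. In `split_pairs`, the class names "phonetic", "typo", and "visual" could serve as strategy labels in a toy run. The real dataset distinguishes 9 individual strategies.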
Provided by:
LMoffett



