prli/wiki-full_draft_chopped-rare-qwen-alpha0-40_perturb
Source: Hugging Face | Updated: 2026-04-27 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/prli/wiki-full_draft_chopped-rare-qwen-alpha0-40_perturb
Resource summary:
This dataset consists of a single validation split with 4,000 samples, intended for text perturbation and robustness analysis. Each sample contains the original text field (text) together with variants produced by several perturbation methods: synonym substitution, butter_fingers (character-level noise that simulates typos), random deletion, character case changes, whitespace perturbation, and the underscore trick. It can be used to evaluate how NLP models perform on noisy text, or for data augmentation research.
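To make the perturbation types above more concrete, here is a minimal Python sketch of three of them (random deletion, character case changes, whitespace perturbation). It is an illustration only, not the code used to build this dataset: the function names, the probability parameter p, and the output keys other than text are all assumptions, and the dataset's actual column names and generation settings may differ.

```python
import random


def random_deletion(text: str, p: float, rng: random.Random) -> str:
    """Drop each whitespace-separated word with probability p, keeping at least one."""
    words = text.split()
    if not words:
        return text
    kept = [w for w in words if rng.random() >= p]
    # If every word was dropped, fall back to one randomly chosen word.
    return " ".join(kept) if kept else rng.choice(words)


def change_char_case(text: str, p: float, rng: random.Random) -> str:
    """Swap the case of each character with probability p."""
    return "".join(c.swapcase() if rng.random() < p else c for c in text)


def whitespace_perturbation(text: str, p: float, rng: random.Random) -> str:
    """With probability p, replace each space with either no space or two spaces."""
    out = []
    for c in text:
        if c == " " and rng.random() < p:
            out.append(rng.choice(["", "  "]))
        else:
            out.append(c)
    return "".join(out)


def perturb_sample(text: str, p: float = 0.1, seed: int = 0) -> dict:
    """Build one record holding the original text and its perturbed variants.

    The variant keys below are hypothetical names chosen for this sketch.
    """
    rng = random.Random(seed)
    return {
        "text": text,
        "random_deletion": random_deletion(text, p, rng),
        "change_char_case": change_char_case(text, p, rng),
        "whitespace_perturbation": whitespace_perturbation(text, p, rng),
    }


if __name__ == "__main__":
    record = perturb_sample("The quick brown fox jumps over the lazy dog", p=0.3)
    for name, value in record.items():
        print(f"{name}: {value}")
```

Passing an explicit random.Random instance keeps the variants reproducible for a given seed, which matters when the same perturbed inputs need to be compared across models.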
Provider:
prli



