mlsftwrs/mlnoisy
Source: Hugging Face | Updated: 2025-03-22 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/mlsftwrs/mlnoisy
Description:
The ml.Datx - Noisy dataset contains text in six Malian languages to which several noise functions have been applied. It is designed for use with Hugging Face's NLP tools and models. The noise processing includes character swapping, random space insertion, random character deletion, random word swapping, and word masking. The dataset is split into several configurations, and each data instance contains the original text together with space-inserted, character-deleted, character-swapped, word-swapped, word-deleted, word-masked, and cleaned versions of it.
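The noise types named in the description can be illustrated with a minimal Python sketch. This is not the dataset authors' implementation: every function name, the `[MASK]` token, and the dictionary keys below are hypothetical, and the actual column names and noise rates in the dataset may differ. Each function applies a single perturbation, using a seeded `random.Random` so results are repeatable.

```python
import random


def swap_chars(text, rng):
    """Swap one randomly chosen pair of adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_space(text, rng):
    """Insert a single space at a random position."""
    i = rng.randrange(len(text) + 1)
    return text[:i] + " " + text[i:]


def delete_char(text, rng):
    """Delete one randomly chosen character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]


def swap_words(text, rng):
    """Exchange the positions of two distinct, randomly chosen words."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def delete_word(text, rng):
    """Remove one randomly chosen word (texts with one word are left alone)."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)


def mask_word(text, rng, mask_token="[MASK]"):
    """Replace one randomly chosen word with a mask token (assumed token)."""
    words = text.split()
    if not words:
        return text
    words[rng.randrange(len(words))] = mask_token
    return " ".join(words)


def noisy_views(text, seed=0):
    """Build one record holding the original text and each noisy variant.

    The keys are illustrative only, not the dataset's real column names.
    """
    rng = random.Random(seed)
    return {
        "original": text,
        "space_inserted": insert_space(text, rng),
        "char_deleted": delete_char(text, rng),
        "char_swapped": swap_chars(text, rng),
        "word_swapped": swap_words(text, rng),
        "word_deleted": delete_word(text, rng),
        "word_masked": mask_word(text, rng),
    }
```

Calling `noisy_views("one two three four")` on a placeholder sentence returns a dictionary with the original text and six perturbed copies, which mirrors the shape of a data instance as described above (minus the cleaned text, which this sketch does not attempt to reproduce).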
Provider:
mlsftwrs



