r-three/tokenization_robustness
Source: Hugging Face. Updated: 2025-06-27. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/r-three/tokenization_robustness
Description:
The Tokenization Robustness dataset is a comprehensive evaluation dataset designed to test how robust different tokenization strategies are. It consists of multiple-choice questions, with metadata on tokenization types and categories, that probe various aspects of tokenization handling. The dataset is multilingual and comes in several configurations, each focusing on a different tokenization challenge. The README file also documents the dataset's structure, its creation, and its potential biases and limitations.
Provider:
r-three



