krutrim-ai-labs/MUTANT
Source: Hugging Face · Updated: 2026-04-26 · Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/krutrim-ai-labs/MUTANT
Description:
MUTANT is a dataset for evaluating multilingual tokenizers. Its evaluation sets cover 22 Indic languages, English, and code, and are designed to assess how tokenizers behave in Indic use cases. The data can be read directly as JSON or loaded with the Hugging Face `datasets` library. The dataset card lists, for each language configuration, the test split size, the download size, and the number of examples.
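The sketch below shows both access paths mentioned above and one way to score a tokenizer on the result. It is illustrative only and rests on several assumptions: the configuration name you pass (e.g. `"hindi"`) and the `"text"` field are hypothetical, so check the dataset card for the real config and column names; and fertility (average tokens per whitespace-separated word) is a common tokenizer metric chosen here for illustration, not necessarily what MUTANT itself reports. Whitespace word counts are also a rough proxy for some scripts and for code.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List


def load_split_from_hub(config: str, split: str = "test") -> List[str]:
    """Load one language configuration with the Hugging Face `datasets` library.

    ASSUMPTION: `config` values and the "text" column are hypothetical;
    check the dataset card for the actual names.
    """
    from datasets import load_dataset  # lazy import: the rest runs without `datasets`

    ds = load_dataset("krutrim-ai-labs/MUTANT", config, split=split)
    return list(ds["text"])


def load_split_from_json(path: str, field: str = "text") -> List[str]:
    """Read a locally downloaded JSON array or JSON Lines file.

    ASSUMPTION: each record stores its sample under the hypothetical `field` key.
    """
    raw = Path(path).read_text(encoding="utf-8").strip()
    if raw.startswith("["):
        records = json.loads(raw)  # the whole file is one JSON array of records
    else:
        # otherwise treat the file as JSON Lines: one record per non-empty line
        records = [json.loads(line) for line in raw.splitlines() if line.strip()]
    return [rec[field] for rec in records]


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word.

    Lower values usually mean the tokenizer splits words into fewer pieces.
    Returns 0.0 when the input contains no words.
    """
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_tokens += len(tokenize(text))
        n_words += len(text.split())
    return n_tokens / n_words if n_words else 0.0
```

To compare tokenizers, pass each one's tokenize function (for example a Hugging Face tokenizer's `tokenize` method) to `fertility` over the same split; keeping the tokenizer as a plain callable avoids tying the sketch to any particular tokenizer library.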
Provided by:
krutrim-ai-labs



