Datasets for "Tokenization as Structural Inductive Bias"
Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20033927
Description:
Cleaned text corpora used for evaluating the StructPiece tokenizer across 9 languages.
Provider:
Zenodo
Created:
2026-05-05



