WiLI-2018
Source: Papers with Code (indexed 2024-05-15)
Download link:
https://paperswithcode.com/dataset/wili-2018
Description:
WiLI-2018 is a benchmark dataset for monolingual written natural language identification. It is a publicly available, free-of-charge dataset of short text extracts from Wikipedia, containing 1,000 paragraphs for each of 235 languages, for a total of 23,500 paragraphs. WiLI is a classification dataset: given an unlabeled paragraph written predominantly in one language, the task is to decide which language it is.
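To make the task concrete (one paragraph in, one language label out), here is a minimal sketch of a common baseline technique: multinomial Naive Bayes over character trigrams with add-one smoothing. This is not the method from the WiLI-2018 paper. The class name `NgramLanguageIdentifier`, the toy training sentences, and the labels "eng"/"deu" are hypothetical illustrations and are not taken from the WiLI-2018 files, whose loading format is not assumed here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical baseline: multinomial Naive Bayes over character n-grams."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = defaultdict(Counter)  # n-gram counts per label
        self.totals: Counter = Counter()  # total n-grams seen per label
        self.docs: Counter = Counter()    # number of training paragraphs per label
        self.vocab: set[str] = set()      # all n-grams seen in training

    def fit(self, paragraphs, labels) -> "NgramLanguageIdentifier":
        for text, label in zip(paragraphs, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, paragraph: str) -> str:
        """Return the label with the highest log-posterior for the paragraph."""
        grams = char_ngrams(paragraph, self.n)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.counts:
            score = math.log(self.docs[label] / n_docs)  # log prior
            denom = self.totals[label] + vocab_size      # add-one smoothing denominator
            for g in grams:
                score += math.log((self.counts[label][g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (not from WiLI-2018), one label per paragraph.
train_x = [
    "the quick brown fox jumps over the lazy dog",
    "this is a short english sentence about the weather",
    "der schnelle braune fuchs springt über den faulen hund",
    "dies ist ein kurzer deutscher satz über das wetter",
]
train_y = ["eng", "eng", "deu", "deu"]

model = NgramLanguageIdentifier().fit(train_x, train_y)
print(model.predict("the weather is nice"))   # prints: eng
print(model.predict("das wetter ist schön"))  # prints: deu
```

On WiLI-2018 itself the same interface would be fit on the training paragraphs of all 235 languages; character n-grams are a natural choice here because they need no tokenizer and work across scripts.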



