Slovene-English parallel corpus slenWaC 1.0
Source: SSH Open MarketPlace. Updated: 2025-07-04. Indexed: 2025-07-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/4RV6Lf
Description:
This corpus contains texts crawled from websites under the Slovenian top-level domain .si. It was built with [Spidextor](https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitext extraction. The accuracy of the extracted bitext is around 67% at the segment level and around 68% at the word level.
The corpus is available for download from the CLARIN.SI repository.
Created:
2025-07-04



