Slovene-English parallel corpus slenWaC 1.0
Source: SSH Open Marketplace. Updated: 2023-10-17. Indexed: 2024-08-03.
Download link:
https://marketplace.sshopencloud.eu/dataset/qjQ0VN
Description:
This corpus contains texts crawled from the Slovenian .si top-level domain. It was built with [Spidextor](https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitext extraction. The accuracy of the extracted bitext is around 67% at the segment level and around 68% at the word level.
The corpus is available for download from the CLARIN.SI repository.
Created:
2023-10-17



