Serbian-English parallel corpus srenWaC 1.0
Source: SSH Open MarketPlace. Updated: 2025-07-04. Indexed: 2025-07-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/n0lq0K
Description:
This corpus contains texts crawled from the Serbian top-level domain .rs. The corpus was built with [Spidextor](https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitext extraction. Based on evaluation results for other languages, the accuracy of the extracted bitext is estimated at 74% on the sentence level and 76% on the word level.
The corpus is available for download from the CLARIN.SI repository.
Created:
2025-07-04



