Serbian-English parallel corpus srenWaC 1.0
Source: SSH Open MarketPlace. Updated: 2023-10-13. Indexed: 2024-08-03.
Download link:
https://marketplace.sshopencloud.eu/dataset/mh9BXl
Description:
This corpus contains texts crawled from the Serbian top-level domain .rs. The corpus was built with [Spidextor](https://github.com/abumatran/spidextor), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitext extraction. Based on evaluation results for other languages, the accuracy of the extracted bitext can be estimated at 74% on the sentence level and 76% on the word level.
The corpus is available for download from the CLARIN.SI repository.
Created:
2023-10-13



