Croatian-English parallel corpus hrenWaC 2.0
Source: SSH Open MarketPlace. Updated: 2025-01-30. Indexed: 2025-02-01.
Download link:
https://marketplace.sshopencloud.eu/dataset/JSuCk5
Resource description:
This corpus contains texts crawled from the Croatian top-level domain .hr.
The corpus was built with [Spidextor](https://github.com/abumatran/spidextor), a tool that glues together the output of [SpiderLing](http://corpus.tools/wiki/SpiderLing), used for crawling, and [Bitextor](https://github.com/bitextor/bitextor), used for bitext extraction. The accuracy of the extracted bitext is around 80% at the segment level and around 84% at the word level.
The corpus is available for download from the CLARIN.SI repository.
Created:
2025-01-30



