FrancophonIA/es-fr_website_parallel_corpus
Source: Hugging Face. Updated: 2025-03-30. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/FrancophonIA/es-fr_website_parallel_corpus
Resource summary:
This is a bilingual parallel corpus crawled from multilingual websites, containing 15,797 translation units (TUs). The crawl ran from 15 November 2016 to 23 January 2017. The source data went through a strict validation process that discarded TUs that were non-compliant, TUs in which more than 99% of the tokens were misspelled, and TUs whose error rate exceeded the threshold during manual validation.
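As a rough illustration of the misspelled-token criterion described above, the following is a minimal sketch, not the pipeline actually used to build the corpus. The function names, the whitespace tokenization, the toy vocabularies, and the choice to apply the check to each side of a TU independently are all assumptions made for this example; only the 99% threshold comes from the description.

```python
def misspelled_ratio(text, vocabulary):
    """Return the fraction of tokens in `text` that are not in `vocabulary`.

    Tokenization is a naive whitespace split with punctuation stripped
    (an assumption; the original tokenizer is not documented here).
    """
    tokens = [t.strip(".,;:!?¿¡\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        # Treat an empty segment as fully misspelled so it gets discarded.
        return 1.0
    unknown = sum(1 for t in tokens if t not in vocabulary)
    return unknown / len(tokens)


def filter_translation_units(units, es_vocab, fr_vocab, threshold=0.99):
    """Keep (es, fr) pairs whose misspelled ratio does not exceed `threshold`.

    Hypothetical helper: a TU is dropped if either side has more than
    `threshold` of its tokens missing from the corresponding vocabulary.
    """
    kept = []
    for es, fr in units:
        if misspelled_ratio(es, es_vocab) > threshold:
            continue
        if misspelled_ratio(fr, fr_vocab) > threshold:
            continue
        kept.append((es, fr))
    return kept


# Toy data, invented purely for this example.
es_vocab = {"hola", "mundo"}
fr_vocab = {"bonjour", "le", "monde"}
units = [
    ("Hola mundo", "Bonjour le monde"),
    ("xqzt wvvk", "Bonjour"),
]
kept = filter_translation_units(units, es_vocab, fr_vocab)
```

With a threshold this high, only segments that are almost entirely unrecognizable are removed, so the check acts as a guard against garbage text and not as a spelling-quality filter.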
Provider:
FrancophonIA



