GeFRePaC - German French Reciprocal Parallel Corpus

DataCite Commons · Updated 2022-06-01 · Added to catalogue 2025-04-15
Download link:
https://live.european-language-grid.eu/catalogue/corpus/909
Resource description:
The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by Multilinguale Forschung/Multilingual Research, Abteilung Lexik, Institut für Deutsche Sprache (Germany), with funding from ELRA under the European Commission project LRsP&P (Language Resources Production & Packaging, LE4-8335).

GeFRePaC is a 30-million-word corpus (15 million words per language) built to develop and improve translation aids (dictionaries, lexicons, platforms) for French-to-German and German-to-French translation. The database consists of the following parallel corpora:

European Union CELEX database (Treaties, Foreign relations, Law, Complementary Law): 22,000,000 words (German + French)
Europarl (all published documents of the European Parliament): 8,320,000 words (German + French)

The corpus covers natural general language as used in public socio-political discourse, with a focus on multilingual administrative, commercial and legal documentation. It comprises a wide variety of text types for which the need for translation is growing rapidly but which currently defy successful machine translation.

The corpus is encoded according to the PAROLE guidelines. It is aligned at the sentence level and, on the lexical level, for single-word translation units; it is POS-tagged in conformity with the EAGLES recommendations and was validated according to the most current version of the ELRA guidelines. The parallel German-French texts were aligned using a program developed by the Equipe Langue et Dialogue, Laboratoire Loria, Nancy. The text files, which contain markup for paragraphs and sentences, were processed by the TreeTagger developed at IMS Stuttgart and are automatically converted into TEI-conformant SGML format.
Provider:
ELG
Created:
2022-06-01