English-Persian parallel Corpus
Source: catalogue.elra.info | Updated: 2017-07-03 | Indexed: 2025-03-22
Download link:
https://catalogue.elra.info/en-us/repository/browse/ELRA-W0051/
Resource description:
Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and Persian (Farsi) words aligned at the sentence level (about 100,000 sentences, distributed over 50,021 entries). The files are Unicode-encoded. The corpus was originally created with SQL Server but is distributed as an Access file. The texts cover a variety of text types, distributed as follows:
- Art: 1804 entries (3.61%)
- Culture: 5097 entries (10.19%)
- Idiom: 435 entries (0.87%)
- Law: 2266 entries (4.53%)
- Literature: 11470 entries (22.93%)
- Medicine: 1089 entries (2.18%)
- Others: 16989 entries (33.96%)
- Poetry: 692 entries (1.38%)
- Politics: 5493 entries (10.98%)
- Proverb: 292 entries (0.58%)
- Religion: 686 entries (1.37%)
- Science: 3708 entries (7.41%)
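The distribution figures in the description can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary is typed in by hand from the counts listed in the description, and the variable names (`entries`, `shares`) are assumptions of this example, not part of the corpus or its file format.

```python
# Entry counts per text type, copied by hand from the description.
entries = {
    "Art": 1804,
    "Culture": 5097,
    "Idiom": 435,
    "Law": 2266,
    "Literature": 11470,
    "Medicine": 1089,
    "Others": 16989,
    "Poetry": 692,
    "Politics": 5493,
    "Proverb": 292,
    "Religion": 686,
    "Science": 3708,
}

# The counts should add up to the stated number of entries.
total = sum(entries.values())

# Percentage share of each text type, rounded to two decimals as in the listing.
shares = {name: round(100 * count / total, 2) for name, count in entries.items()}

# Print the categories from largest to smallest.
for name, count in sorted(entries.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<11}{count:>6}  {shares[name]:5.2f}%")
print(f"Total entries: {total}")
```

Run as written, the total comes out to 50,021 and the recomputed shares agree with the listed percentages, so the category breakdown is internally consistent.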
Provider:
ELRA Catalogue of Language Resources



