
Chinese-English Parallel Sentences Extracted from Patents

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2016T22
Description:
<h3>Introduction</h3><br> <p>Chinese-English Parallel Sentences Extracted from Patents was developed by Chilin (HK) Limited and contains 500,000 sentence pairs of Chinese-English parallel text. This resource is based on the training corpus and test sets developed for the Tokyo-based <a href="http://research.nii.ac.jp/ntcir/index-en.html">NTCIR</a> 2009 &amp; 2010 tasks on <a href="http://ntcir.nii.ac.jp/PatentMTList/">Patent Machine Translation</a>.</p><br> <h3>Data</h3><br> <p>The sentences in this release were selected from a larger corpus of more than 300,000 Chinese-English parallel patents in different fields according to a number of filtering parameters, including word alignment, sentence length and language modeling. They were then automatically segmented and aligned. All text is encoded as UTF-8.</p><br> <h3>Samples</h3><br> <p>Please view this <a href="desc/addenda/LDC2016T22.cmn.txt">Chinese sample</a> and <a href="desc/addenda/LDC2016T22.eng.txt">English sample</a>.</p><br> <h3>Updates</h3><br> <p>None at this time.</p><br> <h3>Pricing</h3><br> <p>Not-for-profit organizations may license this data set for US$25.00 under the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement for Non-Members for use in linguistic research, education and non-commercial technology development. For-profit organizations may license this data for US$5000, discounted to US$4000 for LDC for-profit members, under the Commercial License Agreement for Chinese-English Parallel Sentences Extracted from Patents (LDC2016T22).</p><br> <p>Current fees in this catalog entry reflect those pertaining to a for-profit organization license. Not-for-profit organizations should contact LDC's&nbsp;<a href="mailto:ldc@ldc.upenn.edu">Membership Office</a> to license this data set.</p><br> Portions © 2016 Chilin (HK) Limited, © 2016 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30