Chinese-English Parallel Sentences Extracted from Patents
Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2016T22
Resource description:
<h3>Introduction</h3><br>
<p>Chinese-English Parallel Sentences Extracted from Patents was developed by Chilin (HK) Limited and contains 500,000 sentence pairs of Chinese-English parallel text. This resource is based on the training corpus and test sets developed for the Tokyo-based <a href="http://research.nii.ac.jp/ntcir/index-en.html">NTCIR</a> 2009 & 2010 tasks on <a href="http://ntcir.nii.ac.jp/PatentMTList/">Patent Machine Translation</a>.</p><br>
<h3>Data</h3><br>
<p>The sentences in this release were selected from a larger corpus of more than 300,000 Chinese-English parallel patents in different fields according to a number of filtering parameters, including word alignment, sentence length and language modeling. They were then automatically segmented and aligned. All text is encoded as UTF-8.</p><br>
<h3>Samples</h3><br>
<p>Please view this <a href="desc/addenda/LDC2016T22.cmn.txt">Chinese sample</a> and <a href="desc/addenda/LDC2016T22.eng.txt">English sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
<h3>Pricing</h3><br>
<p>Not-for-profit organizations may license this data set for US$25.00 under the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement for Non-Members for use in linguistic research, education and non-commercial technology development. For-profit organizations may license this data for US$5000, discounted to US$4000 for LDC for-profit members, under the Commercial License Agreement for Chinese-English Parallel Sentences Extracted from Patents (LDC2016T22).</p><br>
<p>Current fees in this catalog entry reflect those pertaining to a for-profit organization license. Not-for-profit organizations should contact LDC's <a href="mailto:ldc@ldc.upenn.edu">Membership Office</a> to license this data set.</p><br>
Portions © 2016 Chilin (HK) Limited, © 2016 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30



