Chinese-English Parallel Sentences Extracted from Patents

Name: Chinese-English Parallel Sentences Extracted from Patents
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:29:19
License: No description available

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.

Download link:

https://catalog.ldc.upenn.edu/LDC2016T22

Resource description:

<h3>Introduction</h3><br> <p>Chinese-English Parallel Sentences Extracted from Patents was developed by Chilin (HK) Limited and contains 500,000 sentence pairs of Chinese-English parallel text. This resource is based on the training corpus and test sets developed for the Tokyo-based <a href="http://research.nii.ac.jp/ntcir/index-en.html">NTCIR</a> 2009 &amp; 2010 tasks on <a href="http://ntcir.nii.ac.jp/PatentMTList/">Patent Machine Translation</a>.</p><br> <h3>Data</h3><br> <p>The sentences in this release were selected from a larger corpus of more than 300,000 Chinese-English parallel patents in different fields according to a number of filtering parameters, including word alignment, sentence length and language modeling. They were then automatically segmented and aligned. All text is encoded as UTF-8.</p><br> <h3>Samples</h3><br> <p>Please view this <a href="desc/addenda/LDC2016T22.cmn.txt">Chinese sample</a> and <a href="desc/addenda/LDC2016T22.eng.txt">English sample</a>.</p><br> <h3>Updates</h3><br> <p>None at this time.</p><br> <h3>Pricing</h3><br> <p>Not-for-profit organizations may license this data set for US$25.00 under the LDC Not-for-Profit Membership Agreement or under the LDC User Agreement for Non-Members for use in linguistic research, education and non-commercial technology development. For-profit organizations may license this data for US$5000, discounted to US$4000 for LDC for-profit members, under the Commercial License Agreement for Chinese-English Parallel Sentences Extracted from Patents (LDC2016T22).</p><br> <p>Current fees in this catalog entry reflect those pertaining to a for-profit organization license. Not-for-profit organizations should contact LDC's <a href="mailto:ldc@ldc.upenn.edu">Membership Office</a> to license this data set.</p><br> Portions © 2016 Chilin (HK) Limited, © 2016 Trustees of the University of Pennsylvania

Provider:

Linguistic Data Consortium

Created:

2020-11-30
