SummBank 1.0
Source: DataCite Commons · Updated 2021-07-01 · Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2003T16
Resource overview:
<h3>Introduction</h3><br>
<p>SummBank 1.0 contains the data created for the Summer 2001 Johns Hopkins University Workshop, which focused on text summarization in a cross-lingual information retrieval framework. The goal was to gather a corpus of original documents and summaries for the document summarization community to use as gold standards.</p><br>
<p>The source data consist of 18,147 aligned bilingual (Cantonese and English) article pairs from the Information Services Department of the Hong Kong Special Administrative Region of the People's Republic of China, published by the LDC in 2000 as <a href="http://catalog.ldc.upenn.edu/LDC2000T46" rel="nofollow">Hong Kong News Parallel Text</a>.</p><br>
<h3>Data</h3><br>
<p>This release contains 40 news clusters in English and Chinese, 360 human-written, non-extractive multi-document summaries, and nearly two million single-document and multi-document extracts created by automatic and manual methods. The MEAD summarizer was reimplemented and upgraded during the workshop; versions of the software are available from the <a href="http://www.summarization.com/mead" rel="nofollow">MEAD website</a>.</p><br>
<p>This distribution includes roughly two million text files, totaling approximately 13 GB uncompressed. English files are encoded in UTF-8; Chinese files are encoded in either GB or Big5.</p><br>
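<p>Because the corpus mixes three encodings, it can help to decode every file with an explicit codec rather than relying on a platform default. The Python sketch below is a hypothetical helper and is not part of the SummBank distribution. The following are all assumptions:</p>
<ul>
<li>the labels <code>en</code>, <code>gb</code> and <code>big5</code></li>
<li>the functions <code>read_summbank_text</code> and <code>convert_to_utf8</code></li>
<li>the use of GBK for the "GB" files</li>
</ul>
<p>The caller must supply the label, because trial decoding cannot reliably separate GB from Big5.</p>

```python
from pathlib import Path

# Hypothetical mapping from a file label to a Python codec name.
# Assumption: "GB" files are read as GBK, a superset of GB2312; the release
# notes only say "GB", so spot-check a few files before relying on this.
CODECS = {"en": "utf-8", "gb": "gbk", "big5": "big5"}


def read_summbank_text(path, label):
    """Decode one SummBank text file with the codec for its label.

    errors="strict" makes bytes that are invalid for the chosen codec raise
    UnicodeDecodeError instead of being silently replaced. It is not a full
    safeguard: most Big5 byte sequences are also valid GBK, so a Big5 file
    mislabeled as "gb" can still decode without error into the wrong text.
    """
    return Path(path).read_bytes().decode(CODECS[label], errors="strict")


def convert_to_utf8(src, dst, label):
    """Re-encode one file as UTF-8 and return its length in characters."""
    text = read_summbank_text(src, label)
    out = Path(dst)
    out.parent.mkdir(parents=True, exist_ok=True)  # create output dirs as needed
    out.write_text(text, encoding="utf-8")
    return len(text)
```

<p>For example, <code>convert_to_utf8("doc.txt", "utf8/doc.txt", "big5")</code> writes a UTF-8 copy of a Big5 file. Converting the whole corpus this way once lets later tools assume a single encoding.</p>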
<h3>Updates</h3><br>
<p>Additional information, updates, and bug fixes may be available on the <a href="http://www.summarization.com/summbank" rel="nofollow">SummBank website</a>.</p><br>
Portions © 1997-2000 The Government of the Hong Kong Special Administrative Region (HKSAR), © 2000, 2003 Trustees of the University of Pennsylvania
Provided by:
Linguistic Data Consortium
Created:
2020-11-30



