
SummBank 1.0

Source: DataCite Commons. Updated 2021-07-01; indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2003T16
Description:
<h3>Introduction</h3><br>
<p>SummBank 1.0 contains the data created for the Summer 2001 Johns Hopkins University Workshop, which focused on text summarization in a cross-lingual information retrieval framework. The goal was to gather a corpus of original documents and summaries for use as gold standards by the document summarization community.</p><br>
<p>The source data consists of 18,147 aligned bilingual (Cantonese and English) article pairs from the Information Services Department of the Hong Kong Special Administrative Region of the People's Republic of China, which were published by the LDC in 2000 as <a href="http://catalog.ldc.upenn.edu/LDC2000T46" rel="nofollow">Hong Kong News Parallel Text</a>.</p><br>
<h3>Data</h3><br>
<p>This release contains 40 news clusters in English and Chinese, 360 multi-document, human-written non-extractive summaries, and nearly two million single-document and multi-document extracts created by automatic and manual methods. MEAD was the summarizer that was reimplemented and upgraded during the workshop; versions of the software are available from the <a href="http://www.summarization.com/mead" rel="nofollow">MEAD website</a>.</p><br>
<p>This distribution includes roughly two million text files, totalling approximately 13 GB uncompressed. The text files are encoded as UTF-8 for English and as GB or Big5 for Chinese.</p><br>
<h3>Updates</h3><br>
<p>Additional information, updates, and bug fixes may be available on the <a href="http://www.summarization.com/summbank" rel="nofollow">SummBank website</a>.</p><br>
Portions © 1997-2000 The Government of the Hong Kong Special Administrative Region (HKSAR), © 2000, 2003 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30