Wiki Benchmark (Wiki-BM) and Contract Benchmark (Cont-BM)
Source: arXiv | Updated: 2020-12-12 | Indexed: 2024-06-21
Download link:
https://github.com/System-T/TextSimplification
Resource overview:
Wiki Benchmark (Wiki-BM) and Contract Benchmark (Cont-BM) are two datasets created jointly by the University of Pennsylvania and IBM Research to evaluate Split and Rephrase performance in text simplification. Wiki-BM contains 500 complex sentences randomly selected from Wikipedia, and Cont-BM contains 500 sentences collected from legal documents. Both datasets went through strict quality control so that each complex sentence is paired with multiple high-quality simplified versions. Intended applications include improving machine translation and information extraction systems and helping non-native readers understand complex text.
Provided by:
University of Pennsylvania
Created:
2020-09-18



