
BenchLS: A Reliable Dataset for Lexical Simplification

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2552392
Description:
To create our dataset, we combined two resources: the LexMTurk (Horn et al., 2014) and LSeval (De Belder and Moens, 2012) datasets. The instances in both datasets, 929 in total, contain a sentence, a target complex word, and several candidate substitutions ranked according to their simplicity. The candidates in both datasets were suggested and ranked by English speakers from the U.S. To increase the dataset's reliability, we applied the following corrections to each instance:
Spelling Filtering: We discard any misspelled candidates using Norvig's algorithm. We trained our spelling model over the News Crawl corpus.
Inflection Correction: We inflected all candidates to the tense of the target word using the Text Adorning module of LEXenstein (Paetzold and Specia, 2015; Burns, 2013).
The resulting dataset, BenchLS, contains 929 instances, with an average of 7.37 candidate substitutions per complex word.
Created:
2020-01-24
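
The Spelling Filtering step in the description can be illustrated with a minimal sketch of a Norvig-style corrector. This is not the authors' code: the function names (`train`, `edits1`, `correct`, `filter_candidates`), the toy corpus standing in for News Crawl, and the rule "discard a candidate if the corrector would change it" are all assumptions made for illustration. Multi-word candidates are not handled.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def train(text):
    """Count word frequencies in a corpus (toy stand-in for News Crawl)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Return word if known; else the most frequent known word within
    edit distance 1, then 2; else the word unchanged."""
    if word in counts:
        return word
    near = {w for w in edits1(word) if w in counts}
    if not near:
        near = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts}
    return max(near, key=counts.get) if near else word

def filter_candidates(candidates, counts):
    """Assumed rule: keep a candidate only if the corrector leaves it unchanged."""
    return [c for c in candidates if correct(c.lower(), counts) == c.lower()]

# Hypothetical usage with a tiny vocabulary.
counts = train("help aid assist people with simple easy words")
print(filter_candidates(["help", "asist", "aid", "hlep"], counts))
# ['help', 'aid']
```

Under this rule, an unknown candidate with no known word within two edits is kept, since the corrector has nothing to change it to; a stricter filter could instead require every candidate to appear in the vocabulary.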