
ALEXSIS: A Dataset for Benchmarking Lexical Simplification for Spanish

Indexed from NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/5837148
Description:
The ALEXSIS Spanish Dataset for Lexical Simplification contains 381 instances. Each instance consists of a sentence, a target complex word, and 25 candidate substitutions. The dataset format is similar to that of LexMturk (Horn et al., 2014), but in ALEXSIS the sentences are not tokenized.

In 380 of the 381 instances the complex word appears only once in the sentence. The single exception, in which the complex word appears twice, is the instance in line 263. Its sentence is: "Limita al norte con el paraje Árbol Solo, al sur con el paraje San Vicente, al este con la localidad de San Andrés y al oeste con el Canal San Martín." The complex word is "paraje". The first appearance of "paraje" was the one marked in bold for the annotators.

The instances are encoded in UTF-8 and have the following format: ...

SAMPLE INSTANCE

Sentence: Sufrió una importante reducción en su capacidad para poder acogerse a las normas de la FIFA para los estadios de fútbol.
Complex word: acogerse
Candidate substitutions: adaptarse, sumarse, incorporarse, obedecer, apegarse, adaptarse, adaptarse, ampararse, ampararse, adaptarse, apegarse, aceptar, asimilarse, adaptarse, aplicarse, aceptarse, incorporarse, refugiarse, amparar, recurrir, aceptar, refugiarse, cumplir con, adaptarse, admitirse

The ALEXSIS Spanish Dataset for Lexical Simplification is also available on GitHub: https://github.com/LaSTUS-TALN-UPF/ALEXSIS

If you use the ALEXSIS dataset for Spanish, please cite the following paper:

Daniel Ferrés and Horacio Saggion. ALEXSIS: A Dataset for Lexical Simplification in Spanish. Proceedings of the Language Resources and Evaluation Conference (LREC) 2022.
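The instance layout described above (a sentence, a complex word, then the candidate substitutions) could be read along the following lines. This is only a minimal sketch: it assumes the fields are tab-separated, which the description does not state explicitly, and the helper names `parse_instance`, `rank_substitutions`, and `count_occurrences` are hypothetical, not part of any ALEXSIS tooling. Counting repeated candidates as a rough preference ranking is likewise an illustrative choice, not the authors' evaluation method.

```python
from collections import Counter


def parse_instance(line, sep="\t"):
    """Split one dataset line into (sentence, complex_word, substitutions).

    The tab separator is an assumption; pass a different `sep` if the
    file actually uses another delimiter.
    """
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    if len(fields) < 3:
        raise ValueError("expected a sentence, a complex word, and substitutions")
    sentence, complex_word, *substitutions = fields
    return sentence, complex_word, substitutions


def rank_substitutions(substitutions):
    """Return distinct substitutions ordered by how often they were proposed."""
    return Counter(substitutions).most_common()


def count_occurrences(sentence, complex_word):
    """Count plain substring matches of the complex word in the sentence."""
    return sentence.count(complex_word)


# The sample instance from the description, rebuilt with the assumed separator.
sample = "\t".join([
    "Sufrió una importante reducción en su capacidad para poder acogerse "
    "a las normas de la FIFA para los estadios de fútbol.",
    "acogerse",
    "adaptarse", "sumarse", "incorporarse", "obedecer", "apegarse",
    "adaptarse", "adaptarse", "ampararse", "ampararse", "adaptarse",
    "apegarse", "aceptar", "asimilarse", "adaptarse", "aplicarse",
    "aceptarse", "incorporarse", "refugiarse", "amparar", "recurrir",
    "aceptar", "refugiarse", "cumplir con", "adaptarse", "admitirse",
])

sentence, word, subs = parse_instance(sample)
ranking = rank_substitutions(subs)

print(len(subs))    # 25 candidate substitutions
print(ranking[0])   # ('adaptarse', 6)
print(count_occurrences(sentence, word))  # 1
```

Applied to the sentence of the instance in line 263, `count_occurrences` with the complex word "paraje" returns 2, which is one way to locate that special case. Note that a multi-word candidate such as "cumplir con" stays intact only because the split is on the separator rather than on whitespace.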
BibTeX entry [.bib]:

@inproceedings{ferres-saggion@LREC2022,
  title = "ALEXSIS: A Dataset for Lexical Simplification in Spanish.",
  author = "Ferrés, Daniel and Saggion, Horacio",
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month = {June},
  year = {2022},
  address = {Marseille, France},
  publisher = {European Language Resources Association},
  pages = {3582--3594},
  url = {https://aclanthology.org/2022.lrec-1.383}
}

RELATED WORK

1) TSAR-2022 Shared Task on Lexical Simplification

ALEXSIS was used in the TSAR-2022 Shared Task on Lexical Simplification to evaluate Lexical Simplification systems in Spanish. 12 instances were used in the trial dataset and 368 instances in the test dataset. The instance with two appearances of the complex word was not used. Systems in the shared task were evaluated on the 368 instances of the TSAR-ES test dataset.

https://github.com/LaSTUS-TALN-UPF/TSAR-2022-Shared-Task

2) Experiments with ALEXSIS and similar datasets for English and Portuguese (ALEXSIS-PT)

A paper describing the compilation of the TSAR-2022 Shared Task datasets for English, Portuguese (ALEXSIS-PT), and Spanish (ALEXSIS), including several experiments with two state-of-the-art approaches for Lexical Simplification, has been published at: https://www.frontiersin.org/articles/10.3389/frai.2022.991242

In that paper the two approaches, LSBert (Qiang et al., 2021) adapted for Spanish and TUNER (Ferrés et al., 2017), were evaluated on all 381 instances of the ALEXSIS dataset.

Sanja Štajner, Daniel Ferrés, Matthew Shardlow, Kai North, Marcos Zampieri and Horacio Saggion. Lexical Simplification Benchmarks for English, Portuguese, and Spanish. Front. Artif. Intell. Sec. Natural Language Processing. doi: 10.3389/frai.2022.991242

REFERENCES

Ferrés, D., Saggion, H., and Gómez Guinovart, X. (2017). An adaptable lexical simplification architecture for Major Ibero-Romance languages. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems (Copenhagen: Association for Computational Linguistics), 40–47. doi: 10.18653/v1/W17-5406

Horn, C., Manduca, C., and Kauchak, D. (2014). Learning a Lexical Simplifier Using Wikipedia. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (Baltimore, Maryland: Association for Computational Linguistics), 458–463. LexMturk dataset: https://cs.pomona.edu/~dkauchak/simplification/lex.mturk.14/lex.mturk.14.tar.gz

Qiang, J., Li, Y., Zhu, Y., Yuan, Y., Shi, Y., and Wu, X. (2021). LSBert: Lexical Simplification Based on BERT. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3064–3076. doi: 10.1109/TASLP.2021.3111589

CONTACT

LaSTUS lab@TALN@UPF
Daniel Ferrés - daniel.ferres[at]upf.edu
Horacio Saggion - horacio.saggion[at]upf.edu (corresponding author)
ConMuTeS project link: https://www.upf.edu/web/conmutes

ACKNOWLEDGEMENTS

ConMuTeS project: Context-aware Multilingual Text Simplification (ConMuTeS), PID2019-109066GB-I00/AEI/10.13039/501100011033
Ministerio de Ciencia, Innovación y Universidades (MCIU) of Spain
Agencia Estatal de Investigación (AEI) of Spain
Created: 2022-10-28