MLRS/mapa_maltese
MAPA Maltese
Overview
- Task category: Named entity recognition
- Language: Maltese
- Dataset name: MAPA Maltese
- Dataset size: 1K<n<10K
- License: CC BY 4.0
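Dataset corrections
- Manually fixed some inconsistencies between Level 1 and Level 2 tags.
- Manually added tags to some spans that were marked as entities but carried no tag.
- Manually fixed spans that were tagged incorrectly because of tokenisation.
- Re-tokenised with the MLRS Tokeniser, mainly so that the `-` and `` characters are not treated as separate tokens.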
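Data splits
- For EurLex documents, the train/validation/test split is kept the same as in joelniklaus/mapa.
- For the other domains, documents are split in similar proportions.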
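The card does not say how the non-EurLex documents were divided, only that the proportions are similar. As a rough illustration of a document-level split (whole documents assigned to one set, so no document is shared between train, validation and test), a minimal sketch follows. The function name `split_documents`, the 80/10/10 ratios, and the seeded random shuffle are all assumptions made for this example, not the procedure used by the dataset authors.

```python
import random


def split_documents(doc_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole documents to train/validation/test.

    Hypothetical helper: the ratios and the shuffling strategy are
    illustrative assumptions, not taken from the dataset card.
    """
    ids = sorted(doc_ids)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility

    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])

    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # whatever remains goes to test
    }


if __name__ == "__main__":
    example_ids = [f"doc{i}" for i in range(10)]  # placeholder document IDs
    for name, docs in split_documents(example_ids).items():
        print(name, len(docs))
```

Splitting by document rather than by sentence keeps all sentences of one document in the same set, which avoids leaking near-duplicate context between training and evaluation data.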
Citation
- Original dataset:
  @inproceedings{gianola-2020-mapa,
    author = {Lucie Gianola and Ēriks Ajausks and Victoria Arranz and Chomicha Bendahman and Laurent Bié and Claudia Borg and Aleix Cerdà and Khalid Choukri and Montse Cuadros and Ona de Gibert and Hans Degroote and Elena Edelman and Thierry Etchegoyhen and Ángela Franco Torres and Mercedes García Hernandez and Aitor García Pablos and Albert Gatt and Cyril Grouin and Manuel Herranz and Alejandro Adolfo Kohan and Thomas Lavergne and Maite Melero and Patrick Paroubek and Mickaël Rigault and Mike Rosner and Roberts Rozis and Lonneke van der Plas and Rinalds Vīksna and Pierre Zweigenbaum},
    title = {Automatic Removal of Identifying Information in Official EU Languages for Public Administrations: The {MAPA} Project},
    booktitle = {Proceedings of the 33rd International Conference on Legal Knowledge and Information Systems ({JURIX20})},
    pages = {223--226},
    year = {2020},
    publisher = {IOS Press},
    url = {https://ebooks.iospress.nl/volumearticle/56182},
    doi = {10.3233/FAIA200869},
  }
- Corrections and splits:
  @inproceedings{micallef-etal-2024-maltese-etymology,
    title = "Cross-Lingual Transfer from Related Languages: Treating Low-Resource {M}altese as Multilingual Code-Switching",
    author = "Micallef, Kurt and Habash, Nizar and Borg, Claudia and Eryani, Fadhl and Bouamor, Houda",
    editor = "Graham, Yvette and Purver, Matthew",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-long.61",
    pages = "1014--1025",
  }



