
taln-ls2n/ACL-rlg

Source: Hugging Face. Updated 2026-02-17; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/taln-ls2n/ACL-rlg
Resource description:
# ACL-rlg: A Dataset for Reading List Generation

# About

ACL-rlg is the largest dataset of expert-crafted reading lists, containing 85 reading lists manually extracted from tutorial papers submitted to ACL-related conferences between 2020 and 2024. Data was sourced from [ACL Anthology](https://aclanthology.org/) and cross-referenced with [Semantic Scholar](https://www.semanticscholar.org/), enabling the extraction of metadata for articles beyond the ACL collection.

# Content

The following data fields are available:

| **Field** | **Type** | **Description** |
| --- | --- | --- |
| `id` | `string` | Unique identifier of the tutorial paper in the ACL Anthology. |
| `title` | `string` | Title of the tutorial paper. |
| `abstract` | `string` | Abstract of the tutorial paper. |
| `year` | `int64` | Year of publication. |
| `url` | `string` | ACL Anthology link to the paper. |
| `venues` | `string` | Name of the venue(s) in which the tutorial paper was published. |
| `reading_list` | `list[object]` | Reading list provided by the authors of the paper. Each record includes: <br>• `corpusid` (`int64`): Semantic Scholar corpus ID. <br>• `paperId` (`string`): Semantic Scholar paper ID. <br>• `title` (`string`): Title of the referenced paper. <br>• `abstract` (`string`): Abstract of the referenced paper. <br>• `authors` (`list[object]`): Information about the referenced paper's authors. <br>• `venue` (`string`): Name of the venue in which the referenced paper was published. <br>• `year` (`int64`): Year of publication of the referenced paper. <br>• `in_acl` (`bool`): Whether the referenced paper is indexed in the ACL Anthology. <br>• `citationCount` (`int64`): Citation count of the paper, retrieved from the Semantic Scholar API. <br>• `section` (`string`): Name of the reading-list section in which the referenced paper is listed. <br>• `subsection` (`string`): Name of the reading-list subsection in which the referenced paper is listed. |

## Licence

Dataset: CC BY-NC 4.0

You may use, share, and adapt this dataset for non-commercial research or educational purposes only.

## Citation

```
@inproceedings{aubert-beduchaud-etal-2025-acl,
    title = "{ACL}-rlg: A Dataset for Reading List Generation",
    author = "Aubert-B{\'e}duchaud, Julien and Boudin, Florian and Daille, B{\'e}atrice and Dufour, Richard",
    editor = "Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Eugenio, Barbara Di and Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.327/",
    pages = "4910--4919",
    abstract = "Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large amount of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study highlights that traditional scholarly search engines and indexing methods perform poorly on this task, and GPT-4o, despite showing better results, exhibits signs of potential data contamination."
}
```

Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, and Richard Dufour. 2025. [ACL-rlg: A Dataset for Reading List Generation.](https://aclanthology.org/2025.coling-main.327/) In Proceedings of the 31st International Conference on Computational Linguistics, pages 4910–4919, Abu Dhabi, UAE. Association for Computational Linguistics.
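
As an illustration of the record layout described in the Content table, the following minimal sketch builds one record by hand and walks its nested `reading_list`. Everything in `sample_record` is a made-up placeholder, not an entry from ACL-rlg, and the helper names `group_by_section` and `acl_share` are hypothetical. The contents of `authors` are left empty because the card does not document its sub-fields.

```python
from collections import defaultdict
from typing import Any


def make_ref(title: str, section: str, in_acl: bool) -> dict[str, Any]:
    """Build a placeholder reading-list entry with the documented field names."""
    return {
        "corpusid": 0,            # placeholder, not a real Semantic Scholar ID
        "paperId": "placeholder",
        "title": title,
        "abstract": "",
        "authors": [],            # sub-fields are not documented in the card
        "venue": "",
        "year": 2020,
        "in_acl": in_acl,
        "citationCount": 0,
        "section": section,
        "subsection": "",
    }


# Hypothetical record following the documented schema (placeholder values only).
sample_record: dict[str, Any] = {
    "id": "placeholder-id",
    "title": "Placeholder tutorial title",
    "abstract": "Placeholder abstract.",
    "year": 2023,
    "url": "https://aclanthology.org/",
    "venues": "Placeholder venue",
    "reading_list": [
        make_ref("Paper A", "Background", True),
        make_ref("Paper B", "Background", False),
        make_ref("Paper C", "Methods", True),
    ],
}


def group_by_section(record: dict[str, Any]) -> dict[str, list[str]]:
    """Map each reading-list section name to the titles listed under it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ref in record["reading_list"]:
        # Assumption: a missing or empty section is grouped under a fallback label.
        grouped[ref.get("section") or "(no section)"].append(ref["title"])
    return dict(grouped)


def acl_share(record: dict[str, Any]) -> float:
    """Fraction of referenced papers flagged as indexed in the ACL Anthology."""
    refs = record["reading_list"]
    if not refs:
        return 0.0
    return sum(1 for ref in refs if ref["in_acl"]) / len(refs)


if __name__ == "__main__":
    for section, titles in group_by_section(sample_record).items():
        print(section, titles)
    print(f"in_acl share: {acl_share(sample_record):.2f}")
```

The same helpers should apply to real records if the dataset is loaded with the Hugging Face `datasets` library, for example `load_dataset("taln-ls2n/ACL-rlg")`. The available split names are not stated in the card, so they would need to be checked after loading.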
Provided by:
taln-ls2n