taln-ls2n/ACL-rlg
Hugging Face dataset, updated 2026-02-17; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/taln-ls2n/ACL-rlg
# ACL-rlg: A Dataset for Reading List Generation
# About
ACL-rlg is the largest open dataset of expert-crafted reading lists. It contains 85 reading lists manually extracted from tutorial papers published at ACL-related conferences between 2020 and 2024.
Data was sourced from [ACL Anthology](https://aclanthology.org/) and cross-referenced with [Semantic Scholar](https://www.semanticscholar.org/), enabling the extraction of metadata for articles beyond the ACL collection.
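The cross-referencing with Semantic Scholar means that each reading-list entry's `corpusid` can be used to look the paper up again. The sketch below is one way to build such a lookup, assuming the public Semantic Scholar Graph API paper endpoint and its `CorpusId:` and `ACL:` identifier prefixes. It is not the authors' extraction pipeline. The constant names, helper names, and the requested `fields` are illustrative choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Semantic Scholar Graph API paper-lookup endpoint (assumed; not part of ACL-rlg itself).
S2_PAPER_ENDPOINT = "https://api.semanticscholar.org/graph/v1/paper/"
# Illustrative field selection, not necessarily what the dataset authors requested.
DEFAULT_FIELDS = ("title", "abstract", "venue", "year", "citationCount")


def s2_url_from_corpusid(corpusid: int, fields=DEFAULT_FIELDS) -> str:
    """Build a lookup URL for a reading-list entry from its `corpusid`."""
    query = urlencode({"fields": ",".join(fields)})
    return f"{S2_PAPER_ENDPOINT}CorpusId:{corpusid}?{query}"


def s2_url_from_acl_id(acl_id: str, fields=DEFAULT_FIELDS) -> str:
    """Build a lookup URL for a tutorial paper from its ACL Anthology `id`."""
    query = urlencode({"fields": ",".join(fields)})
    return f"{S2_PAPER_ENDPOINT}ACL:{acl_id}?{query}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET the URL and decode the JSON body (needs network access; the API is rate-limited)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Building the URL and fetching it are kept separate so that the URL construction can be checked offline. Calling `fetch_json(s2_url_from_corpusid(...))` performs the actual request.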
# Content
The following data fields are available:
| **Field** | **Type** | **Description** |
| --- | --- | --- |
| `id` | `string` | Unique identifier of the tutorial paper in the ACL Anthology. |
| `title` | `string` | Title of the tutorial paper. |
| `abstract` | `string` | Abstract of the tutorial paper. |
| `year` | `int64` | Year of publication. |
| `url` | `string` | ACL Anthology link to the paper. |
| `venues` | `string` | Name of the venue(s) where the tutorial paper was published. |
| `reading_list` | `list[object]` | Reading list compiled by the authors of the tutorial paper. Each record includes: <br>• `corpusid` (`int64`): Semantic Scholar corpus ID. <br>• `paperId` (`string`): Semantic Scholar paper ID. <br>• `title` (`string`): Title of the referenced paper. <br>• `abstract` (`string`): Abstract of the referenced paper. <br>• `authors` (`list[object]`): Information about the referenced paper's authors. <br>• `venue` (`string`): Name of the venue where the referenced paper was published. <br>• `year` (`int64`): Year of publication of the referenced paper. <br>• `in_acl` (`bool`): Whether the referenced paper appears in the ACL Anthology. <br>• `citationCount` (`int64`): Citation count of the referenced paper, retrieved from the Semantic Scholar API. <br>• `section` (`string`): Name of the reading-list section in which the referenced paper is listed. <br>• `subsection` (`string`): Name of the reading-list subsection in which the referenced paper is listed. |
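As a minimal sketch of how the nested `reading_list` field can be consumed, the code below groups one record's references by `section`/`subsection` and computes the share flagged `in_acl`. It assumes that each record is a plain Python dict and that `reading_list` is a list of dicts using the field names above. The helper names and the toy record are hypothetical, and its values are placeholders rather than real dataset content.

```python
from collections import defaultdict
from typing import Any


def group_by_section(record: dict[str, Any]) -> dict[tuple[str, str], list[str]]:
    """Group the titles of one reading list by (section, subsection)."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for ref in record["reading_list"]:
        # Missing or null section names are mapped to "" (an assumption).
        key = (ref.get("section") or "", ref.get("subsection") or "")
        groups[key].append(ref["title"])
    return dict(groups)


def acl_share(record: dict[str, Any]) -> float:
    """Fraction of referenced papers flagged `in_acl`; 0.0 for an empty list."""
    refs = record["reading_list"]
    if not refs:
        return 0.0
    return sum(bool(ref["in_acl"]) for ref in refs) / len(refs)


# Hypothetical toy record using the documented field names; values are placeholders.
toy_record = {
    "id": "example-id",
    "title": "Example tutorial",
    "year": 2023,
    "reading_list": [
        {"corpusid": 1, "title": "Paper A", "in_acl": True,
         "section": "Background", "subsection": None},
        {"corpusid": 2, "title": "Paper B", "in_acl": False,
         "section": "Background", "subsection": None},
        {"corpusid": 3, "title": "Paper C", "in_acl": True,
         "section": "Methods", "subsection": "Retrieval"},
        {"corpusid": 4, "title": "Paper D", "in_acl": False,
         "section": "Methods", "subsection": "Retrieval"},
    ],
}

print(group_by_section(toy_record))
# {('Background', ''): ['Paper A', 'Paper B'], ('Methods', 'Retrieval'): ['Paper C', 'Paper D']}
print(acl_share(toy_record))
# 0.5
```

With the Hugging Face `datasets` library, records can presumably be obtained via `load_dataset("taln-ls2n/ACL-rlg")`. This card does not state the split names, so check them before indexing. If the column is declared as a `Sequence` of structs, `datasets` returns it as a dict of lists rather than a list of dicts, and the helpers above would need adapting.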
## Licence
Dataset: CC BY-NC 4.0
You may use, share, and adapt this dataset for non-commercial research or educational purposes only, provided you give appropriate credit to the authors.
## Citation
```bibtex
@inproceedings{aubert-beduchaud-etal-2025-acl,
title = "{ACL}-rlg: A Dataset for Reading List Generation",
author = "Aubert-B{\'e}duchaud, Julien and
Boudin, Florian and
Daille, B{\'e}atrice and
Dufour, Richard",
editor = "Rambow, Owen and
Wanner, Leo and
Apidianaki, Marianna and
Al-Khalifa, Hend and
Eugenio, Barbara Di and
Schockaert, Steven",
booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
month = jan,
year = "2025",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.coling-main.327/",
pages = "4910--4919",
abstract = "Familiarizing oneself with a new scientific field and its existing literature can be daunting due to the large amount of available articles. Curated lists of academic references, or reading lists, compiled by experts, offer a structured way to gain a comprehensive overview of a domain or a specific scientific challenge. In this work, we introduce ACL-rlg, the largest open expert-annotated reading list dataset. We also provide multiple baselines for evaluating reading list generation and formally define it as a retrieval task. Our qualitative study highlights that traditional scholarly search engines and indexing methods perform poorly on this task, and GPT-4o, despite showing better results, exhibits signs of potential data contamination."
}
```
Julien Aubert-Béduchaud, Florian Boudin, Béatrice Daille, and Richard Dufour. 2025. [ACL-rlg: A Dataset for Reading List Generation.](https://aclanthology.org/2025.coling-main.327/) In Proceedings of the 31st International Conference on Computational Linguistics, pages 4910–4919, Abu Dhabi, UAE. Association for Computational Linguistics.
Provided by: taln-ls2n



