Name: taln-ls2n/ARRContributions
Creator: taln-ls2n
Published: 2025-11-04 11:55:45
License: No description available

Download link:

https://hf-mirror.com/datasets/taln-ls2n/ARRContributions



Resource overview:

---
license: cc-by-nc-4.0
language:
- en
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
  - split: test_annotated
    path: data/test_annotated-*
dataset_info:
  features:
  - name: acl_id
    dtype: string
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: conference_name
    dtype: string
  - name: conference_track
    dtype: string
  - name: year
    dtype: int64
  - name: url
    dtype: string
  - name: contribution_types
    sequence: string
  - name: openreview_id
    dtype: string
  - name: openreview_cycle
    dtype: string
  - name: openreview_history
    list:
    - name: contribution_types
      sequence: string
    - name: contribution_types_has_changed
      dtype: bool
    - name: cycle
      dtype: string
    - name: id
      dtype: string
  - name: article_content
    dtype: string
  splits:
  - name: train
    num_bytes: 110792374
    num_examples: 1621
  - name: validation
    num_bytes: 15470469
    num_examples: 222
  - name: test
    num_bytes: 13985449
    num_examples: 207
  - name: test_annotated
    num_bytes: 13984522
    num_examples: 207
  download_size: 75156873
  dataset_size: 154232814
---

# ARRContributions: A Dataset of Contribution Types from ARR Papers

## About

ARRContributions is a dataset of more than 2,000 articles extracted from ARR papers submitted to [OpenReview](https://openreview.net/group?id=aclweb.org/ACL/ARR) that include contribution type information. Authors are required to specify [contribution types](https://aclrollingreview.org/cfp#scope-of-submissions) when submitting to ARR.

The ARR typology [(Rogers et al., 2023)](https://aclanthology.org/2023.acl-long.911/) defines 11 contribution types that authors can select from to best characterize their work: (1) NLP engineering experiment (e.g., methods improving state-of-the-art results), (2) approaches for low-compute settings and efficiency, (3) approaches for low-resource settings, (4) data resources, (5) data analysis, (6) model analysis and interpretability, (7) reproduction studies, (8) position papers, (9) surveys, (10) theory, and (11) publicly available software and pre-trained models.

## Content

The following data fields are available:

| **Feature** | **Type** | **Description** |
| -------------------- | -------------- | --------------- |
| `acl_id` | `string` | Unique identifier of the paper in the ACL Anthology. |
| `title` | `string` | Title of the paper. |
| `abstract` | `string` | Abstract of the paper. |
| `conference_name` | `string` | Name of the conference (e.g., *acl*, *emnlp*, *eacl*). |
| `conference_track` | `string` | Track or submission category within the conference. |
| `year` | `int64` | Year of publication. |
| `url` | `string` | ACL Anthology link to the paper. |
| `contribution_types` | `list[string]` | List of contribution types selected according to the ARR typology (Rogers et al., 2023), e.g., *data resources*, *model analysis*, *theory*. |
| `openreview_id` | `string` | Unique OpenReview submission ID. |
| `openreview_cycle` | `string` | Review cycle or round associated with the OpenReview submission. |
| `openreview_history` | `list[object]` | List of previous submission records for the same paper, when available. Each record includes: <br>• `contribution_types` (`list[string]`): Contribution types selected in that cycle. <br>• `contribution_types_has_changed` (`bool`): Whether the contribution types differ from the previous cycle. <br>• `cycle` (`string`): The OpenReview cycle name. <br>• `id` (`string`): The OpenReview submission ID. |
| `article_content` | `string` | Full text of the paper (extracted using [nougat](https://github.com/facebookresearch/nougat)). |

We split the dataset into training, validation, and test sets using an approximate 80-10-10 ratio (1,621 / 222 / 207 examples), ensuring label balance through a multi-label stratification strategy. The test set was manually annotated by three independent annotators to establish an additional gold-standard labeling. We provide both the original test annotations from the dataset authors and the consensus annotations from the three annotators as separate splits (`test` and `test_annotated`).

## Licence

**Dataset:** CC BY-NC 4.0

**Original papers:** CC BY 4.0 (retain attribution)

If you use this dataset:

- You may use, share, and adapt the dataset for **non-commercial research or educational purposes only**.
- You must attribute both the dataset creators and the original ACL Anthology authors for any content used.

## Citation

```
@misc{,
  title={},
  author={},
  year={},
  eprint={},
  archivePrefix={},
  primaryClass={}
}
```
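To illustrate the fields described under Content, here is a minimal sketch of two common preprocessing steps: turning `contribution_types` into a multi-hot vector for multi-label classification, and checking `openreview_history` for a change of contribution types between cycles. It rests on assumptions: the label strings in `CONTRIBUTION_TYPES` are paraphrased from the 11 types listed above and may not match the exact strings stored in the dataset, the helper names `to_multi_hot` and `types_changed_across_cycles` are hypothetical, and the example record is made up to follow the documented schema.

```python
# Hypothetical label strings, paraphrased from the 11 types in the card;
# the exact strings stored in `contribution_types` may differ.
CONTRIBUTION_TYPES = [
    "NLP engineering experiment",
    "Approaches for low-compute settings and efficiency",
    "Approaches for low-resource settings",
    "Data resources",
    "Data analysis",
    "Model analysis and interpretability",
    "Reproduction studies",
    "Position papers",
    "Surveys",
    "Theory",
    "Publicly available software and pre-trained models",
]


def to_multi_hot(labels, vocabulary=CONTRIBUTION_TYPES):
    """Encode a list of label strings as a 0/1 vector over `vocabulary`."""
    selected = set(labels)
    unknown = selected - set(vocabulary)
    if unknown:
        raise ValueError(f"labels not in vocabulary: {sorted(unknown)}")
    return [1 if name in selected else 0 for name in vocabulary]


def types_changed_across_cycles(record):
    """Return True if any history entry flags a change in contribution types."""
    history = record.get("openreview_history") or []
    return any(entry["contribution_types_has_changed"] for entry in history)


# Made-up record following the documented schema (not real data).
example = {
    "acl_id": "hypothetical-id",
    "contribution_types": ["Data resources", "Data analysis"],
    "openreview_history": [
        {
            "contribution_types": ["Data resources"],
            "contribution_types_has_changed": False,
            "cycle": "cycle-A",
            "id": "submission-1",
        },
        {
            "contribution_types": ["Data resources", "Data analysis"],
            "contribution_types_has_changed": True,
            "cycle": "cycle-B",
            "id": "submission-2",
        },
    ],
}

vector = to_multi_hot(example["contribution_types"])
changed = types_changed_across_cycles(example)
```

With real data, records would presumably come from the Hugging Face `datasets` library, e.g. `load_dataset("taln-ls2n/ARRContributions")`. In that case it is safer to build the label vocabulary from the values actually observed in the `train` split than to hard-code it as done here.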

Use cases: