NRC-CNRC/Machine-Generated-Reviews-0.1

Name: NRC-CNRC/Machine-Generated-Reviews-0.1
Creator: NRC-CNRC
Published: 2026-03-12 13:09:48
License: cc-by-sa-4.0

Source: Hugging Face. Updated: 2026-03-12. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/NRC-CNRC/Machine-Generated-Reviews-0.1

Resource description:

---
license: cc-by-sa-4.0
task_categories:
- other
- text-generation
language:
- en
pretty_name: Machine Generated Reviews
size_categories:
- 100K<n<1M
task_ids:
- language-modeling
- text2text-generation
tags:
- text
- text-generation
viewer: true
dataset_info:
  features:
  - name: venue
    dtype: string
  - name: year
    dtype: int32
  - name: model
    dtype: string
  - name: submission_id
    dtype: string
  - name: review_id
    dtype: string
  - name: invitation_id
    dtype: string
  - name: review
    dtype: string
---

# Machine Generated Reviews

This dataset contains the machine-generated peer reviews used to study syntactic homogenization in machine-generated text (MGT) in ["Emphasizing the Commendable": A Study of Homogenized Transitive Verb Constructions in Machine Generated Peer Reviews](https://aclanthology.org/2026.lrec-main.649).
The corresponding research papers and official reviews are available on [OpenReview](https://openreview.net/).
The machine-generated peer reviews were produced by three LLMs with diverse backgrounds.
The prompts and the generated text are all in English.

## Prompts

The following prompt was used to generate the LLM reviews.

```
Your task is to write a review given a paper titled {title} and the paper content is: {paper_content}.
Your output should be like the following format:
Summary:
Strengths And Weaknesses:
Summary Of The Review:
```

`{title}` is the paper's title, available from OpenReview's API.
`{paper_content}` is the paper's content, that is, the text extracted from the paper's PDF file.
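
As a minimal sketch of how the two placeholders could be filled, the snippet below formats the template with Python's `str.format`. The helper name `build_prompt` and the line breaks inside `PROMPT_TEMPLATE` are assumptions made for illustration; the authors' actual generation code is not part of this card.

```python
# Template text taken from the Prompts section. The exact line breaks are an
# assumption, not something this card specifies.
PROMPT_TEMPLATE = (
    "Your task is to write a review given a paper titled {title} "
    "and the paper content is: {paper_content}.\n"
    "Your output should be like the following format:\n"
    "Summary:\n"
    "Strengths And Weaknesses:\n"
    "Summary Of The Review:\n"
)


def build_prompt(title: str, paper_content: str) -> str:
    """Fill the two placeholders (hypothetical helper, not the authors' code)."""
    # str.format only parses the template, so braces that appear inside the
    # title or the paper text are inserted verbatim.
    return PROMPT_TEMPLATE.format(title=title, paper_content=paper_content)


prompt = build_prompt("An Example Title", "Text extracted from the PDF ...")
print(prompt)
```

The resulting string would then be sent to whichever model is being evaluated.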

## Dataset Overview

Each entry has the following fields:

- `venue`: the venue's name
- `year`: the venue's year
- `model`: the model used to generate the review
- `submission_id`: the submission id
- `review_id`: the first 16 hexadecimal characters of the `sha1` digest of the review
- `invitation_id`: the submission invitation id
- `review`: the machine-generated review produced by `model`

Given the following entry:

```json
{
  "venue": "robot-learning.org/CoRL",
  "year": 2024,
  "model": "Qwen/Qwen3-4B-Instruct-2507",
  "submission_id": "zr2GPi3DSb",
  "review_id": "782088da99d7f6ce",
  "invitation_id": "robot-learning.org/CoRL/2024/Conference/-/Submission",
  "review": "**Summary:** \nThis paper presents..."
}
```

you can access the human reviews by substituting `{submission_id}` in `https://openreview.net/forum?id={submission_id}`.
For the entry above, the human reviews are at `https://openreview.net/forum?id=zr2GPi3DSb`.

The tables in the rest of this section summarize the machine-generated review counts.
Note that these numbers differ from Table 1 in [our paper](https://aclanthology.org/2026.lrec-main.649) because the human reviews are not included here; they can be found on [OpenReview](https://openreview.net/).
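
The `{submission_id}` substitution described above can be sketched in a few lines. The helper names `forum_url` and `short_sha1` are hypothetical. `short_sha1` only illustrates one plausible reading of the `review_id` description: it assumes UTF-8 encoding and a hex digest, neither of which is confirmed here, and because the sample review is truncated it cannot be used to reproduce `782088da99d7f6ce`.

```python
import hashlib

FORUM_URL = "https://openreview.net/forum?id={submission_id}"


def forum_url(entry: dict) -> str:
    """Build the OpenReview forum URL for an entry (hypothetical helper)."""
    return FORUM_URL.format(submission_id=entry["submission_id"])


def short_sha1(text: str, length: int = 16) -> str:
    """Return the first `length` hex characters of the text's SHA-1 digest.

    Assumption: UTF-8 encoding and hex output; the dataset's exact recipe
    for `review_id` may differ.
    """
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


entry = {
    "submission_id": "zr2GPi3DSb",
    "review_id": "782088da99d7f6ce",
    "review": "**Summary:** \nThis paper presents...",  # truncated sample text
}

print(forum_url(entry))  # https://openreview.net/forum?id=zr2GPi3DSb
print(len(short_sha1(entry["review"])))  # 16
```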

| model                       | # reviews |
| :-------------------------- | --------: |
| google/gemma-3-4b-it        |     41872 |
| gpt-4o-2024-08-06           |     41872 |
| Qwen/Qwen3-4B-Instruct-2507 |     41872 |

| year | # reviews |
| :--- | --------: |
| 2018 |      2727 |
| 2019 |      4125 |
| 2020 |      6354 |
| 2021 |     16050 |
| 2022 |     15987 |
| 2023 |     24402 |
| 2024 |     29247 |
| 2025 |     26724 |

| venue                   | year | # reviews |
| :---------------------- | :--- | --------: |
| EMNLP                   | 2023 |      5739 |
| ICLR.cc                 | 2018 |      2727 |
| ICLR.cc                 | 2019 |      4125 |
| ICLR.cc                 | 2020 |      6354 |
| ICLR.cc                 | 2021 |      7341 |
| ICLR.cc                 | 2022 |      7029 |
| ICLR.cc                 | 2023 |      9303 |
| ICLR.cc                 | 2024 |     19266 |
| ICLR.cc                 | 2025 |     26724 |
| NeurIPS.cc              | 2021 |      8253 |
| NeurIPS.cc              | 2022 |      8367 |
| NeurIPS.cc              | 2023 |      8784 |
| NeurIPS.cc              | 2024 |      9216 |
| robot-learning.org/CoRL | 2021 |       456 |
| robot-learning.org/CoRL | 2022 |       591 |
| robot-learning.org/CoRL | 2023 |       576 |
| robot-learning.org/CoRL | 2024 |       765 |

## Usage examples (python)

Load the dataset (downloaded from the Hugging Face Hub on first use, then served from the local cache):

```python
from datasets import load_dataset

dataset = load_dataset("NRC-CNRC/Machine-Generated-Reviews-0.1")
```

Iterate over the training split of the dataset:

```python
for sample in dataset["train"]:
    review: str = sample["review"]
    ...
```

Print a summary of the dataset:

```python
from datasets import load_dataset

dataset = load_dataset("NRC-CNRC/Machine-Generated-Reviews-0.1")
print(dataset)
```

```
Generating train split: 125616 examples [00:06, 20093.99 examples/s]
DatasetDict({
    train: Dataset({
        features: ['venue', 'year', 'model', 'submission_id', 'review_id', 'invitation_id', 'review'],
        num_rows: 125616
    })
})
```

## Citation

If you refer to this dataset, please cite our [paper](https://aclanthology.org/2026.lrec-main.649).

```
@inproceedings{fung-etal-2026-emphazing,
    title = {"Emphasizing the Commendable": A Study of Homogenized Transitive Verb Constructions in Machine Generated Peer Reviews},
    author = "Fung, Hing-Yuet and Larkin, Samuel and Lo, Chi-kiu",
    booktitle = "Proceedings of the Fifteenth Language Resources and Evaluation Conference",
    month = may,
    year = "2026",
    address = "Palma de Mallorca, Spain",
    publisher = "European Language Resources Association"
}
```
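
As a supplement to the usage examples and the count tables, the sketch below tallies entries per field combination with `collections.Counter`. The helper `count_by` and the three in-memory rows are illustrative only. On the real data the same call would take `dataset["train"]`, which is assumed to have been loaded as in the usage examples.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by(rows: Iterable[Mapping], *keys: str) -> Counter:
    """Count rows per distinct combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


# Tiny in-memory stand-in for dataset["train"] (illustrative values only).
rows = [
    {"venue": "ICLR.cc", "year": 2024, "model": "gpt-4o-2024-08-06"},
    {"venue": "ICLR.cc", "year": 2024, "model": "google/gemma-3-4b-it"},
    {"venue": "EMNLP", "year": 2023, "model": "gpt-4o-2024-08-06"},
]

per_model = count_by(rows, "model")
per_venue_year = count_by(rows, "venue", "year")

print(per_model[("gpt-4o-2024-08-06",)])    # 2
print(per_venue_year[("ICLR.cc", 2024)])    # 2

# With the real data loaded:
#   count_by(dataset["train"], "model")
#   count_by(dataset["train"], "venue", "year")
```

Keys are tuples so that single-field and multi-field counts share one code path.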

Provided by:

NRC-CNRC
