NRC-CNRC/Machine-Generated-Reviews-0.1
Source: Hugging Face. Updated 2026-03-12; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/NRC-CNRC/Machine-Generated-Reviews-0.1
Resource description:
---
license: cc-by-sa-4.0
task_categories:
- other
- text-generation
language:
- en
pretty_name: Machine Generated Reviews
size_categories:
- 100K<n<1M
task_ids:
- language-modeling
- text2text-generation
tags:
- text
- text-generation
viewer: true
dataset_info:
  features:
  - name: venue
    dtype: string
  - name: year
    dtype: int32
  - name: model
    dtype: string
  - name: submission_id
    dtype: string
  - name: review_id
    dtype: string
  - name: invitation_id
    dtype: string
  - name: review
    dtype: string
---
# Machine Generated Reviews
This dataset contains the machine-generated peer reviews used to study syntactic homogenization in machine generated text (MGT), as reported in ["Emphasizing the Commendable": A Study of Homogenized Transitive Verb Constructions in Machine Generated Peer Reviews](https://aclanthology.org/2026.lrec-main.649).
The corresponding academic research papers and official reviews are available on [OpenReview](https://openreview.net/).
The machine-generated peer reviews were produced by three LLMs with diverse backgrounds: `google/gemma-3-4b-it`, `gpt-4o-2024-08-06`, and `Qwen/Qwen3-4B-Instruct-2507`.
The prompts and generated text are all in English.
## Prompts
The following prompt was used to generate the LLM reviews:
```
Your task is to write a review given a paper titled {title} and the paper content is: {paper_content}. Your output should be like the following format:
Summary:
Strengths And Weaknesses:
Summary Of The Review:
```
`{title}` is the paper's title, available from OpenReview's API, and `{paper_content}` is the paper's content, that is, the text extracted from the paper's PDF file.
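As a minimal sketch of how the two placeholders could be filled, the snippet below substitutes them with Python's `str.format`. The helper name `build_prompt` and the example inputs are hypothetical; the exact line breaks and the code actually used to query the models are not given in this card.

```python
# Template text copied from the prompt above.
# Assumption: the three section labels each sit on their own line.
PROMPT_TEMPLATE = (
    "Your task is to write a review given a paper titled {title} "
    "and the paper content is: {paper_content}. "
    "Your output should be like the following format:\n"
    "Summary:\n"
    "Strengths And Weaknesses:\n"
    "Summary Of The Review:"
)


def build_prompt(title: str, paper_content: str) -> str:
    """Hypothetical helper: fill the two placeholders of the template."""
    return PROMPT_TEMPLATE.format(title=title, paper_content=paper_content)


# Made-up inputs, for illustration only.
prompt = build_prompt("An Example Title", "Text extracted from the PDF.")
print(prompt)
```

Because the paper text is passed as an argument rather than pasted into the template, curly braces inside the paper content do not interfere with the substitution.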
## Dataset Overview
Each entry has the following fields:
- `venue`: the venue's name
- `year`: the venue's year
- `model`: the model used to generate the review
- `submission_id`: the submission id
- `review_id`: the first 16 hexadecimal characters of the `sha1` digest of the review
- `invitation_id`: the submission invitation id
- `review`: the machine generated review produced by `model`
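The `review_id` field can be illustrated with a short sketch. It assumes that the review text is UTF-8 encoded before hashing and that the identifier is a prefix of the hexadecimal digest (the example identifier in this card is 16 hexadecimal characters long). The function name `make_review_id` is hypothetical, and the output is not guaranteed to reproduce the identifiers stored in the dataset.

```python
import hashlib


def make_review_id(review: str) -> str:
    """Hypothetical helper: a 16-character prefix of the SHA-1 hex digest."""
    # Assumption: the review text is hashed as UTF-8 bytes.
    digest = hashlib.sha1(review.encode("utf-8")).hexdigest()
    return digest[:16]


# Made-up review text, for illustration only.
example_id = make_review_id("**Summary:** \nThis paper presents...")
print(example_id)
```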
Given the following entry:
```json
{
  "venue": "robot-learning.org/CoRL",
  "year": 2024,
  "model": "Qwen/Qwen3-4B-Instruct-2507",
  "submission_id": "zr2GPi3DSb",
  "review_id": "782088da99d7f6ce",
  "invitation_id": "robot-learning.org/CoRL/2024/Conference/-/Submission",
  "review": "**Summary:** \nThis paper presents..."
}
```
you can access the human reviews by substituting `{submission_id}` in `https://openreview.net/forum?id={submission_id}`.
For the previous entry, you would access the human reviews at `https://openreview.net/forum?id=zr2GPi3DSb`.
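The substitution can be written as a small helper. This is only a sketch; the function name `forum_url` is hypothetical, and the URL pattern is the one quoted above.

```python
from urllib.parse import urlencode

# URL pattern quoted in this card: https://openreview.net/forum?id={submission_id}
OPENREVIEW_FORUM = "https://openreview.net/forum"


def forum_url(submission_id: str) -> str:
    """Hypothetical helper: build the OpenReview forum URL of a submission."""
    return f"{OPENREVIEW_FORUM}?{urlencode({'id': submission_id})}"


print(forum_url("zr2GPi3DSb"))
# prints https://openreview.net/forum?id=zr2GPi3DSb
```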
The tables below summarize the machine-generated review counts by model, by year, and by venue and year.
Note that these numbers differ from Table 1 in [our paper](https://aclanthology.org/2026.lrec-main.649) since we are not including the human reviews as they can be found on [OpenReview](https://openreview.net/).
| model | # review |
| :-------------------------- | -------: |
| google/gemma-3-4b-it | 41872 |
| gpt-4o-2024-08-06 | 41872 |
| Qwen/Qwen3-4B-Instruct-2507 | 41872 |

| year | # review |
| :--- | -------: |
| 2018 | 2727 |
| 2019 | 4125 |
| 2020 | 6354 |
| 2021 | 16050 |
| 2022 | 15987 |
| 2023 | 24402 |
| 2024 | 29247 |
| 2025 | 26724 |

| venue | year | # review |
| :---------------------- | :--- | -------: |
| EMNLP | 2023 | 5739 |
| ICLR.cc | 2018 | 2727 |
| ICLR.cc | 2019 | 4125 |
| ICLR.cc | 2020 | 6354 |
| ICLR.cc | 2021 | 7341 |
| ICLR.cc | 2022 | 7029 |
| ICLR.cc | 2023 | 9303 |
| ICLR.cc | 2024 | 19266 |
| ICLR.cc | 2025 | 26724 |
| NeurIPS.cc | 2021 | 8253 |
| NeurIPS.cc | 2022 | 8367 |
| NeurIPS.cc | 2023 | 8784 |
| NeurIPS.cc | 2024 | 9216 |
| robot-learning.org/CoRL | 2021 | 456 |
| robot-learning.org/CoRL | 2022 | 591 |
| robot-learning.org/CoRL | 2023 | 576 |
| robot-learning.org/CoRL | 2024 | 765 |
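Counts like those in the tables above could be recomputed from the dataset rows with a sketch such as the following. The helper name `count_reviews` and the three sample rows are made up for illustration; only the field names come from this card.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_reviews(rows: Iterable[Mapping], *fields: str) -> Counter:
    """Hypothetical helper: count rows per distinct combination of fields."""
    return Counter(tuple(row[field] for field in fields) for row in rows)


# Made-up rows that only mimic the dataset's fields.
sample_rows = [
    {"venue": "ICLR.cc", "year": 2018, "model": "google/gemma-3-4b-it"},
    {"venue": "ICLR.cc", "year": 2018, "model": "gpt-4o-2024-08-06"},
    {"venue": "EMNLP", "year": 2023, "model": "gpt-4o-2024-08-06"},
]

by_model = count_reviews(sample_rows, "model")
by_venue_year = count_reviews(sample_rows, "venue", "year")
print(by_model)
print(by_venue_year)
```

Iterating over the `train` split returned by `load_dataset` yields one dictionary per row, so passing that split as `rows` should give the full counts, at the cost of a pass over all 125616 entries.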
## Usage examples (python)
Load the dataset from the Hugging Face Hub (downloaded files are cached locally for later calls):
```python
from datasets import load_dataset
dataset = load_dataset("NRC-CNRC/Machine-Generated-Reviews-0.1")
```
Iterate over the training split of the dataset:
```python
for sample in dataset["train"]:
    # Each sample is a dict keyed by the field names listed above.
    review: str = sample["review"]
    ...
```
Print the structure of the dataset:
```python
from datasets import load_dataset
dataset = load_dataset("NRC-CNRC/Machine-Generated-Reviews-0.1")
print(dataset)
```
Output:
```
Generating train split: 125616 examples [00:06, 20093.99 examples/s]
DatasetDict({
    train: Dataset({
        features: ['venue', 'year', 'model', 'submission_id', 'review_id', 'invitation_id', 'review'],
        num_rows: 125616
    })
})
```
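To compare the reviews that different models wrote for the same paper, the rows can be grouped by `submission_id` and `model`. The sketch below uses plain Python on made-up rows; the helper name `group_by_submission` and the identifier `exampleId01` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_by_submission(rows: Iterable[Mapping]) -> Dict[str, Dict[str, List[str]]]:
    """Hypothetical helper: map submission_id -> model -> list of reviews."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for row in rows:
        per_model = grouped[row["submission_id"]]
        per_model.setdefault(row["model"], []).append(row["review"])
    return dict(grouped)


# Made-up rows that only mimic the dataset's fields.
sample_rows = [
    {"submission_id": "zr2GPi3DSb", "model": "gpt-4o-2024-08-06", "review": "Review A"},
    {"submission_id": "zr2GPi3DSb", "model": "google/gemma-3-4b-it", "review": "Review B"},
    {"submission_id": "exampleId01", "model": "gpt-4o-2024-08-06", "review": "Review C"},
]

grouped = group_by_submission(sample_rows)
print(sorted(grouped["zr2GPi3DSb"]))
```

The same function should accept the `train` split directly, since iterating over it yields one dictionary per row.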
## Citation
If you use or refer to this dataset, please cite our [paper](https://aclanthology.org/2026.lrec-main.649).
```
@inproceedings{
fung-etal-2026-emphazing,
title = { "Emphasizing the Commendable": A Study of Homogenized Transitive Verb Constructions in Machine Generated Peer Reviews },
author = "Fung, Hing-Yuet and
Larkin, Samuel and
Lo, Chi-kiu",
booktitle = "Proceedings of the Fifteenth Language Resources and Evaluation Conference",
month = may,
year = "2026",
address = "Palma de Mallorca, Spain",
publisher = "European Language Resources Association"
}
```
Provider: NRC-CNRC



