mingrrui/weiwanchengshujuji
Source: Hugging Face | Updated 2026-04-17 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/mingrrui/weiwanchengshujuji
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
- image-to-text
language:
- en
tags:
- peer-review
- scientific-review
- paper-review
- multimodal
- ICLR
size_categories:
- 10K<n<100K
---
# LLM as a Reviewer - Dataset
This dataset supports the **LLM as a Reviewer** project, which trains and evaluates multimodal large language models for automated scientific paper reviewing.
## Dataset Structure
```
.
├── training/                               # Training data (~1.4 GB)
│   ├── review_68K_9pages.json              # 68K review samples (9-page papers)
│   ├── review_21K_query_weakness.json      # 21K weakness-focused review samples
│   ├── embedding_train_27k.jsonl           # 27K embedding training pairs
│   └── generator_train_27k.json            # 27K generator training samples
├── gt_test/                                # Ground-truth test set (~1.4 MB)
│   ├── gt_test_ICLR2025_yixunlian.json
│   └── gt_test_ICLR2025_weixunlian_9pages.json
├── paper_images/                           # Paper page images (~105 GB)
│   ├── train_2024/                         # ICLR 2024 training images
│   ├── train_2025/                         # ICLR 2025 training images
│   ├── test_2024/                          # ICLR 2024 test images
│   └── test_2025/                          # ICLR 2025 test images
└── figure_crops/                           # Cropped figures from papers
    └── {paper_id}/                         # 10,639 paper directories
```
## Data Description
### Training Data
- **review_68K_9pages.json**: 68K review training samples constructed from ICLR papers (up to 9 pages per paper), containing review text with page-level image references.
- **review_21K_query_weakness.json**: 21K samples focused on weakness identification with localized evidence.
- **embedding_train_27k.jsonl**: 27K training pairs for the embedding/retrieval model.
- **generator_train_27k.json**: 27K training samples for the review generator model.
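The card does not document the record schema of these files, so a reasonable first step is to load one and inspect its fields. The sketch below is a minimal, assumption-laden helper: `load_records` and `summarize` are hypothetical names, and it assumes each `.json` file holds a top-level list of records and each `.jsonl` file holds one JSON object per line.

```python
import json
from pathlib import Path


def load_records(path):
    """Load a list of records from a .json or .jsonl file."""
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            # One JSON object per non-empty line.
            return [json.loads(line) for line in f if line.strip()]
        data = json.load(f)
    # Assumption: .json files hold a top-level list; wrap anything else.
    return data if isinstance(data, list) else [data]


def summarize(path, n_keys=20):
    """Return (record count, sorted field names of the first record)."""
    records = load_records(path)
    first = records[0] if records else {}
    keys = sorted(first)[:n_keys] if isinstance(first, dict) else []
    return len(records), keys
```

For example, `summarize("./data/training/embedding_train_27k.jsonl")` would report how many pairs the file contains and which fields the first pair exposes, assuming the dataset was downloaded to `./data`.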
### Test Data
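- Ground-truth test sets built from ICLR 2025 submissions, split by whether the paper was seen during training: `gt_test_ICLR2025_yixunlian.json` (trained papers; "yixunlian" is pinyin for "already trained") and `gt_test_ICLR2025_weixunlian_9pages.json` (untrained papers; "weixunlian" is pinyin for "not trained").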
### Paper Images
- Page-level renderings of ICLR 2024 and 2025 submissions, organized by paper ID.
### Figure Crops
- Cropped figures and tables extracted from papers, organized by paper ID.
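Since both image collections are organized by paper ID, it can help to build an index from paper ID to image files before pairing them with review samples. The following is a sketch only: `index_by_paper` and `IMAGE_SUFFIXES` are hypothetical, the image formats are guessed, and it assumes every image sits directly inside a directory named after its paper ID (which the card states for `figure_crops/` but does not spell out for `paper_images/`).

```python
from collections import defaultdict
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}  # assumed formats


def index_by_paper(root):
    """Map each paper ID (assumed: parent directory name) to its image paths."""
    index = defaultdict(list)
    # sorted() orders paths lexicographically, not by page number.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            index[path.parent.name].append(path)
    return dict(index)
```

Calling `index_by_paper("./data/figure_crops")` would then return one entry per paper directory. If page files are numbered without zero padding, a numeric sort key would be needed to recover page order.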
## Usage
```python
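# Download the full dataset (paper_images/ alone is ~105 GB).
# The repo ID below is the one given in the card; this listing hosts the
# dataset as mingrrui/weiwanchengshujuji, so adjust repo_id if it does not resolve.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="mingrrui/llm-reviewer-data", repo_type="dataset", local_dir="./data")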
```
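Because `paper_images/` accounts for roughly 105 GB, it may be preferable to fetch only the annotation files first. The sketch below uses the `allow_patterns` argument of `snapshot_download` for this; the names `TEXT_ONLY_PATTERNS`, `is_selected`, and `download_text_only` are hypothetical, and the patterns are derived from the directory tree shown earlier rather than confirmed against the hosted repository.

```python
from fnmatch import fnmatch

# Glob patterns covering only the JSON/JSONL files under training/ and gt_test/.
TEXT_ONLY_PATTERNS = ["training/*", "gt_test/*"]


def is_selected(repo_path, patterns=TEXT_ONLY_PATTERNS):
    """Local approximation of the filter, for checking patterns before downloading."""
    return any(fnmatch(repo_path, p) for p in patterns)


def download_text_only(repo_id, local_dir="./data"):
    """Download only files matching TEXT_ONLY_PATTERNS from a dataset repo."""
    # Imported lazily so the pattern check above works without the library.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=TEXT_ONLY_PATTERNS,
    )
```

A call such as `download_text_only("mingrrui/llm-reviewer-data")` would use the repo ID from the snippet above; image directories can be added later by extending the pattern list, for example with `"paper_images/test_2025/*"`.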
## Citation
If you use this dataset, please cite our work.
Provider:
mingrrui



