mingrrui/weiwanchengshujuji

Name: mingrrui/weiwanchengshujuji
Creator: mingrrui
Published: 2026-04-17 17:35:59
License: No description provided

Source: Hugging Face. Updated 2026-04-17. Indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/mingrrui/weiwanchengshujuji

Resource description:

---
license: apache-2.0
task_categories:
- text-generation
- image-to-text
language:
- en
tags:
- peer-review
- scientific-review
- paper-review
- multimodal
- ICLR
size_categories:
- 10K<n<100K
---

# LLM as a Reviewer - Dataset

This dataset supports the **LLM as a Reviewer** project, which trains and evaluates multimodal large language models for automated scientific paper reviewing.

## Dataset Structure

```
.
├── training/                              # Training data (~1.4 GB)
│   ├── review_68K_9pages.json             # 68K review samples (9-page papers)
│   ├── review_21K_query_weakness.json     # 21K weakness-focused review samples
│   ├── embedding_train_27k.jsonl          # 27K embedding training pairs
│   └── generator_train_27k.json           # 27K generator training samples
├── gt_test/                               # Ground-truth test set (~1.4 MB)
│   ├── gt_test_ICLR2025_yixunlian.json
│   └── gt_test_ICLR2025_weixunlian_9pages.json
├── paper_images/                          # Paper page images (~105 GB)
│   ├── train_2024/                        # ICLR 2024 training images
│   ├── train_2025/                        # ICLR 2025 training images
│   ├── test_2024/                         # ICLR 2024 test images
│   └── test_2025/                         # ICLR 2025 test images
└── figure_crops/                          # Cropped figures from papers
    └── {paper_id}/                        # 10,639 paper directories
```

## Data Description

### Training Data

- **review_68K_9pages.json**: 68K review training samples constructed from ICLR papers (up to 9 pages per paper), containing review text with page-level image references.
- **review_21K_query_weakness.json**: 21K samples focused on weakness identification with localized evidence.
- **embedding_train_27k.jsonl**: 27K training pairs for the embedding/retrieval model.
- **generator_train_27k.json**: 27K training samples for the review generator model.

### Test Data

- Ground-truth test sets from ICLR 2025 submissions, with both trained and untrained paper splits.

### Paper Images

- Page-level renderings of ICLR 2024 and 2025 submissions, organized by paper ID.

### Figure Crops

- Cropped figures and tables extracted from papers, organized by paper ID.
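The training files listed above come in two container formats: `.json` and `.jsonl`. The following is a minimal sketch of a loader that handles both, assuming each `.json` file holds either a single array of records or one object, and each `.jsonl` file holds one JSON value per line. The dataset card does not document the record schema, so the helper name `load_records` is hypothetical and the sketch makes no assumptions about field names.

```python
import json
from pathlib import Path
from typing import Any, List


def load_records(path: str) -> List[Any]:
    """Load records from a .json or .jsonl file (hypothetical helper).

    Assumption: .jsonl files contain one JSON value per non-empty line,
    and .json files contain either a list of records or a single object.
    """
    p = Path(path)
    with p.open("r", encoding="utf-8") as f:
        if p.suffix == ".jsonl":
            # Skip blank lines so a trailing newline does not raise an error.
            return [json.loads(line) for line in f if line.strip()]
        data = json.load(f)
    # Normalize a single top-level object into a one-element list.
    return data if isinstance(data, list) else [data]
```

For example, `load_records("./data/training/embedding_train_27k.jsonl")` would return a list with one entry per line, provided the dataset has been downloaded to `./data`. Inspecting the first record (for instance with `type(records[0])`) is a reasonable way to discover the actual fields.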
## Usage

```python
# Download the dataset
# repo_id is reproduced from the dataset card; note that this listing
# is published as mingrrui/weiwanchengshujuji.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="mingrrui/llm-reviewer-data", repo_type="dataset", local_dir="./data")
```

## Citation

If you use this dataset, please cite our work.
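Because `paper_images/` alone is listed at roughly 105 GB, it may be preferable to fetch only the smaller folders and skip the full snapshot shown in the Usage section. The sketch below uses the `allow_patterns` argument of `snapshot_download` to restrict the download. The helper names `build_allow_patterns` and `download_subset` and the `SMALL_FOLDERS` constant are hypothetical; the folder names are taken from the directory tree in the dataset card.

```python
from typing import List

# Top-level folders named in the dataset card's directory tree
# (~1.4 GB and ~1.4 MB respectively, per the card).
SMALL_FOLDERS = ["training", "gt_test"]


def build_allow_patterns(folders: List[str]) -> List[str]:
    """Turn folder names into glob patterns matching files beneath them."""
    return [f"{folder.strip('/')}/*" for folder in folders]


def download_subset(repo_id: str, folders: List[str], local_dir: str = "./data") -> str:
    """Download only the given folders of a dataset repo (hypothetical helper)."""
    # Imported lazily so build_allow_patterns works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=build_allow_patterns(folders),
    )
```

A call such as `download_subset("mingrrui/llm-reviewer-data", SMALL_FOLDERS)` would then fetch only the training and test JSON files, assuming the repo ID from the dataset card is the correct one. Image folders can be added later by passing, for example, `["paper_images/test_2025"]`.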

Provider:

mingrrui
