
mingrrui/weiwanchengshujuji

Hugging Face · Updated 2026-04-17 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/mingrrui/weiwanchengshujuji
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
- image-to-text
language:
- en
tags:
- peer-review
- scientific-review
- paper-review
- multimodal
- ICLR
size_categories:
- 10K<n<100K
---

# LLM as a Reviewer - Dataset

This dataset supports the **LLM as a Reviewer** project, which trains and evaluates multimodal large language models for automated scientific paper reviewing.

## Dataset Structure

```
.
├── training/                                   # Training data (~1.4 GB)
│   ├── review_68K_9pages.json                  # 68K review samples (9-page papers)
│   ├── review_21K_query_weakness.json          # 21K weakness-focused review samples
│   ├── embedding_train_27k.jsonl               # 27K embedding training pairs
│   └── generator_train_27k.json                # 27K generator training samples
├── gt_test/                                    # Ground-truth test set (~1.4 MB)
│   ├── gt_test_ICLR2025_yixunlian.json
│   └── gt_test_ICLR2025_weixunlian_9pages.json
├── paper_images/                               # Paper page images (~105 GB)
│   ├── train_2024/                             # ICLR 2024 training images
│   ├── train_2025/                             # ICLR 2025 training images
│   ├── test_2024/                              # ICLR 2024 test images
│   └── test_2025/                              # ICLR 2025 test images
└── figure_crops/                               # Cropped figures from papers
    └── {paper_id}/                             # 10,639 paper directories
```

## Data Description

### Training Data

- **review_68K_9pages.json**: 68K review training samples constructed from ICLR papers (up to 9 pages per paper), containing review text with page-level image references.
- **review_21K_query_weakness.json**: 21K samples focused on weakness identification with localized evidence.
- **embedding_train_27k.jsonl**: 27K training pairs for the embedding/retrieval model.
- **generator_train_27k.json**: 27K training samples for the review generator model.

### Test Data

- Ground-truth test sets from ICLR 2025 submissions, with both trained and untrained paper splits.

### Paper Images

- Page-level renderings of ICLR 2024 and 2025 submissions, organized by paper ID.

### Figure Crops

- Cropped figures and tables extracted from papers, organized by paper ID.

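
The card does not document the record schema, so the following is only a minimal sketch for reading the annotation files listed under Training Data and Test Data. It assumes each `.json` file holds a single JSON document and each `.jsonl` file holds one JSON object per line; `load_records` is a hypothetical helper name, not part of the project.

```python
import json
from pathlib import Path


def load_records(path):
    """Read a .json or .jsonl annotation file and return the parsed content.

    Assumption: .jsonl files contain one JSON object per line, and .json
    files contain one JSON document (for example a list of samples).
    """
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            # Skip blank lines so a trailing newline does not raise an error.
            return [json.loads(line) for line in f if line.strip()]
        return json.load(f)
```

After a download into `./data`, a call such as `load_records("./data/training/embedding_train_27k.jsonl")` would return a list of parsed lines; inspecting the first element is a reasonable way to discover the actual field names.
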
## Usage

```python
# Download the dataset
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mingrrui/llm-reviewer-data",
    repo_type="dataset",
    local_dir="./data",
)
```

## Citation

If you use this dataset, please cite our work.
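
Returning to the download step under Usage: `paper_images/` alone is listed at about 105 GB, so a full snapshot may be more than some users need. The sketch below shows one possible way to fetch only selected top-level folders with the `allow_patterns` argument of `snapshot_download`. The helper names `patterns_for` and `download_subset` are hypothetical, and the repository id is left as a parameter because this page lists the dataset as `mingrrui/weiwanchengshujuji` while the Usage snippet refers to `mingrrui/llm-reviewer-data`.

```python
def patterns_for(top_level_dirs):
    """Turn top-level folder names into glob patterns for allow_patterns."""
    return [f"{name.strip('/')}/*" for name in top_level_dirs]


def download_subset(repo_id, top_level_dirs, local_dir="./data"):
    """Download only the listed folders of a dataset repository.

    Assumption: the folder names match the layout shown under
    Dataset Structure (training/, gt_test/, paper_images/, figure_crops/).
    """
    # Imported lazily so patterns_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=patterns_for(top_level_dirs),
    )
```

For example, `download_subset("mingrrui/llm-reviewer-data", ["training", "gt_test"])` would request only the annotation files and skip the page images and figure crops.
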
Provided by:
mingrrui