popper-spiralworks/prediction_task
Source: Hugging Face. Updated 2025-10-30; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/popper-spiralworks/prediction_task
Resource description:
---
tags:
- ocr
- peer-review
- classification
license: other
language:
- en
---
# Popper Reviews — Private Prediction Subset
## Dataset Summary
This repository exposes an 80/20 train/test split tailored for acceptance prediction tasks. Each example contains:
- `paper_text`: OCR’d manuscript text.
- `anonymized_paper_text`: the same text with the author block removed (starts at the abstract).
- `decision_label`: normalized `accept`/`reject` outcome.
- `decision_text`: original decision string when available.
- `average_review_score`: mean of numeric reviewer ratings extracted from the Popper review JSON files.
Source corpora: Popper’s ICLR, TMLR, and Nature review dumps. Only papers with an explicit accept/reject decision are included. Reference lists are removed from `anonymized_paper_text` to focus on the manuscript narrative.
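The card does not say how the author block and reference list are located, so the following is only a rough sketch of one way such trimming could be done. The function name `anonymize` and the heading patterns (a line starting with "Abstract", a line consisting of "References" or "Bibliography") are assumptions for illustration, not the pipeline Popper actually uses.

```python
import re

# Hypothetical heading patterns; real OCR output may format headings differently.
_ABSTRACT = re.compile(r"^[ \t]*(?:#+[ \t]*)?abstract\b", re.IGNORECASE | re.MULTILINE)
_REFERENCES = re.compile(
    r"^[ \t]*(?:#+[ \t]*)?(?:references|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def anonymize(paper_text: str) -> str:
    """Drop everything before the abstract and everything from the last
    references heading onward. Text without these headings is returned as-is."""
    start = _ABSTRACT.search(paper_text)
    body = paper_text[start.start():] if start else paper_text

    # Use the last references heading so an early mention does not truncate the paper.
    ref_headings = list(_REFERENCES.finditer(body))
    if ref_headings:
        body = body[: ref_headings[-1].start()]
    return body.strip()


sample = (
    "A Title\nJane Doe\nAbstract\nWe study X.\n"
    "1 Introduction\nText.\nReferences\n[1] Foo.\n"
)
trimmed = anonymize(sample)
print(trimmed)
```

A heuristic like this can miss papers whose abstract or reference headings were garbled by OCR, so the published `anonymized_paper_text` field should be preferred over re-deriving it.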
## Splits
| Split | Records |
| --- | --- |
| train | 1,884 |
| test | 472 |
Splits are stratified with an 80/20 ratio using a fixed random seed (42).
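As an illustration of what a stratified 80/20 split with a fixed seed looks like, here is a minimal standard-library sketch. It assumes stratification is by `decision_label`, which the card does not state explicitly, and the helper `stratified_split` is hypothetical. It will not reproduce the published split, which depends on the original tooling and record order.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="decision_label", test_fraction=0.2, seed=42):
    """Split records so each label keeps roughly the same train/test proportion."""
    rng = random.Random(seed)  # fixed seed makes the shuffle repeatable

    # Group records by label so each class is split independently.
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic iteration order
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy records standing in for dataset rows.
records = [{"id": i, "decision_label": "accept" if i % 2 else "reject"} for i in range(100)]
train, test = stratified_split(records)
print(len(train), len(test))
```

Splitting each label group separately is what keeps the accept/reject ratio similar in both partitions.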
## Usage
```python
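import os
from datasets import load_dataset
# Access is restricted, so a Hugging Face token is required (read here from HF_TOKEN, an assumed variable name).
token = os.environ["HF_TOKEN"]
data = load_dataset("popper-spiralworks/prediction_task", split="train", token=token)
print(data[0]["decision_label"], data[0]["average_review_score"])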
```
## Processing Notes
- OCR text comes from DeepSeek-OCR via Popper (`metadata.backend = deepseek` when available).
- Average scores are computed by parsing the numeric prefix of each reviewer `rating` field.
- Non-numeric or missing ratings are ignored during averaging.
- Additional review metadata and reviewer comments are available in the public dataset [`sumuks/research_papers_with_reviews_ocr`](https://huggingface.co/datasets/sumuks/research_papers_with_reviews_ocr).
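The averaging rule described in the notes above can be sketched as follows. The exact format of the `rating` strings is not given in this card, so the example assumes values such as `"6: marginally above the acceptance threshold"`. The helper names `parse_rating` and `average_review_score` are hypothetical.

```python
import re
from statistics import mean
from typing import Iterable, Optional

# Leading integer or decimal number, e.g. "6" in "6: marginally above ...".
_NUMERIC_PREFIX = re.compile(r"^\s*(-?\d+(?:\.\d+)?)")


def parse_rating(rating) -> Optional[float]:
    """Return the numeric prefix of a rating, or None if there is none."""
    if rating is None:
        return None
    match = _NUMERIC_PREFIX.match(str(rating))
    return float(match.group(1)) if match else None


def average_review_score(ratings: Iterable) -> Optional[float]:
    """Average the parseable ratings; non-numeric or missing ones are skipped."""
    scores = [s for s in map(parse_rating, ratings) if s is not None]
    return mean(scores) if scores else None


ratings = ["6: marginally above the acceptance threshold", "8: accept", None, "N/A"]
score = average_review_score(ratings)
print(score)
```

Returning `None` when no rating parses is one possible convention. The card does not say how such papers are represented in `average_review_score`.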
## Attribution
When using this dataset, please credit the original venues (ICLR, TMLR, Nature) and cite the Popper project. Access to this repository is restricted to the Popper Spiralworks collaboration.
Provider:
popper-spiralworks



