
popper-spiralworks/prediction_task

Source: Hugging Face. Updated 2025-10-30. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/popper-spiralworks/prediction_task
Resource description:
---
tags:
- ocr
- peer-review
- classification
license: other
language:
- en
---

# Popper Reviews — Private Prediction Subset

## Dataset Summary

This repository exposes an 80/20 train/test split tailored for acceptance prediction tasks. Each example contains:

- `paper_text`: OCR’d manuscript text.
- `anonymized_paper_text`: the same text with the author block removed (starts at the abstract).
- `decision_label`: normalized `accept`/`reject` outcome.
- `decision_text`: original decision string when available.
- `average_review_score`: mean of numeric reviewer ratings extracted from the Popper review JSON files.

Source corpora: Popper’s ICLR, TMLR, and Nature review dumps. Only papers with an explicit accept/reject decision are included. Reference lists are removed from `anonymized_paper_text` to focus on the manuscript narrative.

## Splits

| Split | Records |
| --- | --- |
| train | 1,884 |
| test | 472 |

Splits are stratified with an 80/20 ratio using a fixed random seed (42).

## Usage

```python
import os

from datasets import load_dataset

# Access is restricted, so an authorized Hugging Face token is required.
# Assumption: the token is stored in the HF_TOKEN environment variable.
token = os.environ["HF_TOKEN"]

data = load_dataset("popper-spiralworks/prediction_task", split="train", token=token)
print(data[0]["decision_label"], data[0]["average_review_score"])
```

## Processing Notes

- OCR text comes from DeepSeek-OCR via Popper (`metadata.backend = deepseek` when available).
- Average scores are computed by parsing the numeric prefix of each reviewer `rating` field.
- Non-numeric or missing ratings are ignored during averaging.
- Additional review metadata and reviewer comments are available in the public dataset [`sumuks/research_papers_with_reviews_ocr`](https://huggingface.co/datasets/sumuks/research_papers_with_reviews_ocr).

## Attribution

When using this dataset, please credit the original venues (ICLR, TMLR, Nature) and cite the Popper project. Access to this repository is restricted to the Popper Spiralworks collaboration.
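The averaging rule described under Processing Notes (take the numeric prefix of each `rating`, skip non-numeric or missing values) could look roughly like the sketch below. This is not the Popper project's actual code: the function names, the regular expression, and the sample rating strings are all hypothetical, and it assumes ratings are non-negative integers or decimals at the start of the field.

```python
import re
from statistics import mean
from typing import Iterable, Optional

# Hypothetical pattern: a leading non-negative integer or decimal,
# e.g. the "8" in "8: accept, good paper".
_NUMERIC_PREFIX = re.compile(r"^\s*(\d+(?:\.\d+)?)")


def parse_rating(rating: object) -> Optional[float]:
    """Return the numeric prefix of a rating field, or None if it has none."""
    if rating is None:
        return None
    match = _NUMERIC_PREFIX.match(str(rating))
    return float(match.group(1)) if match else None


def average_review_score(ratings: Iterable[object]) -> Optional[float]:
    """Average the parseable ratings; missing or non-numeric ones are skipped."""
    scores = [s for s in map(parse_rating, ratings) if s is not None]
    return mean(scores) if scores else None


# Illustrative rating strings, not taken from the dataset.
example = ["8: accept, good paper", "6: marginally above threshold", None, "N/A"]
print(average_review_score(example))
```

Returning `None` when no rating parses is a design choice made for this sketch; the source does not say how papers with no numeric ratings are represented in `average_review_score`.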
Provided by:
popper-spiralworks