
jang1563/bioreview-bench

Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jang1563/bioreview-bench
Resource description:
---
language:
- en
license: other
task_categories:
- text-classification
- text-generation
tags:
- peer-review
- biomedical
- benchmark
- scientific-review
- elife
- plos
- f1000research
- peerj
- nature
- rebuttal
- open-peer-review
pretty_name: "BioReview-Bench"
size_categories:
- 1K<n<10K
configs:
- config_name: default
  default: true
  data_files:
  - split: train
    path: "data/default/train.jsonl"
  - split: validation
    path: "data/default/validation.jsonl"
  - split: test
    path: "data/default/test.jsonl"
- config_name: benchmark
  data_files:
  - split: train
    path: "data/benchmark/train.jsonl"
  - split: validation
    path: "data/benchmark/validation.jsonl"
  - split: test
    path: "data/benchmark/test.jsonl"
- config_name: concerns_flat
  data_files:
  - split: train
    path: "data/concerns_flat/train.jsonl"
  - split: validation
    path: "data/concerns_flat/validation.jsonl"
  - split: test
    path: "data/concerns_flat/test.jsonl"
- config_name: elife
  data_files:
  - split: train
    path: "data/elife/train.jsonl"
  - split: validation
    path: "data/elife/validation.jsonl"
  - split: test
    path: "data/elife/test.jsonl"
- config_name: plos
  data_files:
  - split: train
    path: "data/plos/train.jsonl"
  - split: validation
    path: "data/plos/validation.jsonl"
  - split: test
    path: "data/plos/test.jsonl"
- config_name: f1000
  data_files:
  - split: train
    path: "data/f1000/train.jsonl"
  - split: validation
    path: "data/f1000/validation.jsonl"
  - split: test
    path: "data/f1000/test.jsonl"
- config_name: peerj
  data_files:
  - split: train
    path: "data/peerj/train.jsonl"
  - split: validation
    path: "data/peerj/validation.jsonl"
  - split: test
    path: "data/peerj/test.jsonl"
- config_name: nature
  data_files:
  - split: train
    path: "data/nature/train.jsonl"
  - split: validation
    path: "data/nature/validation.jsonl"
  - split: test
    path: "data/nature/test.jsonl"
dataset_info:
- config_name: default
  splits:
  - name: train
    num_examples: 5387
  - name: validation
    num_examples: 953
  - name: test
    num_examples: 600
- config_name: benchmark
  splits:
  - name: train
    num_examples: 5387
  - name: validation
    num_examples: 953
  - name: test
    num_examples: 600
- config_name: concerns_flat
  splits:
  - name: train
    num_examples: 79121
  - name: validation
    num_examples: 14101
  - name: test
    num_examples: 8647
- config_name: elife
  splits:
  - name: train
    num_examples: 1409
  - name: validation
    num_examples: 251
  - name: test
    num_examples: 150
- config_name: plos
  splits:
  - name: train
    num_examples: 1349
  - name: validation
    num_examples: 238
  - name: test
    num_examples: 150
- config_name: f1000
  splits:
  - name: train
    num_examples: 2149
  - name: validation
    num_examples: 380
  - name: test
    num_examples: 150
- config_name: peerj
  splits:
  - name: train
    num_examples: 165
  - name: validation
    num_examples: 29
  - name: test
    num_examples: 50
- config_name: nature
  splits:
  - name: train
    num_examples: 315
  - name: validation
    num_examples: 55
  - name: test
    num_examples: 100
---

# BioReview-Bench

A benchmark and training dataset for AI-assisted biomedical peer review.

- **6,940 articles** with **101,869 reviewer concerns**
- Sources: elife (1810), f1000 (2679), nature (470), peerj (244), plos (1737)
- Concern-level labels: 9 categories, 3 severity levels, 5 author stance types
- License: benchmark metadata CC-BY-NC-4.0 | source content follows per-source terms | code Apache-2.0

## What makes this dataset unique

No other publicly available dataset provides **structured, concern-level peer review data** for biomedical papers with:

- Categorised reviewer concerns (design flaw, statistical methodology, etc.)
- Severity labels (major / minor / optional)
- Author response tracking (conceded / rebutted / partial / unclear / no_response)
- Evidence-of-change flags

## Configs

| Config | Total rows | Total concerns |
|--------|-----------|---------------|
| `default` | 6,940 | 101,869 |
| `benchmark` | 6,940 | 93,222 |
| `concerns_flat` | 101,869 | 101,869 |
| `elife` | 1,810 | 11,772 |
| `plos` | 1,737 | 33,160 |
| `f1000` | 2,679 | 45,248 |
| `peerj` | 244 | 5,003 |
| `nature` | 470 | 6,686 |

- **`default`**: Full data: all fields, all sources. Use for analysis and research.
- **`benchmark`**: Task input format for AI review tool evaluation. Train/val include simplified concerns (text + category + severity). Test split has `concerns=[]` to prevent label leakage.
- **`concerns_flat`**: One row per concern with article context. Ideal for rebuttal generation training and stance classification. PLOS entries included (filter with `author_stance != "no_response"` for rebuttal tasks).
- **`elife`** / **`plos`** / **`f1000`** / **`peerj`** / **`nature`**: Source-specific subsets of `default`.

## Quick start

```python
from datasets import load_dataset

# Full dataset (default config)
ds = load_dataset("jang1563/bioreview-bench")

# Benchmark evaluation: the test split has no concerns (your tool generates them)
ds = load_dataset("jang1563/bioreview-bench", "benchmark")
for article in ds["test"]:
    text = article["paper_text_sections"]
    # ... run your review tool, then evaluate with bioreview_bench.evaluate.metrics

# Training a review generation model
ds = load_dataset("jang1563/bioreview-bench", "benchmark")
for article in ds["train"]:
    target_concerns = article["concerns"]  # [{concern_text, category, severity}]

# Rebuttal generation / stance classification
ds = load_dataset("jang1563/bioreview-bench", "concerns_flat")
for row in ds["train"]:
    concern = row["concern_text"]
    response = row["author_response_text"]
    stance = row["author_stance"]  # conceded / rebutted / partial / unclear / no_response

# Source-specific analysis
ds = load_dataset("jang1563/bioreview-bench", "elife")
```

## Schema

### Article fields (default config)

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Article ID (e.g. `elife:84798`) |
| `source` | string | Journal source (`elife`, `plos`, `f1000`, `peerj`, `nature`) |
| `doi` | string | Article DOI |
| `title` | string | Article title |
| `abstract` | string | Abstract text |
| `subjects` | list[string] | Subject areas |
| `published_date` | string | ISO date |
| `paper_text_sections` | dict | Section name → text |
| `decision_letter_raw` | string | Raw peer review text |
| `author_response_raw` | string | Raw author response |
| `concerns` | list[object] | Extracted reviewer concerns |

### Concern fields

| Field | Type | Description |
|-------|------|-------------|
| `concern_id` | string | Unique ID (e.g. `elife:84798:R1C3`) |
| `concern_text` | string | Reviewer's concern (10-2000 chars) |
| `category` | string | One of 9 types (see below) |
| `severity` | string | `major` / `minor` / `optional` |
| `author_response_text` | string | Author's response to this concern |
| `author_stance` | string | `conceded` / `rebutted` / `partial` / `unclear` / `no_response` |
| `evidence_of_change` | bool? | Whether author made revisions |
| `resolution_confidence` | float | LLM confidence (0.0-1.0) |

### Concern categories

`design_flaw`, `statistical_methodology`, `missing_experiment`, `figure_issue`, `prior_art_novelty`, `writing_clarity`, `reagent_method_specificity`, `interpretation`, `other`

## Leaderboard (test split)

| Rank | Tool | Version | Recall | Precision | F1 | Major Recall |
|------|------|---------|--------|-----------|-----|--------------|
| 1 | Haiku-4.5 | claude-haiku-4-5-20251001 | 0.725 | 0.675 | 0.699 | 0.872 |
| 2 | GPT-4o-mini | gpt-4o-mini | 0.684 | 0.703 | 0.694 | 0.840 |
| 3 | Gemini-2.5-Flash | gemini-2.5-flash | 0.665 | 0.709 | 0.686 | 0.832 |
| 4 | BM25 | bm25-specter2 | 0.637 | 0.741 | 0.685 | 0.794 |
| 5 | Gemini-Flash-Lite | gemini-2.5-flash-lite | 0.615 | 0.708 | 0.658 | 0.781 |
| 6 | Llama-3.3-70B | llama-3.3-70b | 0.554 | 0.794 | 0.653 | 0.753 |

> Matching: SPECTER2 cosine similarity, threshold=0.65, Hungarian bipartite matching.
> Figure-issue concerns excluded. 944 scored articles.
> Submit results via [GitHub](https://github.com/jang1563/bioreview-bench).

## License

- **Benchmark annotations and packaging metadata**: CC-BY-NC-4.0.
- **Underlying article, review, and author-response content**: source-specific. Redistribution is not uniform across all sources; follow `LICENSE_MATRIX.md` in the GitHub repository and the original publisher terms.
- **Code** (Python package, evaluation harness): Apache-2.0.

See the [GitHub repository](https://github.com/jang1563/bioreview-bench) for full license details.

## Citation

If you use this dataset, please cite:

```bibtex
@misc{bioreview-bench,
  title={BioReview-Bench: A Benchmark for AI-Assisted Biomedical Peer Review},
  author={Kim, JangKeun},
  year={2026},
  url={https://huggingface.co/datasets/jang1563/bioreview-bench}
}
```
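
The leaderboard note describes scoring by cosine similarity with a 0.65 threshold and one-to-one bipartite matching. The sketch below illustrates that idea on a toy similarity matrix and is not the benchmark's own evaluation code. The function name `match_and_score` and the similarity values are hypothetical. The sketch assumes that the assignment maximises total similarity and that pairs below the threshold are discarded afterwards. A brute-force search over assignments stands in for the Hungarian algorithm; it returns the same optimal assignment on small inputs.

```python
from itertools import permutations


def match_and_score(sim, threshold=0.65):
    """Score predicted concerns against gold concerns.

    sim[i][j] is the similarity between predicted concern i and gold
    concern j (hypothetical toy values here, not real SPECTER2 output).
    Returns (precision, recall, f1).
    """
    n_pred = len(sim)
    n_gold = len(sim[0]) if sim else 0
    if n_pred == 0 or n_gold == 0:
        return 0.0, 0.0, 0.0

    # Enumerate every one-to-one assignment of the smaller side into the
    # larger side. This is exponential, so it only suits tiny examples.
    if n_pred <= n_gold:
        candidates = (
            [(i, j) for i, j in enumerate(perm)]
            for perm in permutations(range(n_gold), n_pred)
        )
    else:
        candidates = (
            [(i, j) for j, i in enumerate(perm)]
            for perm in permutations(range(n_pred), n_gold)
        )

    # Keep the assignment with the highest total similarity.
    best = max(candidates, key=lambda pairs: sum(sim[i][j] for i, j in pairs))

    # Assumption: matched pairs below the threshold do not count as hits.
    true_positives = sum(1 for i, j in best if sim[i][j] >= threshold)

    precision = true_positives / n_pred
    recall = true_positives / n_gold
    f1 = 0.0
    if true_positives:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Two predicted concerns scored against three gold concerns.
toy_sim = [
    [0.90, 0.20, 0.10],
    [0.30, 0.70, 0.40],
]
precision, recall, f1 = match_and_score(toy_sim)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For realistic numbers of concerns, `scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same assignment problem in polynomial time.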
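
The concern schema above fixes the allowed values for `category`, `severity`, and `author_stance`, plus a length range for `concern_text` and a range for `resolution_confidence`. A small checker along the following lines could be used to sanity-check records before training or scoring. The helper name `validate_concern` and the sample record are hypothetical and are not part of the dataset's tooling. Missing optional fields are treated leniently here, which is an assumption.

```python
CATEGORIES = {
    "design_flaw", "statistical_methodology", "missing_experiment",
    "figure_issue", "prior_art_novelty", "writing_clarity",
    "reagent_method_specificity", "interpretation", "other",
}
SEVERITIES = {"major", "minor", "optional"}
STANCES = {"conceded", "rebutted", "partial", "unclear", "no_response"}


def validate_concern(concern):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    text = concern.get("concern_text", "")
    if not 10 <= len(text) <= 2000:
        problems.append("concern_text length outside 10-2000 chars")

    if concern.get("category") not in CATEGORIES:
        problems.append("unknown category")

    if concern.get("severity") not in SEVERITIES:
        problems.append("unknown severity")

    # author_stance is absent from the simplified benchmark concerns,
    # so only check it when present (assumption).
    stance = concern.get("author_stance")
    if stance is not None and stance not in STANCES:
        problems.append("unknown author_stance")

    confidence = concern.get("resolution_confidence")
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        problems.append("resolution_confidence outside 0.0-1.0")

    return problems


# Hypothetical record shaped like a row of the concern schema.
sample = {
    "concern_id": "elife:84798:R1C3",
    "concern_text": "The sample size is not justified by a power analysis.",
    "category": "statistical_methodology",
    "severity": "major",
    "author_stance": "conceded",
    "resolution_confidence": 0.9,
}
print(validate_concern(sample))
```

Rows that fail the check can then be logged or dropped, depending on how strict the downstream task needs to be.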
Provider:
jang1563