Abu-Sameer-66/SciPeerBench

Hugging Face | Updated 2026-04-21 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Abu-Sameer-66/SciPeerBench
Resource description:
---
license: cc-by-4.0
task_categories:
- text-classification
language:
- en
tags:
- scientific-integrity
- fraud-detection
- peer-review
- research-ethics
- benchmark
size_categories:
- n<1K
---

# SciPeerBench v1.1

**World's first multi-dimensional scientific fraud detection benchmark.**

No other dataset labels papers across 14 fraud dimensions simultaneously.

## Stats

| Property | Value |
|----------|-------|
| Total papers | 644 |
| Fraud papers | 286 |
| Clean papers | 358 |
| Columns | 35 |
| Fraud dimensions | 14 |
| Year range | 1998–2026 |

## Why This Dataset is Unique

Every existing fraud dataset does binary labeling: fraud or not fraud. SciPeerBench labels each paper across **14 dimensions**, including statistical fraud, figure manipulation, citation rings, LLM detection, and more. Nobody has done this before.

## Categories

| Category | Count | Description |
|----------|-------|-------------|
| CONFIRMED_FRAUD | 286 | PubMed retracted + CrossRef verified |
| CLEAN | 223 | High quality multi-field papers |
| SUSPECTED_FRAUD | 109 | Journal expressions of concern |
| BORDERLINE | 16 | Famous disputed cases |
| BASELINE_ELITE | 10 | Nobel Prize and landmark papers |

## 14 Fraud Dimensions Per Paper

| Column | What it detects |
|--------|----------------|
| stat_audit_score | p-hacking, sample size issues |
| figure_forensics_score | image duplication, manipulation |
| methodology_score | causation claims, missing controls |
| citation_score | self-citation rings |
| reproducibility_score | code and data availability |
| novelty_score | incremental vs novel work |
| grim_score | mathematically impossible means |
| sprite_score | impossible distributions |
| granularity_score | Benford law violations |
| pcurve_score | publication bias |
| effect_size_score | inflated effect sizes |
| retraction_score | cited retracted papers |
| cartel_score | citation ring networks |
| llm_score | AI-generated paper detection |

## Famous Fraud Cases Included

- **Wakefield 1998** (The Lancet): vaccines-autism data fabrication
- **LaCour 2014** (Science): survey data never existed
- **Obokata 2014** (Nature): STAP cells image manipulation
- **Stapel**: 58 psychology papers fabricated over 10 years

## Data Sources

- PubMed `Retracted Publication[pt]`: US National Library of Medicine
- CrossRef `update-type:retraction`: publisher verified DOI registry
- PubMed `Expression of Concern[pt]`: journal flagged papers
- Manual curation: court and media verified fraud cases

## Associated Project

Built to train **SciPeerAI-7B**, world's first scientific integrity LLM.

- Live demo: https://scipeerai-ui.vercel.app
- API: https://web-production-f526d.up.railway.app/docs
- GitHub: https://github.com/Abu-Sameer-66/SciPeerAI

## Citation

```bibtex
@dataset{nadeem2026scipeerai_bench,
  author = {Nadeem, Sameer},
  title = {SciPeerBench: Multi-dimensional Scientific Fraud Detection Benchmark},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/Abu-Sameer-66/SciPeerBench}
}
```

## License

CC BY 4.0: free to use with attribution.
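
## Example: GRIM-Style Consistency Check (Sketch)

The `grim_score` column is described as detecting mathematically impossible means. The card does not say how that score is computed, so the following is only a minimal sketch of the general GRIM idea, not the dataset's actual scoring code: a mean of `n` integer-valued responses must equal some integer total divided by `n`, up to rounding. The function name `grim_consistent` and its parameters are hypothetical.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Check whether a reported mean could arise from n integer-valued responses.

    Hypothetical helper for illustration; not taken from SciPeerBench.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")

    # A mean reported to `decimals` places is within half a unit of the
    # last reported digit of the true mean.
    half_unit = 0.5 * 10 ** (-decimals)

    # Only the integer totals adjacent to reported_mean * n can qualify.
    approx_total = reported_mean * n
    candidates = (math.floor(approx_total), math.ceil(approx_total))

    # Small epsilon absorbs floating-point error on exact ties.
    return any(
        abs(total / n - reported_mean) <= half_unit + 1e-12
        for total in candidates
    )


if __name__ == "__main__":
    print(grim_consistent(5.18, 28))  # 145 / 28 rounds to 5.18
    print(grim_consistent(5.19, 28))  # no integer total gives 5.19
```

With two reported decimals, the check only discriminates for samples smaller than 100, because larger samples can produce any two-decimal mean.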
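
## Example: Building the Source Queries (Sketch)

The Data Sources section names the PubMed terms `Retracted Publication[pt]` and `Expression of Concern[pt]` and the CrossRef filter `update-type:retraction`. The card does not state how these queries were issued. As an assumption, the sketch below builds request URLs for the public NCBI E-utilities `esearch` endpoint and the Crossref REST API `works` endpoint without sending any request. The helper names and the `retmax` and `rows` defaults are illustrative choices, not values from the dataset.

```python
from urllib.parse import urlencode

# Public endpoints, assumed here; the card does not say which were used.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
CROSSREF_WORKS = "https://api.crossref.org/works"


def pubmed_search_url(term: str, retmax: int = 100) -> str:
    """Build an esearch URL for a PubMed term such as 'Retracted Publication[pt]'."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def crossref_retractions_url(rows: int = 100) -> str:
    """Build a Crossref works URL filtered to retraction update notices."""
    params = {"filter": "update-type:retraction", "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


if __name__ == "__main__":
    print(pubmed_search_url("Retracted Publication[pt]"))
    print(pubmed_search_url("Expression of Concern[pt]"))
    print(crossref_retractions_url())
```

Fetching and paging through the results is left out on purpose, since both services publish their own rate-limit and etiquette guidance.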
Provider:
Abu-Sameer-66