Geoffrey-Wang/IntegriRef-Bench

Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Geoffrey-Wang/IntegriRef-Bench
Resource description:
---
license: mit
task_categories:
- text-classification
- fact-checking
language:
- en
tags:
- citation-verification
- hallucination-detection
- scientific-integrity
- bayesian-scoring
- reference-checking
size_categories:
- 10K<n<100K
---

# IntegriRef-Bench

Benchmark dataset for multi-level reference integrity verification, accompanying the IntegriRef framework.

## Dataset Splits

| Split | Rows | Description |
|-------|------|-------------|
| `reference_verification` | 1,926 | Golden benchmark + crawled verification cases (hallucinated, real, chimera, retracted) |
| `signal_unit_tests` | 403 | Per-signal unit tests for 14 Bayesian signal types |
| `l1_intent` | 20 | Citation intent classification test pairs |
| `l2_nli` | 20 | Claim-evidence NLI alignment test pairs |
| `graph_anomaly` | 3,030 | Citation graph anomaly cases (rings, temporal, orphan clusters) |
| `retracted_papers` | 6,391 | Retracted papers from Crossref + PubMed with real controls |
| **Total** | **11,790** | |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("Geoffrey-Wang/IntegriRef-Bench")

# Or load a specific split
ref_ver = load_dataset("Geoffrey-Wang/IntegriRef-Bench", data_files="reference_verification.jsonl")
```

## Sources

- **Retracted papers**: Crossref `update-to` retraction markers + PubMed retraction notices
- **Hallucinated references**: Programmatically generated with verified non-existence
- **Chimera references**: Real DOIs paired with swapped metadata
- **Graph anomaly cases**: Documented citation cartels (Brazilian 2009-2013, Ji-Huan He, IOP 2024)
- **Temporal anomalies**: OpenAlex citation graph analysis
- **Statistical cases**: GRIM test + statcheck from PMC Open Access full texts

## Citation

```bibtex
@inproceedings{integriref2026,
  title={IntegriRef: A Five-Layer Bayesian Framework for Cross-Domain Reference Integrity Verification},
  author={Anonymous},
  booktitle={KnowFM Workshop at ACL},
  year={2026}
}
```

## License

MIT
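
## Example: GRIM consistency check

The Sources list names the GRIM test as one origin of the statistical cases. The card does not say how IntegriRef applies it, so the following is only a minimal sketch of the general technique: given a reported mean and a sample size for integer-valued data, check whether any integer sum could round to that mean. The function name `grim_consistent` is hypothetical, and the half-up rounding rule is an assumption; some GRIM implementations accept either rounding direction.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if `reported_mean` could be the mean of `n` integer values.

    Hypothetical helper, not part of IntegriRef. `reported_mean` is a string
    so that trailing zeros (e.g. "3.50") keep the reported decimal places.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    mean = Decimal(reported_mean)
    decimals = max(0, -mean.as_tuple().exponent)
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for two decimals
    half = quantum / 2

    # Any integer sum whose mean rounds to the reported value lies in this range.
    lowest = int(((mean - half) * n).to_integral_value(rounding=ROUND_FLOOR))
    highest = int(((mean + half) * n).to_integral_value(rounding=ROUND_CEILING))

    for total in range(lowest, highest + 1):
        # Assumption: means were rounded half-up to the reported precision.
        candidate = (Decimal(total) / n).quantize(quantum, rounding=ROUND_HALF_UP)
        if candidate == mean:
            return True
    return False
```

For instance, `grim_consistent("5.19", 28)` returns `False`, because the nearest integer sums give 145/28 ≈ 5.18 and 146/28 ≈ 5.21, while `grim_consistent("5.18", 28)` returns `True`.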
Provider:
Geoffrey-Wang