jang1563/ContradictBio-338
Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jang1563/ContradictBio-338
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
tags:
- biology
- biomedicine
- contradiction-detection
- natural-language-inference
- evidence-quality
- peer-review
size_categories:
- n<1K
pretty_name: ContradictBio-338
dataset_info:
  features:
  - name: id
    dtype: string
  - name: source_pmid
    dtype: string
  - name: source_doi
    dtype: string
  - name: paper_title
    dtype: string
  - name: claim_a
    dtype: string
  - name: claim_b
    dtype: string
  - name: is_genuine_contradiction
    dtype: bool
  - name: contradiction_type
    dtype: string
  - name: confidence
    dtype: float64
  - name: rationale
    dtype: string
  - name: abstract_text
    dtype: string
  - name: confidence_tier
    dtype: int64
  splits:
  - name: train
    num_examples: 338
---
# ContradictBio-338
A gold-standard biomedical contradiction detection corpus with 5-category taxonomy and multi-model cross-validation.
## Overview
ContradictBio-338 contains 338 expert-annotated claim pairs, each extracted from a single biomedical abstract and labeled for contradiction detection. Each pair is classified as either **genuine contradiction** or **contextual (non-contradiction)**, and genuine contradictions are further assigned one of 4 types; the fifth taxonomy category, contextual, is the negative class.
This corpus was developed as part of [BioTeam-AI](https://github.com/jang1563/bioteam-ai), a multi-agent research automation system for biology.
## Contradiction Taxonomy (5 categories)
| Type | Count | Description |
|------|---------------|-------------|
| **direct** | 31 | Explicit factual disagreement between claims |
| **temporal** | 24 | Findings that changed over time or across study periods |
| **magnitude** | 23 | Quantitative disagreement (effect sizes, measurements) |
| **methodological** | 45 | Contradictions arising from different experimental approaches |
| **contextual** | 215 | Apparent contradictions explained by differing conditions (negative class) |
**Total**: 123 genuine contradictions + 215 contextual (non-contradictions) = 338 pairs
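The counts above can be rechecked against a loaded copy of the corpus. The following is a minimal sketch, assuming the records are available as an iterable of dicts with the fields described in this card; `tally_types` and `check_taxonomy` are hypothetical helper names, not part of any published tooling for this dataset.

```python
from collections import Counter

# Counts copied from the taxonomy table in this card.
EXPECTED_COUNTS = {
    "direct": 31,
    "temporal": 24,
    "magnitude": 23,
    "methodological": 45,
    "contextual": 215,
}


def tally_types(records):
    """Count records per contradiction_type (hypothetical helper)."""
    return Counter(r["contradiction_type"] for r in records)


def check_taxonomy(records):
    """Return {type: (expected, observed)} for every type whose count differs."""
    observed = tally_types(records)
    return {
        t: (n, observed.get(t, 0))
        for t, n in EXPECTED_COUNTS.items()
        if observed.get(t, 0) != n
    }


# The table is internally consistent: 123 genuine + 215 contextual = 338.
genuine_total = sum(n for t, n in EXPECTED_COUNTS.items() if t != "contextual")
assert genuine_total == 123
assert sum(EXPECTED_COUNTS.values()) == 338
```

If the data and this card agree, `check_taxonomy(dataset["train"])` on the split loaded in the Usage section should return an empty dict.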
## Quality Validation: 6-Rater Cross-Validation
The corpus was validated using a Panel of LLM Evaluators (PoLL) method ([Verga et al. 2024](https://arxiv.org/abs/2404.18796)) — 3 models × 2 prompt strategies:
| Model | Prompt | Precision | Recall | F1 | Parse Fail% | Cost |
|-------|--------|-----------|--------|-----|-------------|------|
| Gemini 2.5 Flash | baseline | 0.619 | 0.645 | 0.632 | 0% | $0.00 |
| Gemini 2.5 Flash | contrastive | 0.459 | 0.967 | 0.623 | 7.7% | $0.00 |
| DeepSeek V3.2 | contrastive | 0.593 | 0.854 | **0.700** | 0% | $0.14 |
| Llama 4 Scout | contrastive | 0.599 | 0.932 | **0.729** | 32.5% | $0.05 |
| DeepSeek V3.2 | baseline | 0.778 | 0.285 | 0.417 | 0% | $0.05 |
| Llama 4 Scout | baseline | 0.933 | 0.156 | 0.267 | 17.2% | $0.06 |
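The F1 column is the harmonic mean of the precision and recall columns. As a quick consistency check, the sketch below recomputes F1 for each row of the table; the function name `f1_score` is illustrative, and the tolerance only accounts for the three-decimal rounding of the inputs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, prompt) -> (precision, recall, reported F1), copied from the table above.
REPORTED = {
    ("Gemini 2.5 Flash", "baseline"): (0.619, 0.645, 0.632),
    ("Gemini 2.5 Flash", "contrastive"): (0.459, 0.967, 0.623),
    ("DeepSeek V3.2", "contrastive"): (0.593, 0.854, 0.700),
    ("Llama 4 Scout", "contrastive"): (0.599, 0.932, 0.729),
    ("DeepSeek V3.2", "baseline"): (0.778, 0.285, 0.417),
    ("Llama 4 Scout", "baseline"): (0.933, 0.156, 0.267),
}

for (model, prompt), (p, r, reported) in REPORTED.items():
    computed = f1_score(p, r)
    # Inputs are rounded to three decimals, so allow a small tolerance.
    assert abs(computed - reported) < 0.001, (model, prompt, computed)
    print(f"{model:17s} {prompt:12s} F1={computed:.3f}")
```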
### Panel Agreement
- **Krippendorff's alpha (binary)**: 0.352
- **Krippendorff's alpha (type)**: 0.560 (contrastive prompt)
- **Best pairwise kappa**: DeepSeek vs Llama 4 = 0.597
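For readers who want to reproduce a pairwise agreement figure on their own rater outputs, here is a minimal sketch of Cohen's kappa in plain Python. It assumes that "pairwise kappa" above means Cohen's kappa and that items a rater failed to parse (represented here as `None`) are dropped from the comparison; neither detail is stated in this card.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items.

    Items where either rater has no label (None, e.g. a parse failure)
    are skipped; that handling is an assumption of this sketch.
    """
    pairs = [(a, b) for a, b in zip(labels_a, labels_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no items rated by both raters")
    n = len(pairs)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement from each rater's own label frequencies.
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both raters use one identical label
        # throughout; this sketch returns 1.0 since agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy votes for illustration only, not taken from the corpus.
rater_x = [True, True, False, False, True, None]
rater_y = [True, False, False, False, True, True]
print(round(cohen_kappa(rater_x, rater_y), 3))
```

The same function works for the type-level labels, since it only compares labels for equality.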
### Tiered Confidence Labels
| Tier | Criteria | Entries | Gold Match |
|------|----------|---------|------------|
| **Tier 1** | ≥ 5/6 raters agree | 120 | **94.2%** |
| **Tier 2** | 4/6 raters agree | 100 | 82.0% |
| **Tier 3** | Split / few agree | 118 | Needs review |
**Total cross-validation cost**: $0.36
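The tier criteria can be expressed as a small function over the six rater votes. This is a sketch under two assumptions that the card does not spell out: agreement is counted among the raters themselves (the separate "Gold Match" column suggests this), and an unparseable rater output, represented as `None`, counts toward no group. The name `assign_tier` is hypothetical.

```python
from collections import Counter

N_RATERS = 6  # 3 models x 2 prompt strategies, per this card


def assign_tier(votes):
    """Map one entry's rater votes to a confidence tier (1, 2 or 3).

    `votes` holds one bool per rater, or None where the rater's output
    could not be parsed. Treating None as "no agreement" is an assumption.
    """
    if len(votes) != N_RATERS:
        raise ValueError(f"expected {N_RATERS} votes, got {len(votes)}")
    counts = Counter(v for v in votes if v is not None)
    top = max(counts.values(), default=0)  # size of the largest agreeing group
    if top >= 5:
        return 1
    if top == 4:
        return 2
    return 3


print(assign_tier([True, True, True, True, True, False]))
print(assign_tier([True, True, True, True, False, None]))
print(assign_tier([True, True, True, False, False, False]))
```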
### Key Finding
**Prompt design > model choice**: Switching from the baseline to the contrastive prompt raises recall from 0.16–0.65 to 0.85–0.97 across all three model families tested, while precision falls from 0.62–0.93 to 0.46–0.60.
## Data Format
Each entry in the JSONL file contains:
```json
{
  "id": "V3-DIR-0047",
  "source_pmid": "35843572",
  "source_doi": "10.1016/j.cellsig.2022.110410",
  "paper_title": "ANGPTL4 attenuates palmitic acid-induced endothelial cell injury...",
  "claim_a": "overexpression of ANGPTL4 in HUVECs enhanced cell proliferation...",
  "claim_b": "knockdown of ANGPTL4 resulted in the opposite.",
  "is_genuine_contradiction": true,
  "contradiction_type": "direct",
  "confidence": 1.0,
  "rationale": "Overexpression enhanced proliferation while knockdown reversed it...",
  "abstract_text": "Full abstract text from PubMed...",
  "confidence_tier": 1
}
```
### Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier (format: `V3-{TYPE}-{NNN}` or `V3-MET-C{NNN}` for some methodological entries) |
| `source_pmid` | string | PubMed ID of the source paper |
| `source_doi` | string | DOI of the source paper |
| `paper_title` | string | Full title of the source paper |
| `claim_a` | string | First extracted claim from the abstract |
| `claim_b` | string | Second extracted claim that may contradict `claim_a` |
| `is_genuine_contradiction` | bool | `true` = genuine contradiction, `false` = contextual |
| `contradiction_type` | string | One of: `direct`, `temporal`, `magnitude`, `methodological`, `contextual` |
| `confidence` | float | Annotation confidence score (0.0–1.0) |
| `rationale` | string | Explanation of why the pair is/isn't a contradiction |
| `abstract_text` | string | Full abstract text fetched from PubMed |
| `confidence_tier` | int | Panel agreement tier: 1 (≥5/6 raters agree), 2 (4/6), 3 (≤3/6) |
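When working with the raw JSONL rather than the `datasets` loader, a light schema check can catch malformed rows early. The sketch below follows the Fields table; the helper names are hypothetical, and the cross-check between `is_genuine_contradiction` and `contradiction_type` relies on `contextual` being the negative class as described in this card.

```python
import json

# Field names and Python types, following the Fields table above.
SCHEMA = {
    "id": str, "source_pmid": str, "source_doi": str, "paper_title": str,
    "claim_a": str, "claim_b": str, "is_genuine_contradiction": bool,
    "contradiction_type": str, "confidence": float, "rationale": str,
    "abstract_text": str, "confidence_tier": int,
}
TYPES = {"direct", "temporal", "magnitude", "methodological", "contextual"}


def _has_type(value, expected):
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool subclasses int in Python; reject it elsewhere
    if expected is float:
        return isinstance(value, (int, float))  # JSON may encode 1.0 as 1
    return isinstance(value, expected)


def validate_record(record):
    """Return a list of problems; an empty list means the record looks valid."""
    problems = [f"missing field: {f}" for f in SCHEMA if f not in record]
    problems += [
        f"wrong type for {f}: {type(record[f]).__name__}"
        for f, t in SCHEMA.items()
        if f in record and not _has_type(record[f], t)
    ]
    if problems:
        return problems
    if record["contradiction_type"] not in TYPES:
        problems.append(f"unknown type: {record['contradiction_type']}")
    if not 0.0 <= record["confidence"] <= 1.0:
        problems.append("confidence outside 0.0-1.0")
    if record["confidence_tier"] not in (1, 2, 3):
        problems.append("confidence_tier not in 1-3")
    # Genuine contradictions must not carry the negative-class type, and vice versa.
    if record["is_genuine_contradiction"] == (record["contradiction_type"] == "contextual"):
        problems.append("is_genuine_contradiction disagrees with contradiction_type")
    return problems


def validate_jsonl(lines):
    """Yield (line_number, problems) for each non-blank line that has problems."""
    for number, line in enumerate(lines, start=1):
        if line.strip():
            problems = validate_record(json.loads(line))
            if problems:
                yield number, problems
```

Any iterable of lines can be passed to `validate_jsonl`, for example an open file handle over the JSONL file.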
## Usage
```python
from datasets import load_dataset

dataset = load_dataset("jang1563/ContradictBio-338")

# Filter by confidence tier (Tier 1 = highest quality, 94.2% gold match)
tier1 = dataset["train"].filter(lambda x: x["confidence_tier"] == 1)
print(f"Tier 1 (validated): {len(tier1)} entries")

# Filter genuine contradictions
genuine = dataset["train"].filter(lambda x: x["is_genuine_contradiction"])
print(f"Genuine contradictions: {len(genuine)}")

# Preview the first three claim pairs with their contradiction type
for example in dataset["train"].select(range(3)):
    print(f"[{example['contradiction_type']}] {example['claim_a'][:80]}...")
```
## Intended Use
- **Benchmarking** contradiction detection systems in biomedical NLP
- **Training** classifiers to distinguish genuine vs. contextual contradictions
- **Evaluating** prompt strategies for scientific claim analysis
- **Research** on evidence quality and scientific disagreement
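For the benchmarking use case, a detector can be scored against the gold labels with a few lines of code. The following is a sketch, not an official evaluation script: `evaluate` is a hypothetical helper, and `predict` stands in for whatever system is under test, taking the two claims and returning a bool.

```python
def evaluate(predict, records, tier=None):
    """Score a binary contradiction detector against the gold labels.

    `predict` is any callable taking (claim_a, claim_b) and returning a bool.
    If `tier` is given, only entries with that confidence_tier are scored.
    """
    tp = fp = fn = tn = 0
    for r in records:
        if tier is not None and r["confidence_tier"] != tier:
            continue
        predicted = bool(predict(r["claim_a"], r["claim_b"]))
        gold = r["is_genuine_contradiction"]
        if predicted and gold:
            tp += 1
        elif predicted:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"n": tp + fp + fn + tn, "precision": precision,
            "recall": recall, "f1": f1}


def always_true(claim_a, claim_b):
    """Trivial baseline that flags every pair as a genuine contradiction."""
    return True
```

Given the class counts in this card, `always_true` scored on all 338 pairs would reach recall 1.0 with precision 123/338 ≈ 0.364, which is a useful floor when comparing systems. Passing `tier=1` restricts scoring to the entries with the highest panel agreement.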
## Limitations
- Within-abstract pairs only (v3); cross-paper pairs (v4, 800 entries) available in the BioTeam-AI repository
- Gold labels created by a single annotator with LLM-assisted 6-rater cross-validation (not multi-annotator human agreement)
- Tier 3 entries (118, per the tier table above) have low panel agreement and may benefit from human review before use in training
- 8/338 entries (2.4%) have an empty `source_doi` field; all 338 are fully citable via `source_pmid`
- Some methodological entries use `V3-MET-C{NNN}` ID format instead of the standard `V3-{TYPE}-{NNN}`
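The last two limitations are easy to handle in code. The sketch below parses both ID shapes with one regular expression and falls back to a PubMed link when `source_doi` is empty. The helper names are hypothetical; the pattern accepts any uppercase type code and any number of digits, because the card does not list all type codes and its own example (`V3-DIR-0047`) uses four digits.

```python
import re

# Matches both ID shapes named in this card: V3-{TYPE}-{NNN} and V3-MET-C{NNN}.
ID_PATTERN = re.compile(r"^V3-(?P<type>[A-Z]+)-(?P<c>C?)(?P<number>\d+)$")


def parse_id(entry_id):
    """Split an entry ID into (type_code, has_c_prefix, number), or None if it does not match."""
    m = ID_PATTERN.match(entry_id)
    if m is None:
        return None
    return m["type"], m["c"] == "C", int(m["number"])


def source_url(record):
    """Prefer the DOI link; fall back to PubMed when source_doi is empty."""
    doi = (record.get("source_doi") or "").strip()
    if doi:
        return f"https://doi.org/{doi}"
    return f"https://pubmed.ncbi.nlm.nih.gov/{record['source_pmid']}/"


print(parse_id("V3-DIR-0047"))
print(source_url({"source_doi": "", "source_pmid": "35843572"}))
```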
## Citation
```bibtex
@software{kim2026bioteamai,
  title = {BioTeam-AI: Personal AI Science Team for Biology Research},
  author = {Kim, JangKeun},
  year = {2026},
  url = {https://github.com/jang1563/bioteam-ai},
  license = {MIT}
}
```
## License
[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) — Attribution-NonCommercial 4.0 International.
**You are free to:**
- Use, share, and adapt this dataset for **research, education, and non-profit purposes**
- Cite this work in academic publications
**You may NOT:**
- Use this dataset for **commercial purposes** without explicit written permission from the author
For commercial licensing inquiries, contact the author via the [BioTeam-AI repository](https://github.com/jang1563/bioteam-ai).
Provider:
jang1563



