jang1563/ContradictBio-338

Source: Hugging Face. Updated: 2026-03-28. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/jang1563/ContradictBio-338
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
tags:
- biology
- biomedicine
- contradiction-detection
- natural-language-inference
- evidence-quality
- peer-review
size_categories:
- n<1K
pretty_name: ContradictBio-338
dataset_info:
  features:
  - name: id
    dtype: string
  - name: source_pmid
    dtype: string
  - name: source_doi
    dtype: string
  - name: paper_title
    dtype: string
  - name: claim_a
    dtype: string
  - name: claim_b
    dtype: string
  - name: is_genuine_contradiction
    dtype: bool
  - name: contradiction_type
    dtype: string
  - name: confidence
    dtype: float64
  - name: rationale
    dtype: string
  - name: abstract_text
    dtype: string
  - name: confidence_tier
    dtype: int64
  splits:
  - name: train
    num_examples: 338
---

# ContradictBio-338

A gold-standard biomedical contradiction detection corpus with a 5-category taxonomy and multi-model cross-validation.

## Overview

ContradictBio-338 contains 338 expert-annotated claim pairs extracted from biomedical abstracts and labeled for contradiction detection. Each entry is classified as either a **genuine contradiction** or **contextual (non-contradiction)**. Genuine contradictions are further divided into 4 types; together with the contextual negative class, these make up the 5 taxonomy categories.

This corpus was developed as part of [BioTeam-AI](https://github.com/jang1563/bioteam-ai), a multi-agent research automation system for biology.
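
The labeling scheme in this overview implies a simple invariant between the binary label and the taxonomy type: a genuine contradiction should carry one of the four genuine types, and a non-contradiction should carry `contextual`. The following is a minimal sketch of that check, assuming the invariant holds for every entry (the card does not state this explicitly). The helper `label_is_consistent` and the sample records are hypothetical and made up for illustration; they are not part of the dataset's tooling or taken from the corpus.

```python
# Hypothetical helper: checks that the binary label and the taxonomy type agree.
# Type names follow the taxonomy listed on this card.
GENUINE_TYPES = {"direct", "temporal", "magnitude", "methodological"}
NEGATIVE_TYPE = "contextual"


def label_is_consistent(entry: dict) -> bool:
    """Return True when is_genuine_contradiction matches contradiction_type."""
    ctype = entry["contradiction_type"]
    if entry["is_genuine_contradiction"]:
        return ctype in GENUINE_TYPES
    return ctype == NEGATIVE_TYPE


# Illustrative records only (invented for this sketch, not real corpus entries).
examples = [
    {"is_genuine_contradiction": True, "contradiction_type": "direct"},
    {"is_genuine_contradiction": False, "contradiction_type": "contextual"},
    {"is_genuine_contradiction": False, "contradiction_type": "magnitude"},
]

results = [label_is_consistent(e) for e in examples]
print(results)
```

The same function can be passed to a filter over loaded entries to surface any rows whose two label fields disagree before they are used for training or evaluation.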

## Contradiction Taxonomy (5 categories)

| Type | Count | Description |
|------|-------|-------------|
| **direct** | 31 | Explicit factual disagreement between claims |
| **temporal** | 24 | Findings that changed over time or across study periods |
| **magnitude** | 23 | Quantitative disagreement (effect sizes, measurements) |
| **methodological** | 45 | Contradictions arising from different experimental approaches |
| **contextual** | 215 | Apparent contradictions explained by differing conditions (negative class) |

**Total**: 123 genuine contradictions + 215 contextual (non-contradictions) = 338 pairs

## Quality Validation: 6-Rater Cross-Validation

The corpus was validated using the Panel of LLM Evaluators (PoLL) method ([Verga et al. 2024](https://arxiv.org/abs/2404.18796)) with six raters: 3 models × 2 prompt strategies.

| Model | Prompt | Precision | Recall | F1 | Parse Fail % | Cost |
|-------|--------|-----------|--------|-----|--------------|------|
| Gemini 2.5 Flash | baseline | 0.619 | 0.645 | 0.632 | 0% | $0.00 |
| Gemini 2.5 Flash | contrastive | 0.459 | 0.967 | 0.623 | 7.7% | $0.00 |
| DeepSeek V3.2 | contrastive | 0.593 | 0.854 | **0.700** | 0% | $0.14 |
| Llama 4 Scout | contrastive | 0.599 | 0.932 | **0.729** | 32.5% | $0.05 |
| DeepSeek V3.2 | baseline | 0.778 | 0.285 | 0.417 | 0% | $0.05 |
| Llama 4 Scout | baseline | 0.933 | 0.156 | 0.267 | 17.2% | $0.06 |

### Panel Agreement

- **Krippendorff's alpha (binary)**: 0.352
- **Krippendorff's alpha (type)**: 0.560 (contrastive prompt)
- **Best pairwise kappa**: DeepSeek vs Llama 4 = 0.597

### Tiered Confidence Labels

| Tier | Criteria | Entries | Gold Match |
|------|----------|---------|------------|
| **Tier 1** | ≥ 5/6 raters agree | 120 | **94.2%** |
| **Tier 2** | 4/6 raters agree | 100 | 82.0% |
| **Tier 3** | Split / few agree | 118 | Needs review |

**Total cross-validation cost**: $0.36

### Key Finding

**Prompt design > model choice**: Switching from the baseline prompt to the contrastive prompt raises recall from 0.16–0.65 to 0.85–0.97 across all three model families tested.

## Data Format

Each entry in the JSONL file contains:

```json
{
  "id": "V3-DIR-0047",
  "source_pmid": "35843572",
  "source_doi": "10.1016/j.cellsig.2022.110410",
  "paper_title": "ANGPTL4 attenuates palmitic acid-induced endothelial cell injury...",
  "claim_a": "overexpression of ANGPTL4 in HUVECs enhanced cell proliferation...",
  "claim_b": "knockdown of ANGPTL4 resulted in the opposite.",
  "is_genuine_contradiction": true,
  "contradiction_type": "direct",
  "confidence": 1.0,
  "rationale": "Overexpression enhanced proliferation while knockdown reversed it...",
  "abstract_text": "Full abstract text from PubMed...",
  "confidence_tier": 1
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier (format: `V3-{TYPE}-{NNN}` or `V3-MET-C{NNN}` for some methodological entries) |
| `source_pmid` | string | PubMed ID of the source paper |
| `source_doi` | string | DOI of the source paper |
| `paper_title` | string | Full title of the source paper |
| `claim_a` | string | First extracted claim from the abstract |
| `claim_b` | string | Second extracted claim that may contradict `claim_a` |
| `is_genuine_contradiction` | bool | `true` = genuine contradiction, `false` = contextual |
| `contradiction_type` | string | One of: `direct`, `temporal`, `magnitude`, `methodological`, `contextual` |
| `confidence` | float | Annotation confidence score (0.0–1.0) |
| `rationale` | string | Explanation of why the pair is or is not a contradiction |
| `abstract_text` | string | Full abstract text fetched from PubMed |
| `confidence_tier` | int | Panel agreement tier: 1 (≥5/6 raters agree), 2 (4/6), 3 (≤3/6) |

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("jang1563/ContradictBio-338")

# Filter by confidence tier (Tier 1 = highest quality, 94.2% gold match)
tier1 = dataset["train"].filter(lambda x: x["confidence_tier"] == 1)
print(f"Tier 1 (validated): {len(tier1)} entries")

# Filter genuine contradictions
genuine = dataset["train"].filter(lambda x: x["is_genuine_contradiction"])
print(f"Genuine contradictions: {len(genuine)}")

# Preview the first three entries: contradiction type and the start of claim_a
for example in dataset["train"].select(range(3)):
    print(f"[{example['contradiction_type']}] {example['claim_a'][:80]}...")
```

## Intended Use

- **Benchmarking** contradiction detection systems in biomedical NLP
- **Training** classifiers to distinguish genuine vs. contextual contradictions
- **Evaluating** prompt strategies for scientific claim analysis
- **Research** on evidence quality and scientific disagreement

## Limitations

- Within-abstract pairs only (v3); cross-paper pairs (v4, 800 entries) are available in the BioTeam-AI repository
- Gold labels were created by a single annotator with LLM-assisted 6-rater cross-validation (not multi-annotator human agreement)
- Tier 3 entries (118) have low panel agreement and may benefit from human review before use in training
- 8/338 entries (2.4%) have an empty `source_doi` field; all 338 are fully citable via `source_pmid`
- Some methodological entries use the `V3-MET-C{NNN}` ID format instead of the standard `V3-{TYPE}-{NNN}`

## Citation

```bibtex
@software{kim2026bioteamai,
  title   = {BioTeam-AI: Personal AI Science Team for Biology Research},
  author  = {Kim, JangKeun},
  year    = {2026},
  url     = {https://github.com/jang1563/bioteam-ai},
  license = {MIT}
}
```

## License

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) (Attribution-NonCommercial 4.0 International).

**You are free to:**

- Use, share, and adapt this dataset for **research, education, and non-profit purposes**
- Cite this work in academic publications

**You may NOT:**

- Use this dataset for **commercial purposes** without explicit written permission from the author

For commercial licensing inquiries, contact the author via the [BioTeam-AI repository](https://github.com/jang1563/bioteam-ai).
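
## Appendix: Recomputing F1 from the Validation Table

The F1 column in the 6-rater validation table can be reproduced from the reported precision and recall, since F1 is their harmonic mean. The sketch below copies the six rows from that table and recomputes each score. The names `f1_score`, `ROWS`, and `max_f1_deviation` are illustrative and not part of the dataset's tooling. Because the published precision and recall values are already rounded to three decimals, small differences in the last digit are expected, so the comparison uses a tolerance rather than exact equality.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (model, prompt, precision, recall, reported F1), copied from the validation table.
ROWS = [
    ("Gemini 2.5 Flash", "baseline", 0.619, 0.645, 0.632),
    ("Gemini 2.5 Flash", "contrastive", 0.459, 0.967, 0.623),
    ("DeepSeek V3.2", "contrastive", 0.593, 0.854, 0.700),
    ("Llama 4 Scout", "contrastive", 0.599, 0.932, 0.729),
    ("DeepSeek V3.2", "baseline", 0.778, 0.285, 0.417),
    ("Llama 4 Scout", "baseline", 0.933, 0.156, 0.267),
]


def max_f1_deviation(rows) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - reported) for _, _, p, r, reported in rows)


for model, prompt, p, r, reported in ROWS:
    computed = f1_score(p, r)
    print(f"{model:17s} {prompt:12s} computed={computed:.3f} reported={reported:.3f}")

print(f"max deviation: {max_f1_deviation(ROWS):.4f}")
```

The same function can be applied to predictions from a new model on this corpus to place it alongside the six raters in the table.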
Provider:
jang1563