
ekacare/BODHI-S

Source: Hugging Face · Updated 2026-04-13 · Indexed 2026-05-10
Download link:
https://hf-mirror.com/datasets/ekacare/BODHI-S
Resource description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- medical
- knowledge-graph
- clinical
- healthcare
- india
- snomed
- graph
- symptom-checker
- triage
- differential-diagnosis
pretty_name: BODHI-S — Clinical Condition-Symptom Knowledge Graph
size_categories:
- 10K<n<100K
---

# BODHI-S: Condition-Symptom Knowledge Graph

Part of **BODHI (Bharat Ontology for Disease & Healthcare Informatics)**: open clinical knowledge graphs for grounding healthcare AI in verified medical facts.

→ [Full writeup: motivation, design & use cases](https://info.eka.care/services/bodhi-bharat-ontology-for-disease-healthcare-informatics)
→ [GitHub (all formats: Neo4j, CSV, PyG, RDF)](https://github.com/eka-care/BODHI)

---

## What is BODHI-S?

BODHI-S maps clinical conditions to their presenting symptoms, associated medical specialities, and inter-condition risk relationships. It was built and validated by expert clinicians at [Eka Care](https://www.eka.care) and has powered production symptom checking and differential diagnosis systems across millions of patient interactions in India.

Symptoms are modelled as **compound variants** (e.g. *Fever*, *Fever with chills*, *Fever for 3 days*), each as a distinct node, because the nuanced presentation of a symptom materially shifts the clinical probability of underlying conditions. Every node carries triage levels (`emergency`, `worrisome`, `opd_managed`) and demographic likelihood scores across age and gender cohorts, derived from expert consensus and normalized against Indian EHR data.

## Stats

| Metric | Count |
|---|---|
| Condition nodes | 779 |
| Symptom nodes (variants) | 4,037 |
| Symptom root concepts (distinct SNOMED IDs) | 590 |
| Speciality nodes | 39 |
| **Total relationships** | **13,204** |
| Symptom → Condition (PRESENT_IN) | 10,352 |
| Condition → Speciality (TREATED_BY) | 1,558 |
| Condition → Condition (IS_INFLUENCED_BY) | 1,020 |
| Condition → Condition (RELATED_TO) | 221 |
| Condition → Condition (HAS_PREREQUISITE) | 53 |

**Condition triage:** OPD Managed `367` (47%) · Worrisome `223` (29%) · Emergency `189` (24%)

**Most cross-cutting symptoms:** Fever (145 conditions) · Fatigue (126) · Headache (110)

**Top specialities:** Internal Medicine (292) · General Physician (205) · Orthopedic (139)

## Files

| File | Description |
|---|---|
| `triples.jsonl` | `(head, relation, tail, properties)` structured triples |
| `nl_facts.jsonl` | Natural-language fact strings, suitable for LLM fine-tuning / RAG |

For Neo4j dump, CSV, PyTorch Geometric, and RDF/Turtle formats, see the [GitHub repository](https://github.com/eka-care/BODHI).

## Schema (triples)

Each line in `triples.jsonl`:

```json
{
  "head": "<node_id>",
  "head_type": "Symptom | Condition | Speciality",
  "relation": "PRESENT_IN | TREATED_BY | IS_INFLUENCED_BY | HAS_PREREQUISITE | RELATED_TO",
  "tail": "<node_id>",
  "tail_type": "Symptom | Condition | Speciality",
  "properties": { ... }
}
```

## Standards

- **SNOMED CT**: all conditions and symptom root concepts carry SNOMED IDs
- Triage and demographic likelihood scores derived from Indian primary care EHR data

## Use Cases

- **GraphRAG**: structured grounding layer for LLMs on clinical reasoning tasks
- **Symptom checking & triage**: deterministic, offline-capable differential diagnosis
- **Specialty routing**: maps any condition to the appropriate care discipline
- **GNN training**: heterogeneous graph with rich edge properties for graph neural networks

## License

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/), free for non-commercial use with attribution to [Eka Care](https://www.eka.care).
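
## Example: reading the triples (illustrative sketch)

The `triples.jsonl` schema described in this card can be consumed with nothing more than the Python standard library. The sketch below is a minimal illustration under stated assumptions, not Eka Care's production logic: the node IDs (`S1`, `C1`, `SP1`) and the empty `properties` objects are hypothetical placeholders, the helper names are invented for this example, and the ranking is a plain unweighted count that ignores the triage levels and likelihood scores the dataset provides.

```python
import json
from collections import defaultdict

# Hypothetical sample lines shaped like the documented schema.
# The IDs and empty properties are placeholders, not real BODHI-S records.
SAMPLE_LINES = [
    '{"head": "S1", "head_type": "Symptom", "relation": "PRESENT_IN", '
    '"tail": "C1", "tail_type": "Condition", "properties": {}}',
    '{"head": "S1", "head_type": "Symptom", "relation": "PRESENT_IN", '
    '"tail": "C2", "tail_type": "Condition", "properties": {}}',
    '{"head": "S2", "head_type": "Symptom", "relation": "PRESENT_IN", '
    '"tail": "C2", "tail_type": "Condition", "properties": {}}',
    '{"head": "C2", "head_type": "Condition", "relation": "TREATED_BY", '
    '"tail": "SP1", "tail_type": "Speciality", "properties": {}}',
]


def parse_triples(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def index_by_relation(triples, relation):
    """Map each head node to the sorted list of its tails for one relation."""
    index = defaultdict(set)
    for triple in triples:
        if triple["relation"] == relation:
            index[triple["head"]].add(triple["tail"])
    return {head: sorted(tails) for head, tails in index.items()}


def candidate_conditions(symptoms, present_in):
    """Count how many of the given symptoms link to each condition.

    Returns (condition, count) pairs, highest count first, ties by ID.
    """
    counts = defaultdict(int)
    for symptom in symptoms:
        for condition in present_in.get(symptom, []):
            counts[condition] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


triples = list(parse_triples(SAMPLE_LINES))
present_in = index_by_relation(triples, "PRESENT_IN")
treated_by = index_by_relation(triples, "TREATED_BY")
ranked = candidate_conditions(["S1", "S2"], present_in)
print(ranked)
print(treated_by)
```

To try this on the real file, an open file handle such as `open("triples.jsonl", encoding="utf-8")` can be passed to `parse_triples` in place of `SAMPLE_LINES`. The keys inside `properties` are not listed in this card, so any weighting by triage or likelihood would first need inspection of the actual records.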
Provider:
ekacare