ekacare/BODHI-S
Source: Hugging Face | Updated: 2026-04-13 | Indexed: 2026-05-10
Download link:
https://hf-mirror.com/datasets/ekacare/BODHI-S
Resource description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- medical
- knowledge-graph
- clinical
- healthcare
- india
- snomed
- graph
- symptom-checker
- triage
- differential-diagnosis
pretty_name: BODHI-S — Clinical Condition-Symptom Knowledge Graph
size_categories:
- 10K<n<100K
---
# BODHI-S — Condition-Symptom Knowledge Graph
Part of **BODHI (Bharat Ontology for Disease & Healthcare Informatics)** — open clinical knowledge graphs for grounding healthcare AI in verified medical facts.
→ [Full writeup: motivation, design & use cases](https://info.eka.care/services/bodhi-bharat-ontology-for-disease-healthcare-informatics)
→ [GitHub (all formats: Neo4j, CSV, PyG, RDF)](https://github.com/eka-care/BODHI)
---
## What is BODHI-S?
BODHI-S maps clinical conditions to their presenting symptoms, associated medical specialities, and inter-condition risk relationships. It was built and validated by expert clinicians at [Eka Care](https://www.eka.care) and has powered production symptom checking and differential diagnosis systems across millions of patient interactions in India.
Symptoms are modelled as **compound variants** — e.g. *Fever*, *Fever with chills*, *Fever for 3 days* — each as a distinct node, because the nuanced presentation of a symptom materially shifts the clinical probability of underlying conditions. Every node carries triage levels (`emergency`, `worrisome`, `opd_managed`) and demographic likelihood scores across age and gender cohorts, derived from expert consensus and normalized against Indian EHR data.
## Stats
| Metric | Count |
|---|---|
| Condition nodes | 779 |
| Symptom nodes (variants) | 4,037 |
| Symptom root concepts (distinct SNOMED IDs) | 590 |
| Speciality nodes | 39 |
| **Total relationships** | **13,204** |
| Symptom → Condition (PRESENT_IN) | 10,352 |
| Condition → Speciality (TREATED_BY) | 1,558 |
| Condition → Condition (IS_INFLUENCED_BY) | 1,020 |
| Condition → Condition (RELATED_TO) | 221 |
| Condition → Condition (HAS_PREREQUISITE) | 53 |
**Condition triage:** OPD Managed `367` (47%) · Worrisome `223` (29%) · Emergency `189` (24%)
**Most cross-cutting symptoms:** Fever (145 conditions) · Fatigue (126) · Headache (110)
**Top specialities:** Internal Medicine (292) · General Physician (205) · Orthopedic (139)
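
The figures above are internally consistent, which can be checked with a few lines of arithmetic. The sketch below only re-derives the totals and percentages from the numbers quoted in the table; the dictionary names are illustrative and are not part of the dataset.

```python
# Counts copied from the Stats table above.
relationships = {
    "PRESENT_IN": 10_352,
    "TREATED_BY": 1_558,
    "IS_INFLUENCED_BY": 1_020,
    "RELATED_TO": 221,
    "HAS_PREREQUISITE": 53,
}
triage = {"opd_managed": 367, "worrisome": 223, "emergency": 189}

total_relationships = sum(relationships.values())
total_conditions = sum(triage.values())

# Share of conditions per triage level, rounded to a whole percent.
triage_share = {
    level: round(100 * count / total_conditions)
    for level, count in triage.items()
}

print(total_relationships)  # 13204, matches "Total relationships"
print(total_conditions)     # 779, matches "Condition nodes"
print(triage_share)         # {'opd_managed': 47, 'worrisome': 29, 'emergency': 24}
```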
## Files
| File | Description |
|---|---|
| `triples.jsonl` | `(head, relation, tail, properties)` structured triples |
| `nl_facts.jsonl` | Natural-language fact strings, suitable for LLM fine-tuning / RAG |
For Neo4j dump, CSV, PyTorch Geometric, and RDF/Turtle formats, see the [GitHub repository](https://github.com/eka-care/BODHI).
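
Both files are JSON Lines, so they can be read with the Python standard library alone. The following is a minimal sketch, assuming a local copy of the file; `read_jsonl` is a hypothetical helper name, not something shipped with the dataset.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

A typical call would be `triples = list(read_jsonl("triples.jsonl"))`. One way to obtain a local copy is `hf_hub_download(repo_id="ekacare/BODHI-S", filename="triples.jsonl", repo_type="dataset")` from the `huggingface_hub` package, assuming the file sits at the repository root as the table suggests.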
## Schema (triples)
Each line in `triples.jsonl`:
```json
{
  "head": "<node_id>",
  "head_type": "Symptom | Condition | Speciality",
  "relation": "PRESENT_IN | TREATED_BY | IS_INFLUENCED_BY | HAS_PREREQUISITE | RELATED_TO",
  "tail": "<node_id>",
  "tail_type": "Symptom | Condition | Speciality",
  "properties": { ... }
}
```
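
As a rough illustration of how the schema might be enforced on load, the sketch below checks each record against the keys, node types, and relation names listed above, then counts the valid records per relation. The sample records and their node ids are invented for illustration, and `properties` is left empty because its contents are not specified here.

```python
from collections import Counter

REQUIRED_KEYS = {"head", "head_type", "relation", "tail", "tail_type", "properties"}
NODE_TYPES = {"Symptom", "Condition", "Speciality"}
RELATIONS = {
    "PRESENT_IN", "TREATED_BY", "IS_INFLUENCED_BY", "HAS_PREREQUISITE", "RELATED_TO",
}


def validate_triple(triple):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    missing = REQUIRED_KEYS - set(triple)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if triple.get("relation") not in RELATIONS:
        problems.append(f"unknown relation: {triple.get('relation')!r}")
    for key in ("head_type", "tail_type"):
        if triple.get(key) not in NODE_TYPES:
            problems.append(f"unknown {key}: {triple.get(key)!r}")
    return problems


# Hypothetical records: the node ids below are made up for this example.
sample = [
    {"head": "sym_001", "head_type": "Symptom", "relation": "PRESENT_IN",
     "tail": "cond_001", "tail_type": "Condition", "properties": {}},
    {"head": "cond_001", "head_type": "Condition", "relation": "TREATED_BY",
     "tail": "spec_001", "tail_type": "Speciality", "properties": {}},
    # Deliberately malformed: unknown relation and no "properties" key.
    {"head": "cond_001", "head_type": "Condition", "relation": "CAUSES",
     "tail": "cond_002", "tail_type": "Condition"},
]

valid = [t for t in sample if not validate_triple(t)]
relation_counts = Counter(t["relation"] for t in valid)

print(relation_counts)
print(validate_triple(sample[2]))
```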
## Standards
- **SNOMED CT**: all conditions and symptom root concepts carry SNOMED IDs
- Triage levels and demographic likelihood scores are derived from expert consensus and normalized against Indian primary care EHR data
## Use Cases
- **GraphRAG**: structured grounding layer for LLMs on clinical reasoning tasks
- **Symptom checking & triage**: deterministic, offline-capable differential diagnosis
- **Speciality routing**: maps any condition to the appropriate care discipline
- **GNN training**: heterogeneous graph with rich edge properties for graph neural networks
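
To make the symptom checking and speciality routing use cases concrete, here is a toy sketch that ranks conditions by how many of the reported symptoms point to them through `PRESENT_IN` edges, then looks up specialities through `TREATED_BY` edges. It is an unweighted overlap count chosen for simplicity, not the scoring used in production, and it ignores the triage and likelihood data in `properties`. The node ids and the four example edges are invented for illustration.

```python
from collections import Counter, defaultdict


def build_indexes(triples):
    """Index PRESENT_IN and TREATED_BY edges for fast lookup."""
    conditions_by_symptom = defaultdict(set)
    specialities_by_condition = defaultdict(set)
    for t in triples:
        if t["relation"] == "PRESENT_IN":
            conditions_by_symptom[t["head"]].add(t["tail"])
        elif t["relation"] == "TREATED_BY":
            specialities_by_condition[t["head"]].add(t["tail"])
    return conditions_by_symptom, specialities_by_condition


def rank_conditions(symptoms, conditions_by_symptom):
    """Rank conditions by the number of reported symptoms linked to them."""
    scores = Counter()
    for symptom in symptoms:
        for condition in conditions_by_symptom.get(symptom, ()):
            scores[condition] += 1
    # Highest overlap first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy graph: ids and edges are made up for this example.
toy_triples = [
    {"head": "sym:fever", "relation": "PRESENT_IN", "tail": "cond:influenza"},
    {"head": "sym:headache", "relation": "PRESENT_IN", "tail": "cond:influenza"},
    {"head": "sym:headache", "relation": "PRESENT_IN", "tail": "cond:migraine"},
    {"head": "cond:influenza", "relation": "TREATED_BY", "tail": "spec:internal_medicine"},
]

by_symptom, by_condition = build_indexes(toy_triples)
ranking = rank_conditions(["sym:fever", "sym:headache"], by_symptom)
top_condition = ranking[0][0]

print(ranking)                      # [('cond:influenza', 2), ('cond:migraine', 1)]
print(by_condition[top_condition])  # {'spec:internal_medicine'}
```

Because the lookup is a pair of dictionary indexes built once from the triples, the same input always yields the same ranking and no network access is needed at query time.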
## License
[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) — free for non-commercial use with attribution to [Eka Care](https://www.eka.care).
Provider:
ekacare



