trivitaai/Grouth_Truth_Conversation
Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/trivitaai/Grouth_Truth_Conversation
Resource overview:
---
language:
- en
- vi
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
tags:
- medical
- healthcare
- multi-turn
- doctor-patient
- ground-truth
- evaluation
- vietnamese
pretty_name: Ground Truth Healthcare Conversations
size_categories:
- n<1K
configs:
- config_name: curated_30
  data_files:
  - split: train
    path: data/curated_30_v2.jsonl
- config_name: full_473
  data_files:
  - split: train
    path: data/curated_all.jsonl
- config_name: tier1_with_clinical_notes
  data_files:
  - split: train
    path: data/curated_30_ground_truth.jsonl
- config_name: vietnamese_human_ai
  data_files:
  - split: train
    path: data/hellobacsi_vi_50.jsonl
---
# Ground Truth Healthcare Conversations
Multi-turn healthcare conversations (real and mock clinical encounters, patient forum threads, and human–AI chats) for model evaluation and as a reference during model development.
Used as ground truth for the **SOCRATES** medical AI project by Trivita AI.
## Dataset Viewer Subsets
| Subset | File | Count | Language | Description |
|--------|------|-------|----------|-------------|
| `curated_30` | `curated_30_v2.jsonl` | **30** | EN | Top 30 conversations across the 5 English sources; primary evaluation set |
| `full_473` | `curated_all.jsonl` | **473** | EN | Full English collection |
| `tier1_with_clinical_notes` | `curated_30_ground_truth.jsonl` | **30** | EN | MTS-Dialog + ACI-Bench + PriMock57 with clinical note ground truth |
| `vietnamese_human_ai` | `hellobacsi_vi_50.jsonl` | **50** | **VI** | Vietnamese patient questions answered by Hello Bacsi AI |
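If you download the raw JSONL files from the `data/` directory instead of going through `load_dataset`, a quick sanity check is to compare record counts against the table. The sketch below assumes all four files sit in one local directory under their original names; `count_records` and `check_counts` are hypothetical helpers, not part of the dataset.

```python
import json
from pathlib import Path

# Expected record counts per file, copied from the subset table.
EXPECTED_COUNTS = {
    "curated_30_v2.jsonl": 30,
    "curated_all.jsonl": 473,
    "curated_30_ground_truth.jsonl": 30,
    "hellobacsi_vi_50.jsonl": 50,
}


def count_records(path: Path) -> int:
    """Count non-empty lines in a JSONL file, parsing each one to catch malformed rows."""
    count = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises json.JSONDecodeError on a bad row
                count += 1
    return count


def check_counts(data_dir: Path) -> dict[str, tuple[int, int]]:
    """Return {filename: (expected, actual)} for every file whose count differs."""
    mismatches = {}
    for name, expected in EXPECTED_COUNTS.items():
        actual = count_records(data_dir / name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

An empty dict from `check_counts(Path("data"))` means every file matches the published counts.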
## Sources
| Source | Type | Language | Link |
|--------|------|----------|------|
| MTS-Dialog | Real doctor–patient + clinical note | EN | https://github.com/abachaa/MTS-Dialog |
| ACI-Bench | Real clinical encounter + SOAP note | EN | https://github.com/wyim/aci-bench |
| PriMock57 | Mock GP consultation (audio+transcript) | EN | https://github.com/babylonhealth/priMock57 |
| r/AskDocs | Patient Q&A with verified doctor replies | EN | https://reddit.com/r/AskDocs |
| LMSYS-Chat-1M | Human–LLM chats, filtered for health topics | EN | https://huggingface.co/datasets/lmsys/lmsys-chat-1m |
| hellobacsi.com | Vietnamese patient Q&A, AI replies | VI | https://hellobacsi.com/community |
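To see how a loaded subset splits across these sources, you can tally the `source` field and cross-check each record's `language` against the table. This is a minimal sketch assuming every record carries the `id`, `source`, and `language` fields shown in the Format section; the `SOURCE_LANGUAGE` mapping pairs the source identifiers from that section with the languages in the table above, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# Expected language per `source` value: source identifiers from the Format
# section, languages from the Sources table.
SOURCE_LANGUAGE = {
    "mts_dialog": "en",
    "aci_bench": "en",
    "primock57": "en",
    "reddit_askdocs": "en",
    "lmsys_chat_1m": "en",
    "hellobacsi_community": "vi",
}


def source_breakdown(records: Iterable[Mapping]) -> Counter:
    """Count records per `source` value."""
    return Counter(r["source"] for r in records)


def language_mismatches(records: Iterable[Mapping]) -> list[str]:
    """Return ids of records whose `language` disagrees with their source's
    expected language (records from unknown sources are flagged too)."""
    return [r["id"] for r in records if SOURCE_LANGUAGE.get(r["source"]) != r["language"]]
```

Because iterating a `datasets.Dataset` yields one dict per row, both helpers can be called directly on a loaded split, e.g. `source_breakdown(ds["train"])`.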
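## Format
Each line of a JSONL file holds one record shaped like the example below; values separated by `|` list the alternatives a field can take.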
```json
{
  "id": "mts_0042",
  "source": "mts_dialog | aci_bench | primock57 | reddit_askdocs | lmsys_chat_1m | hellobacsi_community",
  "type": "doctor_patient | patient_verified_doctor | human_ai | human_ai_vi",
  "language": "en | vi",
  "turns": [
    {"role": "doctor", "content": "What brings you in today?"},
    {"role": "patient", "content": "I have had chest pain for 3 days..."}
  ],
  "ground_truth": {
    "clinical_note": "Patient presents with...",
    "section_header": "GENHX"
  },
  "metadata": {"num_turns": 14}
}
```
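One common way to use a multi-turn conversation as ground truth is to feed a model every turn up to a chosen speaker's last reply and compare the model's output with that reply. The sketch below assumes this next-turn setup, which the dataset card does not prescribe. Treating the five top-level fields as required is also an assumption. The `"doctor"`/`"patient"` role names come from the example record; roles in `human_ai` records are not shown, so `target_role` is left as a parameter. `parse_record` and `split_for_eval` are hypothetical helpers.

```python
import json
from typing import Any

# Top-level fields from the example record; requiring them is an assumption
# (ground_truth and metadata are treated as optional here).
REQUIRED_FIELDS = ("id", "source", "type", "language", "turns")


def parse_record(line: str) -> dict[str, Any]:
    """Parse one JSONL line and fail loudly if a required top-level field is missing."""
    record = json.loads(line)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise KeyError(f"record missing fields: {missing}")
    return record


def split_for_eval(record: dict[str, Any], target_role: str = "doctor") -> tuple[list[dict], str]:
    """Return (context_turns, reference_text).

    The reference is the content of the last turn spoken by `target_role`;
    the context is every turn before it, which a model would receive as input.
    """
    turns = record["turns"]
    for i in range(len(turns) - 1, -1, -1):
        if turns[i]["role"] == target_role:
            return turns[:i], turns[i]["content"]
    raise ValueError(f"no turn with role {target_role!r} in record {record['id']!r}")
```

A model's reply to the context can then be scored against the reference with whatever metric your evaluation uses.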
## Usage
```python
from datasets import load_dataset

# Each config maps to one JSONL file exposed as a single "train" split.
ds = load_dataset("trivitaai/Grouth_Truth_Conversation", name="curated_30")  # primary eval set (30)
ds_vi = load_dataset("trivitaai/Grouth_Truth_Conversation", name="vietnamese_human_ai")  # Vietnamese Q&A (50)
ds_all = load_dataset("trivitaai/Grouth_Truth_Conversation", name="full_473")  # full English collection (473)
ds_notes = load_dataset("trivitaai/Grouth_Truth_Conversation", name="tier1_with_clinical_notes")  # with clinical notes (30)
```
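For the `tier1_with_clinical_notes` subset, a natural task is dialogue-to-note generation: render the conversation as a transcript and keep the clinical note as the reference. This sketch assumes rows follow the Format schema, with the note under `ground_truth.clinical_note`; the plain `role: content` transcript layout and the `note_pairs` helper are illustrative choices, not part of the dataset.

```python
from typing import Iterable, Iterator, Mapping


def note_pairs(rows: Iterable[Mapping]) -> Iterator[tuple[str, str]]:
    """Yield (transcript, clinical_note) for rows that carry a clinical note.

    The transcript is one "role: content" line per turn; rows whose
    ground_truth is missing, None, or has no non-empty clinical_note are skipped.
    """
    for row in rows:
        note = (row.get("ground_truth") or {}).get("clinical_note")
        if not note:
            continue
        transcript = "\n".join(f"{t['role']}: {t['content']}" for t in row["turns"])
        yield transcript, note
```

Since iterating a loaded split yields dict rows, `list(note_pairs(ds_notes["train"]))` would collect every pair in the subset.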
Provided by:
trivitaai



