trivitaai/Grouth_Truth_Conversation

Name: trivitaai/Grouth_Truth_Conversation
Creator: trivitaai
Published: 2026-04-10 05:56:01
License: cc-by-4.0

Hugging Face | Updated 2026-04-10 | Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/trivitaai/Grouth_Truth_Conversation


Description:

---
language:
- en
- vi
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
tags:
- medical
- healthcare
- multi-turn
- doctor-patient
- ground-truth
- evaluation
- vietnamese
pretty_name: Ground Truth Healthcare Conversations
size_categories:
- n<1K
configs:
- config_name: curated_30
  data_files:
  - split: train
    path: data/curated_30_v2.jsonl
- config_name: full_473
  data_files:
  - split: train
    path: data/curated_all.jsonl
- config_name: tier1_with_clinical_notes
  data_files:
  - split: train
    path: data/curated_30_ground_truth.jsonl
- config_name: vietnamese_human_ai
  data_files:
  - split: train
    path: data/hellobacsi_vi_50.jsonl
---

# Ground Truth Healthcare Conversations

Real-life multi-turn healthcare conversations for model evaluation and development reference. Used as ground truth for the **SOCRATES** medical AI project by Trivita AI.

## Dataset Viewer Subsets

| Subset | File | Count | Language | Description |
|--------|------|-------|----------|-------------|
| `curated_30` | `curated_30_v2.jsonl` | **30** | EN | Best 30 conversations across 5 sources; primary eval set |
| `full_473` | `curated_all.jsonl` | **473** | EN | Full English collection |
| `tier1_with_clinical_notes` | `curated_30_ground_truth.jsonl` | **30** | EN | MTS-Dialog + ACI-Bench + PriMock57 with clinical note ground truth |
| `vietnamese_human_ai` | `hellobacsi_vi_50.jsonl` | **50** | **VI** | Vietnamese patient questions answered by Hello Bacsi AI |

## Sources

| Source | Type | Language | Link |
|--------|------|----------|------|
| MTS-Dialog | Real doctor–patient + clinical note | EN | https://github.com/abachaa/MTS-Dialog |
| ACI-Bench | Real clinical encounter + SOAP note | EN | https://github.com/wyim/aci-bench |
| PriMock57 | Mock GP consultation (audio+transcript) | EN | https://github.com/babylonhealth/priMock57 |
| r/AskDocs | Patient Q&A with verified doctor replies | EN | https://reddit.com/r/AskDocs |
| LMSYS-Chat-1M | Human–LLM, health filtered | EN | https://huggingface.co/datasets/lmsys/lmsys-chat-1m |
| hellobacsi.com | Vietnamese patient Q&A, AI replies | VI | https://hellobacsi.com/community |

## Format

```json
{
  "id": "mts_0042",
  "source": "mts_dialog | aci_bench | primock57 | reddit_askdocs | lmsys_chat_1m | hellobacsi_community",
  "type": "doctor_patient | patient_verified_doctor | human_ai | human_ai_vi",
  "language": "en | vi",
  "turns": [
    {"role": "doctor", "content": "What brings you in today?"},
    {"role": "patient", "content": "I have had chest pain for 3 days..."}
  ],
  "ground_truth": {
    "clinical_note": "Patient presents with...",
    "section_header": "GENHX"
  },
  "metadata": {"num_turns": 14}
}
```

## Usage

```python
from datasets import load_dataset

# Each subset is a separate config; pick it with `name`.
ds = load_dataset("trivitaai/Grouth_Truth_Conversation", name="curated_30")
ds_vi = load_dataset("trivitaai/Grouth_Truth_Conversation", name="vietnamese_human_ai")
ds_all = load_dataset("trivitaai/Grouth_Truth_Conversation", name="full_473")
ds_notes = load_dataset("trivitaai/Grouth_Truth_Conversation", name="tier1_with_clinical_notes")
```
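Once a record is loaded, it may help to sanity-check it against the format shown above and flatten its turns into a readable transcript. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `validate_record` and `render_transcript`, and the choice of which keys count as required, are assumptions made for illustration. `ground_truth` and `metadata` are treated as optional here, which is also an assumption.

```python
import json

# Assumption: these keys are treated as required; the card does not say so explicitly.
REQUIRED_KEYS = {"id", "source", "type", "language", "turns"}
ALLOWED_LANGUAGES = {"en", "vi"}  # the two language codes listed in the card


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if record.get("language") not in ALLOWED_LANGUAGES:
        problems.append(f"unexpected language: {record.get('language')!r}")
    for i, turn in enumerate(record.get("turns", [])):
        if not {"role", "content"} <= set(turn):
            problems.append(f"turn {i} lacks 'role' or 'content'")
    return problems


def render_transcript(record: dict) -> str:
    """Join the turns into 'role: content' lines, one per turn."""
    return "\n".join(f"{t['role']}: {t['content']}" for t in record["turns"])


# One JSONL line shaped like the example in the Format section.
sample_line = json.dumps({
    "id": "mts_0042",
    "source": "mts_dialog",
    "type": "doctor_patient",
    "language": "en",
    "turns": [
        {"role": "doctor", "content": "What brings you in today?"},
        {"role": "patient", "content": "I have had chest pain for 3 days..."},
    ],
})

record = json.loads(sample_line)
print(validate_record(record))
print(render_transcript(record))
```

The same two helpers can be applied to rows returned by `load_dataset`, for example `validate_record(ds["train"][0])`, assuming the rows expose the fields as plain dictionaries.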

Provider:

trivitaai
