MedSwin/HealthBench-Curated

Name: MedSwin/HealthBench-Curated
Creator: MedSwin
Published: 2026-03-22 06:12:46
License: No description provided

Hugging Face · Updated: 2026-03-22 · Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/MedSwin/HealthBench-Curated


Resource summary:

---
license: apache-2.0
task_categories:
- question-answering
language:
- en
- pt
- ru
tags:
- medical
size_categories:
- 1K<n<10K
---

# HealthBench Distilled Curation

**HealthBench Curation** is a standardised version of the original HealthBench dataset. It transforms multilingual, complex-formatted clinical QA into **consistent English plaintext** to enable fair benchmarking of lightweight and English-centric LLMs.

* [Original Corpus](https://huggingface.co/datasets/MedSwin/HealthBench-Curated/blob/main/healthbench.jsonl)
* [Processed Corpus](https://huggingface.co/datasets/MedSwin/HealthBench-Curated/blob/main/healthbench_processed_6.jsonl) (latest: v6)
* [Curation Script](https://huggingface.co/datasets/MedSwin/HealthBench-Curated/blob/main/prep_healthbench.py)

-----

## Why This Curation?

Standard medical benchmarks often contain mixed languages, heavy Markdown, and verbose tables. Large models handle these easily, but such formatting introduces a **systematic bias** against smaller models (≤20B parameters). This curation removes formatting "noise" so that evaluation focuses on **medical reasoning and knowledge**.

* **Language Parity:** Translates all non-English samples into clinical-grade English.
* **Format Neutrality:** Converts tables and Markdown into simplified plaintext.
* **Complexity Control:** Compresses long context to **75–150 words** while preserving 100% of clinical facts.
* **Determinism:** Processed with GPT-5-Nano at temperature 0.0 for reproducible inputs.

-----

## Methodology

The pipeline processes the `prompt` and `ideal_completion` fields independently, using the following steps:

### 1. Translation & Localization

* Detects non-English content and translates it faithfully.
* **Preserves:** Clinical terminology, units, dosages, and temporal relationships.

### 2. Format Flattening

| Original Format | Transformation |
| :--- | :--- |
| **Markdown Tables** | Converted to descriptive bullet points |
| **Headers/HTML** | Stripped to raw plaintext |
| **Nested Lists** | Flattened to single-level bullets |

### 3. Semantic Compression

* **Constraint:** No loss of medical facts or causal logic.
* **Target:** Concise 75–150-word windows to fit smaller KV caches.

-----

## Dataset Structure

The curated file (`healthbench_processed.jsonl`) mirrors the original schema but appends standardized fields:

```json
{
  "prompt": [...],
  "ideal_completions_data": {
    "ideal_completion": "..."
  },
  "processed_prompt_en_plaintext": "Standardized English prompt...",
  "processed_ideal_completion_en_plaintext": "Standardized English answer...",
  "preprocessing_meta": {
    "target_word_range": [75, 150],
    "processor": "azure_gpt5nano_single_field"
  }
}
```

-----

## Usage & Limitations

### Best For

* **Lightweight Models:** Models (3B–20B) that struggle with high token complexity.
* **Reasoning Benchmarks:** Testing *what* a model knows, not how well it parses Markdown.
* **Quantisation Testing:** Measuring how bit-reduction affects clinical accuracy in a stable environment.

### Limitations

* Structural nuances found in tables may be simplified.
* Stylistic richness is traded for factual density.

-----

## License & Attribution

* **License:** Apache 2.0
* **Derived from:** OpenAI HealthBench

-----

> Note: This dataset has multiple versions; later versions reflect further attempts to make the LLM-distilled data more complete and accurate.
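A minimal sketch of reading the processed corpus and checking each standardized field against the `target_word_range` recorded in `preprocessing_meta`. It assumes a local copy of the processed JSONL file and counts words by splitting on whitespace; the curation script's actual counting rule is not documented here, and the helper names (`iter_records`, `word_count`, `check_record`) are hypothetical.

```python
import json
from typing import Iterator

# Field names come from the documented schema; the helpers below are hypothetical.
FIELDS = (
    "processed_prompt_en_plaintext",
    "processed_ideal_completion_en_plaintext",
)
DEFAULT_RANGE = (75, 150)  # documented target_word_range, used if meta is missing


def iter_records(path: str) -> Iterator[dict]:
    """Yield one JSON object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed counting rule)."""
    return len(text.split())


def check_record(record: dict) -> dict[str, tuple[int, bool]]:
    """Map each standardized field to (word count, within target range?)."""
    meta = record.get("preprocessing_meta", {})
    low, high = meta.get("target_word_range", DEFAULT_RANGE)
    report = {}
    for field in FIELDS:
        n = word_count(record.get(field) or "")
        report[field] = (n, low <= n <= high)
    return report
```

For example, assuming `healthbench_processed_6.jsonl` has been downloaded locally, `[r for r in iter_records("healthbench_processed_6.jsonl") if not all(ok for _, ok in check_record(r).values())]` collects records with an out-of-range field. Since the card describes compressing *long* context, shorter fields may plausibly fall below the lower bound, so the sketch reports rather than rejects them.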

Provider:

MedSwin
