
eduardofarina/seed-of-thought-healthcare

Source: Hugging Face · Updated: 2026-04-21 · Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/eduardofarina/seed-of-thought-healthcare
Resource description:
---
license: mit
tags:
- medical
- healthcare
- differential-diagnosis
- prompting
- seed-of-thought
- simplestrat
- medqa
size_categories:
- n<1K
---

# 🌱 Seed of Thought for Healthcare Differential Diagnosis

Implementation of the **SimpleStrat / Seed-of-Thought** methodology ([Wong et al., 2024, arXiv:2410.09038](https://arxiv.org/abs/2410.09038)) applied to clinical differential diagnosis on [MedQA-USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options).

## 🧬 The Challenge

LLMs exhibit **mode collapse** in clinical diagnosis: they repeatedly predict the same top-1 diagnosis, ignoring the full differential. This is dangerous in medicine, where rare but serious conditions must be considered. Standard temperature sampling doesn't help because it adds noise, not systematic diversity.

## 🌱 Seed of Thought Methodology

The pipeline has 3 stages:

### Stage 1: Auto-Stratification

The LLM identifies **discriminating clinical properties** that split the answer space into roughly equal halves. For 4-option MCQs, ideal properties split 2:2.

- Example: "Does the mechanism involve direct cellular damage?" → Options A, C satisfy; B, D don't

### Stage 2: Heuristic Estimation

For each property, the LLM estimates the probability that it holds, using calibrated Bayesian reasoning (inspired by superforecasting techniques from Tetlock).

### Stage 3: Probabilistic Prompting

A stratum (a combination of property True/False values) is **sampled** from the joint distribution. The LLM generates an answer **constrained** to that stratum. Different samples explore different regions of the answer space.
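The stratum sampling in Stage 3 can be illustrated with a minimal sketch. It is not the dataset author's code: the function names, the second property, and the probabilities are hypothetical, and the sketch assumes the properties are independent so that the joint distribution is a product of the per-property estimates from Stage 2.

```python
import itertools
import random


def stratum_probabilities(property_probs):
    """Enumerate every True/False combination and its probability.

    Assumption for this sketch: properties are independent, so the joint
    probability is the product of per-property probabilities.
    """
    names = list(property_probs)
    joint = {}
    for values in itertools.product([True, False], repeat=len(names)):
        p = 1.0
        for name, value in zip(names, values):
            p *= property_probs[name] if value else 1.0 - property_probs[name]
        joint[values] = p
    return names, joint


def sample_stratum(property_probs, rng):
    """Draw one stratum (a dict of property -> bool) from the joint."""
    names, joint = stratum_probabilities(property_probs)
    strata = list(joint)
    weights = [joint[s] for s in strata]
    chosen = rng.choices(strata, weights=weights, k=1)[0]
    return dict(zip(names, chosen))


def constrained_prompt(question, stratum):
    """Build a prompt that restricts the answer to the sampled stratum."""
    lines = [f"- {name}: {'yes' if value else 'no'}"
             for name, value in stratum.items()]
    return (question
            + "\n\nGive a diagnosis consistent with these constraints:\n"
            + "\n".join(lines))


if __name__ == "__main__":
    # Hypothetical Stage 2 estimates for two properties.
    probs = {
        "Does the mechanism involve direct cellular damage?": 0.5,
        "Is the onset acute?": 0.25,  # hypothetical property
    }
    rng = random.Random(0)
    stratum = sample_stratum(probs, rng)
    print(constrained_prompt("A 45-year-old presents with ...", stratum))
```

Repeating `sample_stratum` once per generation means that successive answers are steered toward different combinations of property values, in proportion to the estimated probabilities, rather than relying on temperature alone.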
## 📊 Results

| Metric | Baseline (temp=0.9) | Seed of Thought | Δ |
|--------|---------------------|-----------------|---|
| accuracy | 0.9333 | 0.9333 | +0.0000 |
| diversity_ratio | 0.1300 | 0.2533 | +0.1233 |
| unique_count | 1.3000 | 2.5333 | +1.2333 |
| coverage | 0.3417 | 0.5667 | +0.2250 |
| kl_divergence_from_uniform | 1.2773 | 0.8354 | -0.4419 |

### Setup

- **Model**: Qwen/Qwen2.5-72B-Instruct
- **Dataset**: MedQA-USMLE test split (30 questions)
- **Samples/question**: 10
- **Strata**: 3
- **Baseline temp**: 0.9 | **SoT temp**: 0.7
- **Total calls**: 840 | **Time**: 202.8 min

## 🔑 Key Insight

By partitioning the diagnostic space into orthogonal dimensions and sampling systematically, Seed-of-Thought achieves:

1. **Higher coverage**: more plausible diagnoses considered
2. **Better diversity**: genuinely different answers per sample
3. **Lower KL divergence**: more uniform distribution across valid options
4. **Maintained accuracy**: correct answer still appears in generated set

## Citation

```bibtex
@article{wong2024simplestrat,
  title={SimpleStrat: Diversifying Language Model Generation with Stratification},
  author={Wong, Justin and Orlovskiy, Yury and Luo, Michael and Seshia, Sanjit A. and Gonzalez, Joseph E.},
  journal={arXiv preprint arXiv:2410.09038},
  year={2024}
}
```
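For readers who want to reproduce the diversity metrics in the results table, the following is a rough sketch of how they might be computed for a single question. The definitions are assumptions, not taken from the source: `diversity_ratio` is taken as unique answers divided by the number of samples (consistent with 1.3 / 10 = 0.13 in the table), and `kl_divergence_from_uniform` as KL(p ‖ uniform) in nats over the four answer options. The `coverage` metric is omitted because the source does not define it.

```python
import math
from collections import Counter


def diversity_metrics(answers, num_options=4):
    """Per-question diversity metrics under the assumed definitions.

    answers: list of sampled option labels, e.g. ["A", "A", "C", ...];
             all labels are assumed to be among the num_options choices.
    """
    n = len(answers)
    counts = Counter(answers)
    unique_count = len(counts)
    diversity_ratio = unique_count / n
    # KL(p || u) = sum_i p_i * ln(p_i * K), with u_i = 1/K.
    # Options that were never sampled contribute 0 to the sum.
    kl = sum((c / n) * math.log((c / n) * num_options)
             for c in counts.values())
    return {
        "unique_count": unique_count,
        "diversity_ratio": diversity_ratio,
        "kl_divergence_from_uniform": kl,
    }


if __name__ == "__main__":
    # Fully collapsed sampling: every sample gives the same answer.
    print(diversity_metrics(["A"] * 10))
    # Perfectly uniform sampling over four options.
    print(diversity_metrics(["A", "B", "C", "D"] * 2))
```

Under these definitions, a fully collapsed sampler yields the maximum divergence of ln 4 ≈ 1.386, and a perfectly uniform one yields 0; averaging the per-question values over all questions would give table-style aggregates.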
Provider:
eduardofarina