Source code and data for the PhD Thesis "On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints"
Source: DataCite Commons; updated 2026-04-21; indexed 2026-05-07
Download link:
https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/DATA/USQLMB
Description:
<h2>Dataset overview</h2>
<p>
This dataset contains source code and annotation guidelines used in the PhD thesis:
</p>
<p>
“On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints”
</p>
<h3>Repository structure</h3>
<p>The dataset is split into five repositories:</p>
<ul>
<li>Source code for Chapter 2.6 <em>De-identification of German doctor’s letters</em></li>
<li>Source code for Chapter 5 <em>Clinical Section Classification using Pretrained Language Models and Prompting</em></li>
<li>Source code for Chapter 6 <em>Medication Information Extraction using Local Large Language Models</em></li>
<li>Source code for Chapter 7 <em>Clinical Application: Medication Trends and Polypharmacy</em></li>
<li>Annotation guidelines for Chapters 2.6, 4, 5, and 7</li>
</ul>
<h3>CARDIO:DE</h3>
<p>
The main dataset used for experiments in Chapters 5, 6, and 7:
</p>
<ul>
<li>
CARDIO:DE -
<a href="https://doi.org/10.11588/DATA/AFYQDY">https://doi.org/10.11588/DATA/AFYQDY</a>
</li>
</ul>
<h3>Additional datasets (not included here)</h3>
<p>Other datasets used include:</p>
<ul>
<li>
n2c2 2018 Track 2 (used in Chapter 6) -
<a href="https://doi.org/10.1093/jamia/ocz166">https://doi.org/10.1093/jamia/ocz166</a>
</li>
</ul>
<h3>Notes on additional data and model availability</h3>
<p>
With the exception of CARDIO:DE, the cardiology doctor’s letters used in Chapters 2, 5, 6, and 7 cannot be distributed due to data protection regulations. The same applies to all further-pretrained and fine-tuned models.
</p>
Provided by:
heiDATA
Created:
2026-04-14



