SYNUR
Source: ModelScope community (魔搭社区). Updated 2026-01-07; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/microsoft/SYNUR
# Dataset Card: SYNUR (Synthetic Nursing Observation Dataset)
## 1. Dataset Summary
- **Name**: SYNUR
- **Full name / acronym**: SYnthetic NURsing Observation Extraction
- **Purpose / use case**:
SYNUR is intended to support research in structuring nurse dictation transcripts by extracting clinical observations that can feed into flowsheet-style EHR entries. It is designed to reduce documentation burden by enabling automated conversion from spoken nurse assessments to structured observations. ([arxiv.org](https://arxiv.org/pdf/2507.05517))
- **Version**: As released with the EMNLP industry track paper (2025)
- **License / usage terms**: cdla-permissive-2.0
## 2. Data Fields / Format
- `transcript`: string, the nurse dictation (raw spoken text)
- `observations`: JSON-serialized string encoding a list of dictionaries, each with the following keys:
  - `id` (str): key of the observation in the schema.
  - `value_type` (str): type of the observation, one of {*SINGLE_SELECT*, *MULTI_SELECT*, *STRING*, *NUMERIC*}.
  - `name` (str): observation concept name.
  - `value` (any): value of the observation.
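Because `observations` is stored as a JSON string rather than a nested structure, it has to be decoded before use. The following is a minimal sketch of that step, assuming a record is available as a plain Python dictionary with the two fields described above. The sample record, its identifiers (`obs_1`, `obs_2`), and its values are hypothetical and are not taken from SYNUR; `parse_observations` is an illustrative helper, not part of the dataset tooling.

```python
import json

# Hypothetical record for illustration only; the ids, names, and values
# below are invented and do not come from the SYNUR dataset.
record = {
    "transcript": "Patient is alert, heart rate is 82.",
    "observations": json.dumps([
        {"id": "obs_1", "value_type": "NUMERIC",
         "name": "Heart rate", "value": 82},
        {"id": "obs_2", "value_type": "SINGLE_SELECT",
         "name": "Level of consciousness", "value": "Alert"},
    ]),
}

REQUIRED_KEYS = {"id", "value_type", "name", "value"}


def parse_observations(rec):
    """Decode the JSON-serialized `observations` field into a list of dicts."""
    observations = json.loads(rec["observations"])
    for obs in observations:
        missing = REQUIRED_KEYS - obs.keys()
        if missing:
            raise ValueError(f"observation is missing keys: {sorted(missing)}")
    return observations


parsed = parse_observations(record)

# Group observation names by their declared value type.
names_by_type = {}
for obs in parsed:
    names_by_type.setdefault(obs["value_type"], []).append(obs["name"])

print(names_by_type)
```

Checking for the four documented keys up front keeps malformed records from failing later in a pipeline with a less informative error.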
## 3. Observation Schema
The full schema (i.e., 193 observation concepts) is provided at the root of this dataset repo as `synur_schema.json`. It is a list of dictionaries with the following key-value pairs:
- `id` (str): key of observation concept.
- `name` (str): observation concept name.
- `value_type` (str): type of observation in {*SINGLE_SELECT*, *MULTI_SELECT*, *STRING*, *NUMERIC*}.
- `value_enum` (List[str], *optional*): set of possible string values for *SINGLE_SELECT* and *MULTI_SELECT* value types.
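The schema can be used to sanity-check extracted observations. The sketch below indexes schema entries by `id` and checks one observation against its concept. Several points are assumptions rather than facts stated by this card: that *NUMERIC* values are stored as JSON numbers, that *MULTI_SELECT* values are stored as lists of strings, and that *SINGLE_SELECT* values are single strings. The schema entries shown are hypothetical stand-ins for the contents of `synur_schema.json`, and `is_valid` is an illustrative helper.

```python
import json
from numbers import Number

# Hypothetical schema entries for illustration; the real concepts live in
# synur_schema.json and can be read with load_schema() below.
schema = [
    {"id": "obs_1", "name": "Heart rate", "value_type": "NUMERIC"},
    {"id": "obs_2", "name": "Level of consciousness",
     "value_type": "SINGLE_SELECT", "value_enum": ["Alert", "Drowsy"]},
    {"id": "obs_3", "name": "Skin condition",
     "value_type": "MULTI_SELECT", "value_enum": ["Dry", "Intact", "Warm"]},
]


def load_schema(path):
    """Read a schema file shaped like synur_schema.json (a list of dicts)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_valid(obs, schema_by_id):
    """Check one observation against its schema concept.

    Assumed encodings (not confirmed by the dataset card): NUMERIC is a
    number, STRING and SINGLE_SELECT are strings, MULTI_SELECT is a list.
    """
    concept = schema_by_id.get(obs["id"])
    if concept is None or concept["value_type"] != obs["value_type"]:
        return False
    value = obs["value"]
    value_type = concept["value_type"]
    allowed = set(concept.get("value_enum", []))
    if value_type == "NUMERIC":
        return isinstance(value, Number) and not isinstance(value, bool)
    if value_type == "STRING":
        return isinstance(value, str)
    if value_type == "SINGLE_SELECT":
        return isinstance(value, str) and value in allowed
    if value_type == "MULTI_SELECT":
        return isinstance(value, list) and all(
            isinstance(v, str) and v in allowed for v in value
        )
    return False


schema_by_id = {concept["id"]: concept for concept in schema}

good = {"id": "obs_3", "value_type": "MULTI_SELECT",
        "name": "Skin condition", "value": ["Dry", "Warm"]}
bad = {"id": "obs_2", "value_type": "SINGLE_SELECT",
       "name": "Level of consciousness", "value": "Asleep"}

print(is_valid(good, schema_by_id), is_valid(bad, schema_by_id))
```

If the released data encodes values differently (for example, numbers as strings), the type checks in `is_valid` would need to be adjusted accordingly.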
## 4. Contact
- **Maintainers**: {jcorbeil,georgemi}@microsoft.com
## 5. Citation
If you use this dataset, please cite the paper:
@inproceedings{corbeil-etal-2025-empowering,
title = "Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications",
author = "Corbeil, Jean-Philippe and
Ben Abacha, Asma and
Michalopoulos, George and
Swazinna, Phillip and
Del-Agua, Miguel and
Tremblay, Jerome and
Daniel, Akila Jeeson and
Bader, Cari and
Cho, Kevin and
Krishnan, Pooja and
Bodenstab, Nathan and
Lin, Thomas and
Teng, Wenxuan and
Beaulieu, Francois and
Vozila, Paul",
editor = "Potdar, Saloni and
Rojas-Barahona, Lina and
Montella, Sebastien",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
month = nov,
year = "2025",
address = "Suzhou (China)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-industry.58/",
doi = "10.18653/v1/2025.emnlp-industry.58",
pages = "859--870",
ISBN = "979-8-89176-333-3"
}
Provider: maas
Created: 2025-10-09



