
SYNUR

ModelScope community · Updated 2026-01-07 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/microsoft/SYNUR
Resource description:
# Dataset Card: SYNUR (Synthetic Nursing Observation Dataset)

## 1. Dataset Summary

- **Name**: SYNUR
- **Full name / acronym**: SYnthetic NURsing Observation Extraction
- **Purpose / use case**: SYNUR is intended to support research in structuring nurse dictation transcripts by extracting clinical observations that can feed into flowsheet-style EHR entries. It is designed to reduce documentation burden by enabling automated conversion from spoken nurse assessments to structured observations. ([arxiv.org](https://arxiv.org/pdf/2507.05517))
- **Version**: As released with the EMNLP industry track paper (2025)
- **License / usage terms**: cdla-permissive-2.0

## 2. Data Fields / Format

- `transcript`: string, the nurse dictation (raw spoken text).
- `observations`: JSON-dumped list of dictionaries with the following format:
  - `id` (str): key of the observation in the schema.
  - `value_type` (str): type of the observation, one of {*SINGLE_SELECT*, *MULTI_SELECT*, *STRING*, *NUMERIC*}.
  - `name` (str): observation concept name.
  - `value` (any): value of the observation.

## 3. Observation Schema

The full schema (i.e., 193 observation concepts) is provided at the root of this dataset repo as `synur_schema.json`. It is a list of dictionaries with the following key-value pairs:

- `id` (str): key of the observation concept.
- `name` (str): observation concept name.
- `value_type` (str): type of the observation, one of {*SINGLE_SELECT*, *MULTI_SELECT*, *STRING*, *NUMERIC*}.
- `value_enum` (List[str], *optional*): set of possible string values for the *SINGLE_SELECT* and *MULTI_SELECT* value types.

## 4. Contact

- **Maintainers**: {jcorbeil,georgemi}@microsoft.com

## 5. Citation

If you use this dataset, please cite the paper:

    @inproceedings{corbeil-etal-2025-empowering,
        title = "Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications",
        author = "Corbeil, Jean-Philippe and Ben Abacha, Asma and Michalopoulos, George and Swazinna, Phillip and Del-Agua, Miguel and Tremblay, Jerome and Daniel, Akila Jeeson and Bader, Cari and Cho, Kevin and Krishnan, Pooja and Bodenstab, Nathan and Lin, Thomas and Teng, Wenxuan and Beaulieu, Francois and Vozila, Paul",
        editor = "Potdar, Saloni and Rojas-Barahona, Lina and Montella, Sebastien",
        booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
        month = nov,
        year = "2025",
        address = "Suzhou (China)",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2025.emnlp-industry.58/",
        doi = "10.18653/v1/2025.emnlp-industry.58",
        pages = "859--870",
        ISBN = "979-8-89176-333-3"
    }
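
The field layout in section 2 and the schema in section 3 suggest a simple way to decode and sanity-check a record. The following is a minimal sketch, not an official loader. The helper names (`load_schema`, `parse_observations`, `validate_observation`) are hypothetical, and the sample schema entries and record are invented for illustration rather than taken from the dataset. The card does not say how `MULTI_SELECT` or `NUMERIC` values are encoded, so the sketch assumes a list of strings and a number (or numeric string), respectively.

```python
import json
from typing import Any


def load_schema(path: str = "synur_schema.json") -> dict[str, dict[str, Any]]:
    """Read the schema file (a list of concept dicts) and index it by `id`."""
    with open(path, encoding="utf-8") as handle:
        concepts = json.load(handle)
    return {concept["id"]: concept for concept in concepts}


def parse_observations(raw: str) -> list[dict[str, Any]]:
    """Decode the JSON-dumped `observations` field into a list of dicts."""
    observations = json.loads(raw)
    if not isinstance(observations, list):
        raise ValueError("expected a JSON list of observation dictionaries")
    return observations


def validate_observation(
    obs: dict[str, Any], schema_by_id: dict[str, dict[str, Any]]
) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    concept = schema_by_id.get(obs.get("id"))
    if concept is None:
        return [f"unknown id: {obs.get('id')!r}"]

    problems: list[str] = []
    value_type = concept["value_type"]
    value = obs.get("value")
    allowed = concept.get("value_enum")  # optional key in the schema

    if obs.get("value_type") != value_type:
        problems.append(f"value_type differs from schema ({value_type})")

    if value_type == "SINGLE_SELECT" and allowed is not None:
        if value not in allowed:
            problems.append(f"value {value!r} not in value_enum")
    elif value_type == "MULTI_SELECT" and allowed is not None:
        # Assumption: multi-select values are stored as a list of strings.
        if not isinstance(value, list):
            problems.append("expected a list for MULTI_SELECT")
        else:
            extra = [item for item in value if item not in allowed]
            if extra:
                problems.append(f"values {extra!r} not in value_enum")
    elif value_type == "NUMERIC":
        # Assumption: numeric values are numbers or numeric strings.
        try:
            float(value)
        except (TypeError, ValueError):
            problems.append(f"value {value!r} is not numeric")
    return problems


# Invented sample data, for illustration only (not from SYNUR).
example_schema = {
    concept["id"]: concept
    for concept in [
        {"id": "demo-pain", "name": "Pain Level", "value_type": "NUMERIC"},
        {
            "id": "demo-skin",
            "name": "Skin Color",
            "value_type": "SINGLE_SELECT",
            "value_enum": ["Pink", "Pale"],
        },
    ]
}
example_record = {
    "transcript": "Pain is four out of ten, skin looks flushed.",
    "observations": json.dumps(
        [
            {"id": "demo-pain", "value_type": "NUMERIC",
             "name": "Pain Level", "value": 4},
            {"id": "demo-skin", "value_type": "SINGLE_SELECT",
             "name": "Skin Color", "value": "Flushed"},
        ]
    ),
}

observations = parse_observations(example_record["observations"])
report = {
    obs["id"]: validate_observation(obs, example_schema) for obs in observations
}
for obs_id, problems in report.items():
    print(obs_id, problems or "ok")
```

With the real data, `load_schema()` would replace `example_schema`, assuming `synur_schema.json` has been downloaded from the repository root to the working directory.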

Provider:
maas
Created:
2025-10-09