leduckhai/VietMed-Sum

Name: leduckhai/VietMed-Sum
Creator: leduckhai
Published: 2024-07-04 17:50:05
License: no description available

Source: Hugging Face. Updated 2024-07-04; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/leduckhai/VietMed-Sum

Overview:

VietMed-Sum is described as the first speech summarization dataset for medical conversations. It supports a real-time speech summarization system that generates a local summary after every N speech utterances within a conversation and a global summary once the conversation ends. The stated aims are a better user experience and lower computational cost. It is also described as the first work in which large language models (LLMs) and human annotators collaborate to create gold-standard and synthetic summaries for medical conversation summarization.

Provider:

leduckhai

Summary of original information

Real-time Speech Summarization for Medical Conversations

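Description

Task category: Summarization
Languages: Vietnamese, English
Dataset: VietMed-Sum
Highlights:
- The first deployable real-time speech summarization system, suitable for real-world industrial applications.
- Generates a local summary after every N speech utterances within a conversation and a global summary after the conversation ends.
- Improves user experience from a business standpoint and reduces computational cost from a technical standpoint.
- The first speech summarization dataset for medical conversations.
- The first to use large language models (LLMs) and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization.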
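
The local/global summarization schedule described in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `summarize_conversation`, `placeholder_summarizer`, and the parameter `n` are hypothetical names, the summarizer is a toy stand-in for a real model, and the sketch assumes that the global summary is computed over the full conversation and that a trailing window with fewer than N utterances gets no local summary.

```python
from typing import Callable, List, Tuple

# A summarizer maps a list of utterances to one summary string.
Summarizer = Callable[[List[str]], str]


def placeholder_summarizer(utterances: List[str]) -> str:
    """Hypothetical stand-in for a real model: keeps the first word of each utterance."""
    return " ".join(u.split()[0] for u in utterances if u.split())


def summarize_conversation(
    utterances: List[str],
    n: int,
    summarize: Summarizer = placeholder_summarizer,
) -> Tuple[List[str], str]:
    """Return (local_summaries, global_summary) for one conversation."""
    if n <= 0:
        raise ValueError("n must be a positive integer")

    local_summaries: List[str] = []
    window: List[str] = []
    for utterance in utterances:  # utterances are consumed one at a time
        window.append(utterance)
        if len(window) == n:
            # A full window of N utterances triggers one local summary.
            local_summaries.append(summarize(window))
            window = []

    # Assumption: the global summary covers every utterance in the conversation.
    global_summary = summarize(utterances)
    return local_summaries, global_summary
```

Calling `summarize_conversation(transcribed_utterances, n=3)` on a list of transcribed utterances would yield one local summary per complete group of three utterances plus a single global summary. Passing the summarizer as a parameter keeps the scheduling logic separate from whichever model produces the summaries.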

Citation

@article{VietMed_Sum,
  title={Real-time Speech Summarization for Medical Conversations},
  author={Le-Duc, Khai and Nguyen, Khai-Nguyen and Vo-Dang, Long and Hy, Truong-Son},
  journal={arXiv preprint arXiv:2406.15888},
  booktitle={Interspeech 2024},
  url={https://arxiv.org/abs/2406.15888},
  year={2024}
}

Contact

Core developers:
- Khai Le-Duc
  - Affiliation: University of Toronto, Canada
  - Email: duckhai.le@mail.utoronto.ca
  - GitHub: https://github.com/leduckhai
- Khai-Nguyen Nguyen
  - Affiliation: College of William & Mary, USA
  - Email: knguyen07@wm.edu
  - GitHub: https://github.com/nkn002
