knkarthick/highlightsum
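Dataset Card for HighlightSum Corpus
Dataset Description
Dataset Summary
HighlightSum is a large-scale dialogue summarization dataset containing 31,108 dialogues drawn from AMI, SamSUM, and DialogSUM, each paired with a manually written summary.
Languages
English
Dataset Structure
Data Instances
HighlightSum is a collection of large-scale dialogue summarization datasets. Its 31,108 dialogues are split into training, validation, and test sets.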
The first instance in the training set:
{
  "id": "train_0",
  "summary": "Mr. Smith's getting a check-up, and Doctor Hawkins advises him to have one every year. Hawkins'll give some information about their classes and medications to help Mr. Smith quit smoking.",
  "dialogue": "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today? #Person2#: I found it would be a good idea to get a check-up. #Person1#: Yes, well, you haven't had one for 5 years. You should have one every year. #Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor? #Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good. #Person2#: Ok. #Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith? #Person2#: Yes. #Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit. #Person2#: I've tried hundreds of times, but I just can't seem to kick the habit. #Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave. #Person2#: Ok, thanks doctor."
}
Data Fields
- dialogue: text of the dialogue.
- summary: human-written summary of the dialogue.
- id: unique file id of the example.
Data Splits
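As an illustration of the three fields, the following minimal sketch parses one record and splits its dialogue into speaker turns. It assumes speaker tags follow the `#PersonN#:` pattern seen in the example instance; dialogues originating from AMI or SamSUM may use other speaker labels, which this pattern would not match. The abbreviated record, `SPEAKER_TAG`, and `split_turns` are hypothetical names introduced here and are not part of the dataset.

```python
import json
import re
from typing import List, Tuple

# Abbreviated, hypothetical record using the card's schema: id, summary, dialogue.
raw = (
    '{"id": "train_0", '
    '"summary": "Mr. Smith gets a check-up.", '
    '"dialogue": "#Person1#: Hi, Mr. Smith. #Person2#: Ok."}'
)

# Assumption: speakers are tagged as #Person1#, #Person2#, ...
SPEAKER_TAG = re.compile(r"(#Person\d+#):\s*")


def split_turns(dialogue: str) -> List[Tuple[str, str]]:
    """Return (speaker_tag, utterance) pairs in order of appearance."""
    # With a capturing group, re.split yields
    # [prefix, tag1, text1, tag2, text2, ...].
    parts = SPEAKER_TAG.split(dialogue)
    tags = parts[1::2]
    texts = parts[2::2]
    return [(tag, text.strip()) for tag, text in zip(tags, texts)]


record = json.loads(raw)
turns = split_turns(record["dialogue"])
for speaker, utterance in turns:
    print(speaker, "->", utterance)
```

Keeping the tag pattern in a single constant makes it straightforward to swap in a different expression for dialogues that label speakers differently.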
- train: 27401
- val: 1360
- test: 2347
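The split sizes above can be checked against the stated total of 31,108 dialogues. The sketch below hard-codes the counts from this card rather than reading the dataset itself, so it only verifies the card's arithmetic; `SPLIT_SIZES` is a name chosen for illustration.

```python
# Split sizes as listed in this card.
SPLIT_SIZES = {"train": 27401, "val": 1360, "test": 2347}

total = sum(SPLIT_SIZES.values())
print(total)  # 31108

# Share of each split as a percentage of the whole collection.
shares = {name: round(100 * n / total, 2) for name, n in SPLIT_SIZES.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```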
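Dataset Creation
Curation Rationale
A collection of the AMI, SamSUM, and DialogSUM datasets.
Source Language Producers
Linguists
Annotators
Language experts
Licensing Information
Non-commercial licence: MIT
Citation Information
Refer to the links above for credits and citations.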



