ohsumed-single
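Dataset overview: joao-luz/ohsumed-single
Basic dataset information
- Source: an adaptation of the Ohsumed dataset that removes the records belonging to more than one disease category in the original corpus.
- Original data source: https://github.com/yao8839836/text_gcn
- Category label reference: https://github.com/Evgeneus/screening-classification-datasets/blob/master/ohsumed-based-screening-datasets/README.md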
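Dataset structure
- Features:
  - text: the document text, stored as a string.
  - label: the class label, one of 23 disease categories.
- Splits:
  - Train:
    - Number of examples: 3,357
    - Size: 4,302,749 bytes
  - Test:
    - Number of examples: 4,043
    - Size: 5,207,699 bytes
- Total download size: 5,084,973 bytes
- Total dataset size: 9,510,448 bytes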
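
The split figures above can be sanity-checked and turned into proportions with a few lines of Python. The sketch below is illustrative only: the constant and function names are hypothetical, and `load_splits` assumes the dataset is hosted on the Hugging Face Hub under the repository ID `joao-luz/ohsumed-single` and that the `datasets` package is installed, neither of which this card states explicitly.

```python
# Split statistics copied from the dataset card above.
SPLITS = {
    "train": {"num_examples": 3357, "num_bytes": 4302749},
    "test": {"num_examples": 4043, "num_bytes": 5207699},
}
REPORTED_DATASET_SIZE = 9510448  # "Total dataset size" from the card


def total_bytes(splits):
    """Sum the per-split sizes in bytes."""
    return sum(info["num_bytes"] for info in splits.values())


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(info["num_examples"] for info in splits.values())
    return {name: info["num_examples"] / total for name, info in splits.items()}


def load_splits(repo_id="joao-luz/ohsumed-single"):
    """Load the dataset from the Hugging Face Hub.

    Assumption: the repository ID resolves on the Hub; this needs network
    access and `pip install datasets`. The import is local so the helpers
    above work without the package installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Note that the test split is larger than the training split here, so `split_fractions(SPLITS)` reports a train share below one half.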
Class label details
| Label | Original category | Name |
|---|---|---|
| 0 | C01 | Bacterial Infections and Mycoses |
| 1 | C02 | Virus Diseases |
| 2 | C03 | Parasitic Diseases |
| 3 | C04 | Neoplasms |
| 4 | C05 | Musculoskeletal Diseases |
| 5 | C06 | Digestive System Diseases |
| 6 | C07 | Stomatognathic Diseases |
| 7 | C08 | Respiratory Tract Diseases |
| 8 | C09 | Otorhinolaryngologic Diseases |
| 9 | C10 | Nervous System Diseases |
| 10 | C11 | Eye Diseases |
| 11 | C12 | Urologic and Male Genital Diseases |
| 12 | C13 | Female Genital Diseases and Pregnancy Complications |
| 13 | C14 | Cardiovascular Diseases |
| 14 | C15 | Hemic and Lymphatic Diseases |
| 15 | C16 | Neonatal Diseases and Abnormalities |
| 16 | C17 | Skin and Connective Tissue Diseases |
| 17 | C18 | Nutritional and Metabolic Diseases |
| 18 | C19 | Endocrine Diseases |
| 19 | C20 | Immunologic Diseases |
| 20 | C21 | Disorders of Environmental Origin |
| 21 | C22 | Animal Diseases |
| 22 | C23 | Pathological Conditions, Signs and Symptoms |
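
Because the integer labels in the table run from 0 to 22 while the original categories run from C01 to C23, the two are related by a fixed offset of one. A minimal sketch of that mapping follows; the helper names are hypothetical and are not part of the dataset or of any library.

```python
# Category names copied from the label table, in label order (0..22).
CATEGORY_NAMES = [
    "Bacterial Infections and Mycoses",
    "Virus Diseases",
    "Parasitic Diseases",
    "Neoplasms",
    "Musculoskeletal Diseases",
    "Digestive System Diseases",
    "Stomatognathic Diseases",
    "Respiratory Tract Diseases",
    "Otorhinolaryngologic Diseases",
    "Nervous System Diseases",
    "Eye Diseases",
    "Urologic and Male Genital Diseases",
    "Female Genital Diseases and Pregnancy Complications",
    "Cardiovascular Diseases",
    "Hemic and Lymphatic Diseases",
    "Neonatal Diseases and Abnormalities",
    "Skin and Connective Tissue Diseases",
    "Nutritional and Metabolic Diseases",
    "Endocrine Diseases",
    "Immunologic Diseases",
    "Disorders of Environmental Origin",
    "Animal Diseases",
    "Pathological Conditions, Signs and Symptoms",
]


def label_to_code(label: int) -> str:
    """Map an integer label (0..22) to its original category code (C01..C23)."""
    if not 0 <= label < len(CATEGORY_NAMES):
        raise ValueError(f"label out of range: {label}")
    return f"C{label + 1:02d}"


def code_to_label(code: str) -> int:
    """Map an original category code such as 'C04' back to its integer label."""
    if not (code.startswith("C") and code[1:].isdigit()):
        raise ValueError(f"malformed category code: {code!r}")
    label = int(code[1:]) - 1
    if not 0 <= label < len(CATEGORY_NAMES):
        raise ValueError(f"category code out of range: {code!r}")
    return label


def describe(label: int) -> str:
    """Return a readable 'code: name' string for a label."""
    return f"{label_to_code(label)}: {CATEGORY_NAMES[label]}"
```

For example, `describe(3)` returns `"C04: Neoplasms"`, matching the fourth row of the table.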
Citation
@InProceedings{10.1007/BFb0026683,
  author="Joachims, Thorsten",
  editor="N{\'e}dellec, Claire and Rouveirol, C{\'e}line",
  title="Text categorization with Support Vector Machines: Learning with many relevant features",
  booktitle="Machine Learning: ECML-98",
  year="1998",
  publisher="Springer Berlin Heidelberg",
  address="Berlin, Heidelberg",
  pages="137--142",
  abstract="This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.",
  isbn="978-3-540-69781-7"
}




