Medical Kurdish Dataset (MKD)
Source: arXiv | Updated: 2022-03-27 | Indexed: 2024-06-21
Download link:
https://dx.doi.org/10.17632/f2yfz4r9fr.1
Description:
The Medical Kurdish Dataset (MKD) is a text classification dataset for the medical domain, created by the Department of Computer Science at Halabja University. It contains 6,756 short Kurdish-language comments collected from social media, mainly from Facebook pages covering medicine, news, economy, education, and sports. The data were cleaned in six preprocessing steps, including noise removal and character replacement, to improve quality. MKD is intended for machine learning and natural language processing research, particularly Kurdish medical text classification, and can support the modeling and analysis of patient health systems, health policies, and regulations.
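The description names noise removal and character replacement among the six preprocessing steps but does not list the exact rules. The following is only a minimal sketch of what such cleaning might look like for Kurdish text in Arabic script. The function name `clean_comment`, the replacement table, and the regular expressions are illustrative assumptions, not the dataset authors' actual pipeline.

```python
import re

# Hypothetical replacement table (an assumption, not taken from MKD):
# Arabic code points that are often typed in place of the letters
# normally used in Kurdish (Sorani) orthography.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
})

# Assumed noise definition: URLs, plus anything outside the Arabic
# Unicode block, the zero-width non-joiner, and whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NOISE_RE = re.compile(r"[^\u0600-\u06FF\u200C\s]")
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Return a cleaned copy of one social media comment."""
    text = URL_RE.sub(" ", text)          # drop links first
    text = text.translate(CHAR_MAP)       # character replacement
    text = NOISE_RE.sub(" ", text)        # drop Latin letters, digits, emoji
    return SPACE_RE.sub(" ", text).strip()  # collapse leftover whitespace
```

Calling `clean_comment` on each comment before feature extraction would give a classifier a more uniform character set. The real six steps may differ in both order and content.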
Provided by:
Department of Computer Science, Halabja University, Kurdistan, Iraq
Created:
2022-03-27



