SMS Fraud Classification dataset for Chichewa
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14607453
Description:
The dataset contains 676 SMSs in Chichewa and was used to experiment with machine learning models for fraud classification. There are six versions of the dataset in total: D-CHI contains the SMSs in Chichewa, D-HT is a human-translated version of D-CHI, and D-MT is a machine translation of D-CHI produced with Google Translate. These three datasets are balanced: each contains an equal number of fraudulent and normal SMSs. Three extension sets of 148 SMSs each, containing only normal SMSs, were also used. Adding each extension set to its corresponding balanced dataset yields the extended, unbalanced versions denoted D-CHIe, D-HTe and D-MTe.
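The class composition implied by these figures can be worked out with a short sketch. It assumes that 676 is the size of each balanced version (so 338 fraudulent and 338 normal SMSs) and that each extended version is the balanced one plus its 148 normal-only SMSs. The function and constant names are hypothetical and are not taken from the authors' repository.

```python
# Sizes stated in the dataset description.
BALANCED_TOTAL = 676  # assumption: size of each balanced version (D-CHI, D-HT, D-MT)
EXTRA_NORMAL = 148    # normal-only SMSs added to form each extended version


def class_counts(extended: bool) -> dict[str, int]:
    """Return per-class SMS counts for a balanced or an extended version."""
    fraud = BALANCED_TOTAL // 2          # balanced: half of the SMSs are fraudulent
    normal = BALANCED_TOTAL - fraud      # the other half are normal
    if extended:
        normal += EXTRA_NORMAL           # the extension adds normal SMSs only
    return {"fraud": fraud, "normal": normal}


def fraud_share(counts: dict[str, int]) -> float:
    """Fraction of SMSs in the version that are fraudulent."""
    return counts["fraud"] / sum(counts.values())


for name, extended in [("D-CHI", False), ("D-CHIe", True)]:
    counts = class_counts(extended)
    print(name, counts, f"fraud share = {fraud_share(counts):.2f}")
```

Under these assumptions each extended version holds 824 SMSs, 338 fraudulent and 486 normal, so a model evaluated on D-CHIe, D-HTe or D-MTe faces a moderate class imbalance rather than the even split of the base versions.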
The attached paper explains the methodology used.
Please note that the GitHub repo and this dataset are currently private but will be made public when the results obtained from the dataset are published, which we expect to happen in the next few months.
In the meantime, if you are interested in working with this dataset, please contact us.
Created:
2025-01-07



