Tamil News Dataset
Source: www.kaggle.com | Updated: 2019-12-25 | Indexed: 2025-01-09
Download link:
https://www.kaggle.com/disisbig/tamil-news-dataset
Description:
This dataset contains roughly 6,500 news articles collected by [Ravi](https://github.com/ravi-annaswamy) from Tamil news websites.
The dataset has been cleaned and is split into a train set and a test set, which you can use to benchmark classification models for Tamil.
The scripts used to create the dataset can be found [here](https://github.com/goru001/nlp-for-tamil/tree/master/dataset-preparation).
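When benchmarking a classifier on the train and test split, it helps to have a trivial reference score to compare against. The sketch below is one possible way to compute a majority-class baseline using only the Python standard library. The column names `text` and `label`, and the rows in the demo, are assumptions made for illustration; the listing does not state the actual file names or schema, so check the downloaded files and adjust accordingly.

```python
import csv
import io
from collections import Counter


def read_rows(handle, text_col="text", label_col="label"):
    """Read (text, label) pairs from a CSV handle that has a header row.

    The default column names are hypothetical; pass the real ones
    once you have inspected the downloaded files.
    """
    reader = csv.DictReader(handle)
    return [(row[text_col], row[label_col]) for row in reader]


def majority_baseline_accuracy(train_rows, test_rows):
    """Accuracy obtained by always predicting the most frequent training label."""
    if not train_rows or not test_rows:
        raise ValueError("train_rows and test_rows must both be non-empty")
    label_counts = Counter(label for _, label in train_rows)
    majority_label = label_counts.most_common(1)[0][0]
    correct = sum(1 for _, label in test_rows if label == majority_label)
    return correct / len(test_rows)


# Tiny in-memory stand-ins for the real files (made-up rows, not dataset content).
demo_train = read_rows(io.StringIO(
    "text,label\n"
    "article one,sports\n"
    "article two,sports\n"
    "article three,business\n"
))
demo_test = read_rows(io.StringIO(
    "text,label\n"
    "article four,sports\n"
    "article five,business\n"
))

baseline = majority_baseline_accuracy(demo_train, demo_test)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

For the real data, open each file with `open(path, encoding="utf-8", newline="")` and pass the handle to `read_rows`. Any classification model worth reporting should score above this baseline on the test set.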
**Credits:**
Full credit to [Ravi](https://github.com/ravi-annaswamy) for this dataset. Thanks also go to the thetamilhindu headline crawler, which was built using the news crawler from [vanangamudi](https://github.com/vanangamudi).
Provider:
Kaggle



