
Indonesian News Corpus

Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://data.mendeley.com/datasets/2zpbjs22k3
Description:
This corpus contains 150,466 news articles derived from several freely accessible Indonesian news websites. The corpus is designated for research purposes only. The news websites are:
• kompas.com is a registered trademark of PT. Kompas Cyber Media. https://inside.kompas.com/about-us
• tempo.co is a registered trademark of PT INFO MEDIA DIGITAL. https://www.tempo.co/about
• merdeka.com is a registered trademark of PT KAPAN LAGI DOT COM NETWORKS. https://www.merdeka.com/company/tentang-kami.html
• republika.co.id is a registered trademark of PT Republika Media Mandiri. https://www.republika.co.id/page/about
• viva.co.id is a registered trademark of PT. Viva Media Baru. https://www.viva.co.id/tentang-kami
• tribunnews.com is a registered trademark of PT Tribun Digital Online. http://www.tribunnews.com/about-us
The corpus is part of the bachelor thesis work of Aad Miqdad Muadz Muzad, under the supervision of Faisal Rahutomo. We crawled several categories of the websites for 6 months, from July 2015 until December 2015.
Created:
2018-08-30