Indonesian News Corpus
Source: NIAID Data Ecosystem. Indexed 2026-03-10.
Download link:
https://data.mendeley.com/datasets/2zpbjs22k3
Description:
This corpus contains 150,466 news articles derived from several freely accessible Indonesian news websites. The corpus is intended for research purposes only. The news websites are:
• kompas.com is a registered trademark of PT. Kompas Cyber Media. https://inside.kompas.com/about-us
• tempo.co is a registered trademark of PT INFO MEDIA DIGITAL. https://www.tempo.co/about
• merdeka.com is a registered trademark of PT KAPAN LAGI DOT COM NETWORKS. https://www.merdeka.com/company/tentang-kami.html
• republika.co.id is a registered trademark of PT Republika Media Mandiri. https://www.republika.co.id/page/about
• viva.co.id is a registered trademark of PT. Viva Media Baru. https://www.viva.co.id/tentang-kami
• tribunnews.com is a registered trademark of PT Tribun Digital Online. http://www.tribunnews.com/about-us
The corpus is part of the bachelor's thesis work of Aad Miqdad Muadz Muzad, under the supervision of Faisal Rahutomo. We crawled several categories of the websites for six months, from July 2015 to December 2015.
Created:
2018-08-30



