
A supervised machine learning method to classify Dutch-language news items

Source: Figshare. Updated 2019-12-12; indexed 2026-04-08.
Download link:
https://figshare.com/articles/A_supervised_machine_learning_method_to_classify_Dutch-language_news_items/7314896/1
Description:
<b>Please contact s.a.m.vermeer@uva.nl for questions or further information.</b><br><br>
<i>Background</i><br>
Using a supervised machine learning method, we developed a classifier in Python (version 3.5.2) that returns the news topic of a Dutch-language news item as a string. To train the classifier, we collected more than 1 million news items from approximately 150 different Dutch-language news websites, as well as from search engines and social media, over 8 months in 2017/18. The tool can be used to map Dutch-language news items into four news categories: (1) Politics, which covers <i>internal politics</i>, <i>international politics</i>, and <i>military and defense</i>; (2) Business, which covers <i>economy</i>, <i>education</i>, and <i>health, welfare and social services</i>; (3) Entertainment, which covers <i>sports</i>, <i>culture</i>, <i>fashion</i>, and <i>human interest</i>; and (4) Other, which covers <i>science and technology</i>, <i>environment</i>, <i>communication</i>, <i>weather</i>, and <i>religion and beliefs</i>.<br><br>
<i>Performance</i><br>
We used three different pre-processing steps, resulting in three different <b>.pkl</b> modules: (1) All text: '<i>...text_Dutch_news.pkl</i>', (2) Stop word removal: '<i>...stopword_Dutch_news.pkl</i>', and (3) Lead: '<i>...lead_Dutch_news.pkl</i>'. For every category, the classifier reached an accuracy, precision, and recall of at least <b>0.81</b>.<br><br>
<i>Usage</i><br>
The classifiers were developed with Python 3.5.2 and scikit-learn 0.19.2, and can be used as follows:<br>
-- clf = joblib.load('PassiveAggressive_text_Dutch_news.pkl')<br>
-- topic = clf.predict([text])  # text is a news item<br><br>
Susan Vermeer, Damian Trilling, Sanne Kruikemeier, Claes de Vreese
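The two usage lines in the description can be wrapped into a small script. The following is a minimal sketch, not the authors' code: the helper names `load_classifier` and `classify_items` are hypothetical, and it assumes that the pickled object accepts a list of raw text strings in `predict`, as the usage note suggests. Pickles written with scikit-learn 0.19.2 are not guaranteed to load under other scikit-learn versions, so an environment matching the stated versions is probably the safest choice.

```python
from typing import Iterable, List

# File name taken from the usage note in the description.
MODEL_PATH = "PassiveAggressive_text_Dutch_news.pkl"


def load_classifier(path: str = MODEL_PATH):
    """Load a pickled classifier from disk (hypothetical helper).

    Tries the standalone joblib package first and falls back to the copy
    that older scikit-learn releases (such as 0.19.x) shipped internally.
    """
    try:
        import joblib
    except ImportError:
        from sklearn.externals import joblib
    return joblib.load(path)


def classify_items(clf, texts: Iterable[str]) -> List[str]:
    """Return one topic label per news item (hypothetical helper).

    Assumes clf.predict accepts a list of raw text strings and returns
    one label per input, as in the usage note.
    """
    items = list(texts)
    if not items:
        return []
    return [str(label) for label in clf.predict(items)]


# Example usage, assuming the .pkl file has been downloaded locally:
# clf = load_classifier()
# topics = classify_items(clf, ["Tekst van een nieuwsbericht"])
```

Passing all items to `predict` in a single call avoids repeating the per-call overhead for every news item.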
Created:
2018-11-08