A supervised machine learning method to classify Dutch-language news items
Source: Figshare. Updated: 2018-11-08. Indexed: 2026-04-29.
Download link:
https://figshare.com/articles/dataset/A_supervised_machine_learning_method_to_classify_Dutch-language_news_items/7314896
Description:
Please contact s.a.m.vermeer@uva.nl for questions or further information.

Background

Based on a supervised machine learning method, we developed a classifier in Python (version 3.5.2) that returns the news topic of a Dutch-language news item (as a string). To train the classifier, we collected more than 1 million news items from approximately 150 different Dutch-language news websites, as well as from search engines and social media, over 8 months in 2017/18.

This tool can be used to map Dutch-language news items into four news categories:
(1) Politics, which covers internal politics, international politics, and military and defense;
(2) Business, which covers economy, education, and health, welfare and social services;
(3) Entertainment, which covers sports, culture, fashion and human interest; and
(4) Other, which covers science and technology, environment, communication, weather, and religion and beliefs.

Performance

We used three different pre-processing steps, resulting in three different .pkl files:
(1) All text: '...text_Dutch_news.pkl'
(2) Stop word removal: '...stopword_Dutch_news.pkl'
(3) Lead: '...lead_Dutch_news.pkl'

For every text category, the classifier reached an accuracy, precision and recall of at least 0.81.

Usage

The classifiers were developed in Python 3.5.2 with scikit-learn 0.19.2, and can be used as follows:
-- clf = joblib.load('PassiveAggressive_text_Dutch_news.pkl')
-- topic = clf.predict([text])  # text is a news item

Susan Vermeer, Damian Trilling, Sanne Kruikemeier, Claes de Vreese
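The usage lines above can be wrapped in a small script. The following is a minimal sketch, not the authors' code: the helper names load_classifier and classify_items are hypothetical, and it assumes the standalone joblib package is installed (with scikit-learn 0.19.2, the bundled sklearn.externals.joblib was also commonly used). It also assumes, as the usage lines suggest, that the pickled object accepts raw text in predict().

```python
from typing import Iterable, List


def load_classifier(path: str):
    """Load a pickled classifier from `path` (hypothetical helper)."""
    # Imported lazily so the rest of this sketch works without joblib installed.
    # Assumption: the standalone `joblib` package can read the file.
    import joblib
    return joblib.load(path)


def classify_items(clf, texts: Iterable[str]) -> List[str]:
    """Return one predicted topic string per news item (hypothetical helper)."""
    items = list(texts)
    if not items:
        # Avoid calling predict() on an empty batch.
        return []
    # predict() expects a list-like of documents, not a single bare string.
    return [str(label) for label in clf.predict(items)]


# Example usage (requires the downloaded .pkl file):
# clf = load_classifier("PassiveAggressive_text_Dutch_news.pkl")
# topics = classify_items(clf, ["De Tweede Kamer debatteert over de begroting."])
# print(topics)
```

Because the files were pickled with scikit-learn 0.19.2, loading them under a much newer scikit-learn release may fail or produce warnings; an environment that matches the stated versions is the safer choice.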
Created:
2018-11-08



