Reproducible Text Analysis with Topic Modeling

PsychArchives, updated 2022-11-03, indexed 2026-04-25
Download link:
https://hdl.handle.net/20.500.12034/7665
Description:
Topic modeling is a popular text mining method for finding the central topics in large collections of texts. An algorithm identifies groups of words that frequently occur together in the texts; these groups of words are called "topics". Because text collections of any size can be evaluated automatically in this way, topic modeling can be an insightful tool for various text-based applications, such as social media studies or psychotherapy research. Even though topic modeling is an "unsupervised machine learning" technique, the person doing the analysis has to make many parameter decisions. Because these decisions can strongly affect the results, and the results also depend partly on random numbers, good documentation and freely available analysis code are crucial for reproducible topic modeling. In this introductory demonstration, the established topic modeling variant "Latent Dirichlet Allocation" is presented and applied to a freely available dataset. Special emphasis is placed on topic validity and topic reliability, two often overlooked but important model properties. An example shows how transparent and detailed code can make the analysis reproducible. A brief introduction to PsychTopics (psychtopics.org), ZPID's open-source tool for exploring psychological research topics and trends, is also provided. PsychTopics uses a novel topic modeling approach to dynamically identify topics in psychological publications and displays them interactively in an R Shiny app. These are the slides for the topic modeling demonstration in the "Practices of Open Science" lecture series. More information: https://leibniz-psychology.org/en/opensciencelectures/topic-modeling/
Provider:
PsychArchives
Created:
2022-11-03
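
The description notes that Latent Dirichlet Allocation results depend on parameter choices and on random numbers. The sketch below illustrates that point only; it is not the code or the dataset from the slides. It assumes a tiny invented corpus and a plain-Python collapsed Gibbs sampler, and the names `fit_lda` and `topic_overlap` as well as all parameter values (`alpha`, `beta`, `n_iter`, `seed`) are hypothetical choices made for this example. Fixing the seed makes a run repeatable, and comparing runs with different seeds gives a rough view of topic reliability.

```python
import random
from collections import Counter

# Toy corpus, invented for illustration (not the dataset from the slides).
DOCS = [
    "therapy patient session therapist anxiety",
    "anxiety depression therapy treatment patient",
    "twitter post user hashtag follower",
    "hashtag twitter user post share",
    "patient treatment depression session therapist",
    "follower share post twitter hashtag",
]


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=42, n_top=3):
    """Fit LDA by collapsed Gibbs sampling; return the top words of each topic."""
    rng = random.Random(seed)  # local generator: the seed fully determines the run
    tokens = [doc.split() for doc in docs]
    vocab_size = len({w for doc in tokens for w in doc})

    doc_topic = [[0] * n_topics for _ in tokens]       # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # token counts per topic

    # Random initial topic assignment for every token.
    assign = [[rng.randrange(n_topics) for _ in doc] for doc in tokens]
    for d, doc in enumerate(tokens):
        for w, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Sort by count, then alphabetically, so ties are broken deterministically.
    return [
        [w for w, c in sorted(topic_word[k].items(), key=lambda kv: (-kv[1], kv[0]))
         if c > 0][:n_top]
        for k in range(n_topics)
    ]


def topic_overlap(run_a, run_b):
    """Mean best-match Jaccard overlap of top words between two runs (0 to 1)."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 1.0
    return sum(max(jaccard(t, u) for u in run_b) for t in run_a) / len(run_a)


# Same seed twice: identical topics. Different seed: overlap may be lower.
first = fit_lda(DOCS, seed=1)
print("seed=1:", first)
print("repeatable:", first == fit_lda(DOCS, seed=1))
print("overlap with seed=2:", topic_overlap(first, fit_lda(DOCS, seed=2)))
```

Reporting the seed alongside `alpha`, `beta`, the number of topics, and the number of iterations is what lets someone else rerun the analysis and obtain the same topics; the overlap score is only one simple way to compare runs.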