Twitter-LDA
Source: researchdata.smu.edu.sg | Updated: 2023-06-02 | Indexed: 2025-01-15
Download link:
https://researchdata.smu.edu.sg/articles/dataset/Twitter-LDA/12062730/1
Description:
Latent Dirichlet Allocation (LDA) is widely used in text analysis. The original LDA finds hidden "topics" in a collection of documents, where a topic is a subject such as "arts" or "education" that the documents discuss. In the original LDA setting each word has its own topic label, which may not work well for Twitter: tweets are short, and a single tweet is more likely to be about one topic. Twitter-LDA (T-LDA) was proposed to address this issue. T-LDA also addresses the noisy nature of tweets by capturing background words. Experiments in [7] showed that T-LDA can capture more meaningful topics than LDA on microblogs.
Related Publication: Zhao, W. X., Jiang, J., Weng, J., He, J., Lim, E. P., Yan, H., & Li, X. (2011). Comparing Twitter and traditional media using topic models. In Advances in Information Retrieval (pp. 338-349). http://doi.org/10.1007/978-3-642-20161-5_34
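The two ideas in the description, one topic per tweet and a separate pool of background words, can be illustrated with a small generative sketch. This is a toy illustration under stated assumptions, not the code or data shipped with this dataset: the vocabulary, the probability values, and the names `sample_index`, `generate_tweet`, and `topic_word_prob` are all hypothetical, and inference (fitting the distributions to real tweets) is not shown.

```python
import random

# Hypothetical toy vocabulary and distributions, invented for illustration.
VOCAB = ["goal", "match", "vote", "election", "lol", "rt"]
TOPIC_WORD_DISTS = [
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # topic 0: sports-like words
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # topic 1: politics-like words
]
BACKGROUND_DIST = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]  # noisy filler words


def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate_tweet(user_topic_dist, topic_word_dists, background_dist,
                   topic_word_prob, length, rng):
    """Generate one tweet as a list of (word_index, source) pairs.

    A single topic is drawn for the whole tweet. Each word then comes
    either from that topic (with probability topic_word_prob) or from
    the shared background distribution.
    """
    topic = sample_index(user_topic_dist, rng)  # one topic per tweet
    words = []
    for _ in range(length):
        if rng.random() < topic_word_prob:
            words.append((sample_index(topic_word_dists[topic], rng), "topic"))
        else:
            words.append((sample_index(background_dist, rng), "background"))
    return topic, words


if __name__ == "__main__":
    rng = random.Random(0)
    topic, words = generate_tweet([0.7, 0.3], TOPIC_WORD_DISTS,
                                  BACKGROUND_DIST, 0.6, 8, rng)
    print("tweet topic:", topic)
    print(" ".join(VOCAB[i] for i, _ in words))
```

In standard LDA, by contrast, the topic would be drawn again for every word. Drawing it once per tweet is what encodes the assumption that a short tweet is about a single subject.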
Provided by:
SMU Research Data Repository (RDR)



