five

A Prompt-Driven LLM Pipeline for Topic Modeling of Multiple Sclerosis Social Media Posts

收藏
IEEE2026-04-17 收录
下载链接:
https://ieee-dataport.org/documents/prompt-driven-llm-pipeline-topic-modeling-multiple-sclerosis-social-media-posts
下载链接
链接失效反馈
官方服务:
资源简介:
Social media platforms such as Platform X enable people with multiple sclerosis to share experiences and coping strategies, creating opportunities to analyze their perspectives through natural language processing. However, the short and noisy nature of these texts remains a challenge.  This study introduces and evaluates a prompt-driven Large Language Model (LLM) pipeline for topic modeling of unstructured social media data. Additionally, the performances of two prompt-based learning approaches (zero-shot and few-shot) are assessed. This research aims to highlight human-centered insights that the LLM pipeline can uncover. For the topic modeling task, GPT-4o-mini is prompted with a dataset of 504 posts collected from Platform X. Subsequently, GPT-4 serves as an expert-level evaluator to assess the quality of the generated topics based on coherence and diversity. To ensure the effectiveness of the proposed approach, a human-based evaluation was conducted. Finally, results are compared against the BERTopic baseline. Few-shot prompting achieved the highest performance (coherence=4.9\/5; human agreement=87.5\\%), followed by zero-shot prompting (coherence=5.0; human agreement=79.2\\%). Both LLM approaches scored higher in diversity (4.6\/5) than BERTopic (4.0\/5), which had lower human agreement (45.6\\%). Prompt-based LLM topic modeling outperforms BERTopic for short, informal multiple sclerosis (MS) social media texts, offering greater interpretability and alignment with human judgment.
提供机构:
Yasmeen aLAMOUDI
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作