A Prompt-Driven LLM Pipeline for Topic Modeling of Multiple Sclerosis Social Media Posts

Name: A Prompt-Driven LLM Pipeline for Topic Modeling of Multiple Sclerosis Social Media Posts
Creator: Yasmeen Alamoudi
License: No description available

Source: IEEE; indexed 2026-04-17

Download link:

https://ieee-dataport.org/documents/prompt-driven-llm-pipeline-topic-modeling-multiple-sclerosis-social-media-posts

Description:

Social media platforms such as Platform X enable people with multiple sclerosis (MS) to share experiences and coping strategies, creating opportunities to analyze their perspectives through natural language processing. However, the short and noisy nature of these texts remains a challenge. This study introduces and evaluates a prompt-driven large language model (LLM) pipeline for topic modeling of unstructured social media data, and compares the performance of two prompt-based learning approaches (zero-shot and few-shot). It also aims to highlight the human-centered insights that the LLM pipeline can uncover. For the topic modeling task, GPT-4o-mini is prompted with a dataset of 504 posts collected from Platform X. GPT-4 then serves as an expert-level evaluator, rating the quality of the generated topics on coherence and diversity. To validate the approach, a human evaluation was also conducted. Finally, results are compared against a BERTopic baseline. Few-shot prompting achieved the highest performance (coherence = 4.9/5; human agreement = 87.5%), followed by zero-shot prompting (coherence = 5.0/5; human agreement = 79.2%). Both LLM approaches scored higher in diversity (4.6/5) than BERTopic (4.0/5), which also had lower human agreement (45.6%). Prompt-based LLM topic modeling outperforms BERTopic on short, informal MS social media texts, offering greater interpretability and closer alignment with human judgment.
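
The difference between the zero-shot and few-shot settings described above can be illustrated with a minimal Python sketch. It is not the study's actual code: the instruction text, the function names `build_prompt` and `agreement_rate`, and the sample posts are all hypothetical, and the description does not state how human agreement was computed, so the simple label-match percentage below is only one plausible reading. No model is called; the sketch only builds the prompt text that would be sent to a model such as GPT-4o-mini.

```python
from typing import Sequence, Tuple

# Hypothetical instruction; the dataset description does not publish the real prompt.
INSTRUCTION = (
    "You are analyzing short social media posts written by people with "
    "multiple sclerosis. Assign each post a concise topic label."
)


def build_prompt(
    posts: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Build a zero-shot prompt if `examples` is empty, otherwise a few-shot prompt."""
    parts = [INSTRUCTION, ""]
    if examples:
        # Few-shot: show labeled (post, topic) pairs before the posts to label.
        parts.append("Examples:")
        for text, topic in examples:
            parts.append(f"Post: {text}")
            parts.append(f"Topic: {topic}")
        parts.append("")
    parts.append("Posts to label:")
    for i, post in enumerate(posts, start=1):
        parts.append(f"{i}. {post}")
    return "\n".join(parts)


def agreement_rate(model_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Percentage of items where the model label equals the human label.

    Assumed metric: the source reports 'human agreement' without defining it.
    """
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not model_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return 100.0 * matches / len(model_labels)


if __name__ == "__main__":
    # Made-up placeholder posts, not taken from the dataset.
    posts = ["Fatigue is brutal today", "Started a new infusion this week"]
    print(build_prompt(posts))  # zero-shot
    print(build_prompt(posts, examples=[("My legs feel numb again", "Symptoms")]))  # few-shot
```

The only structural difference between the two settings in this sketch is the optional block of labeled examples, so the same function can serve both conditions when comparing them.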

Provided by:

Yasmeen Alamoudi
