Table 2_Validating the use of large language models for psychological text classification.docx
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Table_2_Validating_the_use_of_large_language_models_for_psychological_text_classification_docx/28456745
Description:
Large language models (LLMs) are being used to classify texts into categories informed by psychological theory (“psychological text classification”). However, the use of LLMs in psychological text classification requires validation, and it remains unclear exactly how psychologists should prompt and validate LLMs for this purpose. To address this gap, we examined the potential of using LLMs for psychological text classification, focusing on ways to ensure validity. We employed OpenAI's GPT-4o to classify (1) reported speech in online diaries, (2) other-initiations of conversational repair in Reddit dialogues, and (3) harm reported in healthcare complaints submitted to NHS hospitals and trusts. Employing a two-stage methodology, we developed and tested the validity of the prompts used to instruct GPT-4o using manually labeled data (N = 1,500 for each task). First, we iteratively developed three types of prompts using one-third of each manually coded dataset, examining their semantic validity, exploratory predictive validity, and content validity. Second, we performed a confirmatory predictive validity test on the final prompts using the remaining two-thirds of each dataset. Our findings contribute to the literature by demonstrating that LLMs can serve as valid coders of psychological phenomena in text, on the condition that researchers work with the LLM to secure semantic, predictive, and content validity. They also demonstrate the potential of using LLMs in rapid and cost-effective iterations over big qualitative datasets, enabling psychologists to explore and iteratively refine their concepts and operationalizations during manual coding and classifier development. Accordingly, as a secondary contribution, we demonstrate that LLMs enable an intellectual partnership with the researcher, defined by a synergistic and recursive text classification process where the LLM's generative nature facilitates validity checks. 
We argue that using LLMs for psychological text classification may signify a paradigm shift toward a novel, iterative approach that may improve the validity of psychological concepts and operationalizations.
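The two-stage design described above (one-third of each manually labeled dataset for prompt development, the remaining two-thirds for a confirmatory test) can be illustrated with a minimal sketch. The function names, the random shuffle, and the use of Cohen's kappa as the agreement measure are assumptions made for illustration; the description does not state how the split was drawn or which metric the authors report.

```python
import random
from collections import Counter


def split_dev_confirm(items, seed=0):
    """Shuffle items and split them into a one-third development set
    and a two-thirds held-out confirmatory set (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_dev = len(shuffled) // 3
    return shuffled[:n_dev], shuffled[n_dev:]


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences,
    e.g. manual codes versus model-assigned codes (assumed metric)."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# With N = 1,500 labeled items per task, the split yields 500 and 1,000 items.
dev_ids, confirm_ids = split_dev_confirm(range(1500))
```

Prompts would be revised only against the development portion, and agreement on the confirmatory portion would be computed once, on the final prompts, so that the reported figure is not inflated by iteration.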
Created:
2025-02-21



