Replication Data for: Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Content

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://doi.org/10.7910/DVN/YITNSV


Description:

While researchers often study message features like moral content in text, such as party manifestos and social media, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies developed for the original applications. In this paper, we present an approach to construct vec-tionary measurement tools that boost validated dictionaries with word embedding through nonlinear optimization. By harnessing semantic relationships encoded by embeddings, vec-tionaries improve the measurement of message features from text, especially those in short format, by expanding the applicability of original vocabularies to other contexts. Importantly, a vec-tionary can produce additional metrics to capture the valence and ambivalence of a message feature beyond its strength in texts. Using moral content in tweets as a case study, we illustrate the steps to construct the moral foundations vec-tionary, showcasing its ability to process texts missed by conventional dictionaries and word embedding methods and to produce measurements better aligned with crowdsourced human assessments. Furthermore, additional metrics from the vec-tionary unveiled unique insights that facilitated predicting outcomes such as message retransmission.
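
To make the idea of scoring text against an embedding-based axis more concrete, here is a minimal sketch. It is not the procedure from the paper or the replication files. The axis below is a simple polarity-weighted mean of seed-word vectors, swapped in for the nonlinear optimization the description mentions, and the three metric formulas are illustrative assumptions. The toy embeddings, the seed dictionary, and the names `build_axis` and `score_text` are all hypothetical.

```python
import math

# Toy 3-dimensional "embeddings", made up for illustration only.
# Real word embeddings typically have hundreds of dimensions.
EMBEDDINGS = {
    "care":    [0.9, 0.1, 0.0],
    "protect": [0.8, 0.2, 0.1],
    "harm":    [-0.9, 0.1, 0.0],
    "hurt":    [-0.8, 0.0, 0.2],
    "kids":    [0.3, 0.7, 0.1],
    "table":   [0.0, 0.1, 0.9],
}

# Hypothetical seed dictionary: word -> polarity (+1 virtue, -1 vice).
SEED_DICTIONARY = {"care": 1, "protect": 1, "harm": -1}


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def build_axis(seed, embeddings):
    """Assumption: the axis is the polarity-weighted mean of seed vectors.

    This stands in for the nonlinear optimization named in the description.
    """
    dim = len(next(iter(embeddings.values())))
    axis = [0.0] * dim
    for word, polarity in seed.items():
        for i, x in enumerate(normalize(embeddings[word])):
            axis[i] += polarity * x
    return normalize(axis)


def score_text(tokens, axis, embeddings):
    """Project each known token onto the axis and summarize the projections.

    Illustrative definitions (assumptions, not the paper's formulas):
      strength    = mean absolute projection
      valence     = mean signed projection
      ambivalence = strength - |valence| (zero when all signs agree)
    Returns None when no token has an embedding.
    """
    projections = [
        sum(a * b for a, b in zip(normalize(embeddings[t]), axis))
        for t in tokens
        if t in embeddings
    ]
    if not projections:
        return None
    n = len(projections)
    strength = sum(abs(p) for p in projections) / n
    valence = sum(projections) / n
    return {
        "strength": strength,
        "valence": valence,
        "ambivalence": strength - abs(valence),
    }


AXIS = build_axis(SEED_DICTIONARY, EMBEDDINGS)

# "hurt" and "kids" are not in the seed dictionary, yet both receive scores
# because they have embeddings that can be projected onto the axis.
print(score_text(["hurt", "kids"], AXIS, EMBEDDINGS))
```

In this sketch a text made only of out-of-dictionary words can still be scored, which is the property the description attributes to vec-tionaries. A plain dictionary count would return nothing for the same input.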


Created:

2025-03-11
