Replication Data for: Towards a Framework for Creating Trustworthy Measures with Supervised Machine Learning for Text
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/AFBW80
Description:
Supervised learning is increasingly used in social science research to quantify abstract concepts in textual data. However, a review of recent studies reveals inconsistencies in reporting practices and validation standards. To address this issue, we propose a framework that systematically outlines the process of transforming text into a quantitative measure, emphasizing the key reporting decisions at each stage. Clear and comprehensive validation is crucial because it enables readers to critically evaluate both the methodology and the resulting measure. To illustrate our framework, we develop and validate a measure of the tone of questions posed to nominees during U.S. Senate confirmation hearings. This study contributes to the growing literature advocating transparency and rigor in applying machine learning methods within the computational social sciences.
Created: 2025-07-22



