Replication Data for: Using Cross-Encoders to Measure the Similarity of Short Texts in Political Science
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://doi.org/10.7910/DVN/1GDQYY
下载链接
链接失效反馈官方服务:
资源简介:
In many settings, scholars wish to estimate the similarity of political texts. However, the most commonly used methods in political science struggle to identify when two texts convey the same meaning as they rely too heavily on identifying words that appear in both documents. This limitation is especially salient when the underlying documents are short, an increasingly prevalent form of textual data in modern political research. Building on recent advances in computer science, I introduce to political science cross-encoders for precise estimates of semantic similarity in short texts. Scholars can use either off-the-shelf versions or build a customized model. I illustrate this approach in three examples applied to social messages generated in a telephone game, news headlines about US Supreme Court decisions, and Facebook posts from members of Congress. I show that cross-encoders, which utilize pair-level embeddings, offer superior performance across tasks relative to word-based and sentence-level embedding approaches.
创建时间:
2025-02-11



