Replication Data for: Embedding Regression: Models for Context-Specific Description and Inference
Source: DataONE. Updated: 2023-01-19. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:0778bc1406b47f3397456c6951e0e78db2a79feaef0826ab69f20bdc3cd2fc7b
Description:
Replication Data for: "Embedding Regression: Models for Context-Specific Description and Inference". All necessary data and estimated models are available in the following Dropbox folder: https://www.dropbox.com/sh/7al371qtr9102qq/AADKhjhYgnFCxOOQaugQloTBa?dl=0 Note that the folder is large (12.59 GB).

Paper Abstract: Social scientists commonly seek to make statements about how word use varies over circumstances, including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term “immigration.” Building on the success of pretrained language models, we introduce the à la Carte on Text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used (and thus what words “mean”) in different contexts. We show that it outperforms slower, more complicated alternatives, and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. We demonstrate that it can be used for a broad range of important tasks, including understanding US polarization, historical legislative development, and sentiment detection. We provide open-source software for fitting the model.
Created:
2023-11-08



