Replication Data for: Embedding Regression: Models for Context-Specific Description and Inference

Source: DataONE. Updated 2023-01-19; indexed 2024-06-08.

Download link:

https://search.dataone.org/view/sha256:0778bc1406b47f3397456c6951e0e78db2a79feaef0826ab69f20bdc3cd2fc7b

Resource description:

Replication Data for: "Embedding Regression: Models for Context-Specific Description and Inference". All necessary data and estimated models are available in the following Dropbox folder: https://www.dropbox.com/sh/7al371qtr9102qq/AADKhjhYgnFCxOOQaugQloTBa?dl=0 Keep in mind that the folder is quite large, at 12.59 GB.

Paper Abstract: Social scientists commonly seek to make statements about how word use varies over circumstances, including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term “immigration.” Building on the success of pretrained language models, we introduce the à la Carte on Text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used (and thus what words “mean”) in different contexts. We show that it outperforms slower, more complicated alternatives, and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. We demonstrate that it can be used for a broad range of important tasks, including understanding US polarization, historical legislative development, and sentiment detection. We provide open-source software for fitting the model.

Created:

2023-11-08
