Replication data for: A Method of Automated Nonparametric Content Analysis for Social Science

DataONE | Updated 2016-11-17 | Indexed 2024-06-26

Download link:

https://search.dataone.org/view/sha256:5024557f612459319ab28cf66e78b1e523a4b8e818460e716194bc2f72d56815

Description:

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis. This article led to the formation of Crimson Hexagon. See also: Software for Automated Content Analysis
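
The claim that a classifier with high individual accuracy can still give badly biased category proportions can be illustrated with a small two-category calculation. The sketch below is not the method from the article or its software; it is a simplified illustration that assumes a hypothetical classifier whose error rates (`hit_rate_a`, `false_alarm_rate`) are known exactly, and all numbers and function names are invented for the example.

```python
# Illustrative sketch only: hypothetical numbers, not the article's method or data.

def predicted_share(true_share, hit_rate_a, false_alarm_rate):
    """Expected share of documents the classifier labels A ("classify and count")."""
    return hit_rate_a * true_share + false_alarm_rate * (1.0 - true_share)


def corrected_share(observed_share, hit_rate_a, false_alarm_rate):
    """Invert the relation above to recover the true share of category A."""
    return (observed_share - false_alarm_rate) / (hit_rate_a - false_alarm_rate)


true_share = 0.10        # assumed true proportion of category A in the population
hit_rate_a = 0.90        # assumed P(labelled A | truly A)
false_alarm_rate = 0.20  # assumed P(labelled A | truly B)

# Share of individual documents classified correctly.
accuracy = hit_rate_a * true_share + (1.0 - false_alarm_rate) * (1.0 - true_share)

# Proportion estimate obtained by counting the classifier's labels.
naive = predicted_share(true_share, hit_rate_a, false_alarm_rate)

# Proportion estimate after correcting for the known error rates.
fixed = corrected_share(naive, hit_rate_a, false_alarm_rate)

print(f"accuracy={accuracy:.2f} naive={naive:.2f} corrected={fixed:.2f}")
```

Under these assumed rates the classifier labels 81% of documents correctly, yet counting its labels puts category A at 27% rather than the true 10%. Estimating the proportion directly from the error structure recovers 10%, which is the kind of gap between classification accuracy and proportion estimation that the description refers to.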

Created:

2023-11-21
