comparison_not_humanities Dataset
NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5068698
Description:
The comparison_not_humanities dataset contains word-frequency and other non-consumptive-use data about 1,380,456 unique English-language news documents (no duplicate or close-variant documents) that do not contain the word "humanities." The documents came from mainstream U.S. news sources published during 2000-2019.
WE1S gathered this data through keyword searches on three of the most common words in the English language (based on a well-known analysis of the Oxford English Corpus) that LexisNexis indexes and thus makes available for search: "person," "say," and "good." We took data from the 15 highest-circulation newspapers in the U.S. for 2000-2019, randomly selecting one month per year for each keyword to limit results to a more manageable number (each year searched therefore includes data from three months of that year). We also took data from every other LexisNexis source from which we had gathered data for our humanities_keywords dataset. (We were not able to fully replicate the previous searches, however, so some sources do not have comparison data.) For these other sources, we focused on the years 2013-2019 and again randomly selected one month per year for each keyword to limit results. To exclude articles containing the word "humanities," we searched within each selected source for articles matching "person AND NOT humanities," "say AND NOT humanities," and "good AND NOT humanities." These searches also matched the plural forms of each word, so documents in this dataset may contain the words "persons," "people," "says," and "goods."
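The sampling scheme described above can be illustrated with a minimal sketch. It is not WE1S's actual collection code: the function name `build_search_plan`, the seeds, and the assumption that each keyword's month is drawn independently (so two keywords may land on the same month in a given year) are hypothetical choices made for illustration. Only the keywords, the year ranges, and the "AND NOT humanities" query form come from the description.

```python
import random

# Keywords named in the dataset description.
KEYWORDS = ["person", "say", "good"]


def build_search_plan(first_year, last_year, seed=0):
    """Return one (year, month, query) entry per keyword per year.

    Hypothetical helper: the month is drawn independently for each
    keyword, which is an assumption about how the sampling was done.
    """
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    plan = []
    for year in range(first_year, last_year + 1):
        for keyword in KEYWORDS:
            plan.append({
                "year": year,
                "month": rng.randint(1, 12),  # 1 = January ... 12 = December
                "query": f"{keyword} AND NOT humanities",
            })
    return plan


# Top-circulation newspapers: 2000-2019; other sources: 2013-2019.
top15_plan = build_search_plan(2000, 2019, seed=0)
other_plan = build_search_plan(2013, 2019, seed=1)
```

Under these assumptions, the first plan holds 60 searches per source (20 years times 3 keywords) and the second holds 21 (7 years times 3 keywords).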
(See WE1S Research Materials Overview for the relation between the project's "datasets" and "collections.")
Created:
2021-07-05



