Metadata and Modeling Outputs for study "The Quiet Transformations of Literary Studies". Data
Source: DataCite Commons. Updated: 2026-01-07. Indexed: 2024-07-13.
Download link:
https://rucore.libraries.rutgers.edu/rutgers-lib/44747
Resource description:
New analytical approaches such as topic modeling can illuminate subtle transformations, revealing that concepts frequently taken for granted are more variable than scholars have assumed. In this study, the modeled corpus comprised 21,367 JSTOR articles with 13,221 distinct author names, and the result was a 150-topic model. Four files supporting the study are available here:

1) vocab.txt: UTF-8 text, one word per line, giving all 98,835 word types included in the model. The list of stop words excluded from this vocabulary is given at https://www.ideals.illinois.edu/handle/2142/45709.
2) id_map.txt: UTF-8 text, one string per line, giving the JSTOR ID strings of all 21,367 documents included in the model, in the order indexed by the sampling state file.
3) mallet_state.gz (370MB): gzipped UTF-8 text representing the final sampling state output by MALLET. Each token of the input documents is represented by a single line with six whitespace-delimited fields: document index, document label (unused), token index, word type index, word type as a string, and topic index. The word type index is zero-based and corresponds to the order in vocab.txt. The document index is zero-based and corresponds to the order in id_map.txt.
4) metadata.tar.gz (3.9MB): gzipped tar archive of 8 CSV files containing metadata for the documents modeled. Metadata for a document in the model can be located by matching the "id" column to the IDs given in id_map.txt.
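The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the study's own code: the function names are hypothetical, the file names are assumed to sit in the current directory, and skipping lines that begin with "#" is an assumption about header lines in the state file.

```python
import gzip
from collections import Counter, defaultdict


def read_lines(path):
    """Return the lines of a UTF-8 text file (e.g. vocab.txt or id_map.txt)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def parse_state(lines):
    """Yield (doc_index, word_type_index, word, topic) for each token line.

    Each token line has six whitespace-delimited fields: document index,
    document label (unused), token index, word type index, word type, topic.
    """
    for line in lines:
        # Assumption: blank lines and header lines starting with '#' carry
        # no token data, so they are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        doc, _label, _pos, type_index, word, topic = line.split()
        yield int(doc), int(type_index), word, int(topic)


def doc_topic_counts(tokens):
    """Count how many tokens of each document were assigned to each topic."""
    counts = defaultdict(Counter)
    for doc, _type_index, _word, topic in tokens:
        counts[doc][topic] += 1
    return counts


def load_counts(state_path="mallet_state.gz"):
    """Stream the gzipped state file without unpacking it to disk."""
    with gzip.open(state_path, "rt", encoding="utf-8") as f:
        return doc_topic_counts(parse_state(f))
```

With the files downloaded, `counts = load_counts()` and `ids = read_lines("id_map.txt")` would let `ids[d]` give the JSTOR ID for document index `d`, which can then be matched against the "id" column of the metadata CSV files. Streaming the state file line by line avoids holding the 370MB of compressed text in memory at once.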
Provider:
No Publisher Supplied
Created:
2014-09-22



