New data on the publishing productivity of American sociologists
Source: Mendeley Data | Updated 2024-03-27 | Indexed 2024-06-28
Download link:
https://zenodo.org/record/3892309
Description:
OVERVIEW

This data file, compiled from multiple online sources, presents 2013–2017 publication counts for 2,132 full and associate professors in 426 U.S. departments of sociology. The counts cover articles, articles in high-impact journals, books, and books from high-impact publishers. The file also includes information on institutional characteristics (e.g., institution type, highest sociology degree offered, department size) and individual characteristics (e.g., academic rank, gender, PhD year, PhD institution). The data may be useful for investigations of scholarly productivity, the correlates of scholarly productivity, and the contributions of particular individuals and institutions.

Complete population data are presented for four groups: the top 26 doctoral programs, doctoral institutions other than R1 universities, the top liberal arts colleges, and other bachelor's institutions. Sample data are presented for Carnegie R1 universities (other than the top 26) and master's institutions.

USER NOTES

Please see our paper in Scholarly Assessment Reports, freely available at https://doi.org/10.29024/sar.36 , for full information about the data set and the methods used in its compilation. The section numbers used here refer to the Appendix of that paper. See the References, below, for other papers that have made use of these data.

The data file is a single Excel file with five worksheets: Sampling, Articles, Books, Individuals, and Departments. Each worksheet has a simple rectangular format. The cells contain only text and values, with no formulas or links.

A few general notes apply to all five worksheets:

• The yellow column headings represent institutional (departmental) data. The blue column headings represent data for individual faculty.
• iType is institution type, as described in section A.2: TopR (top research universities), R1 (other R1 universities), OD (other doctoral universities), M (master's institutions), TopLA (top liberal arts colleges), or B (other bachelor's institutions). nType provides the same information as a single-digit code that is more useful for sorting the rows: 1=TopR, 2=R1, 3=OD, 4=M, 5=TopLA, and 6=B.
• Inst is a four-digit institution code. The first digit corresponds to nType, and the last three digits allow for alphabetical sorting by institution name. Indiv is a one- or two-digit code that can be used to sort the individuals by name within each department. The Inst, nType, and Indiv codes are consistent across the five worksheets.
• For binary variables such as Full professor and Female, 1 indicates yes (full professor or female) and 0 indicates no (associate professor or male).

The five worksheets represent five distinct stages in the data compilation process.

First, the Sampling worksheet lists the 1,530 base-population institutions (see section A.3) and presents the characteristics of the faculty included in the data file. Each row with an entry in the Individual column represents a faculty member at one of the 426 institutions included in the data set. Each row without an entry in the Individual column represents an institution that either (a) did not meet the criteria for inclusion (section A.1) or (b) was not needed to attain the desired sample size for the R1 or M groups (section A.3).

The Articles worksheet includes the data compiled from SocINDEX, as described in section A.6. Each row with an entry in the Journal column represents an article written by one of the 2,132 faculty included in the data. Each row without an entry in the Journal column represents a faculty member without any article listings in SocINDEX for the 2013–2017 period.

SocINDEX items other than peer-reviewed articles (editorials, letters, etc.) may also be listed in the Journal column. Such items are assigned a value of 1 in the Excluded column and a value of 0 in the Article credit and HI article credit columns. We assigned no credit for items such as editorials and letters, but other researchers may wish to include them.

The N and i columns give, for each article, the number of authors (N) and the faculty member's place in the byline (i), as described in section A.8. The CiteScore and Highest percentile columns were used to identify high-impact journals, as indicated in the HI journal column. The Article credit and HI article credit columns are article counts, adjusted for co-authorship.

The Books worksheet includes data compiled from Amazon and other sources, as described in section A.7. Each row with an entry in the Book column represents a book written by one of the 2,132 faculty. Each row without an entry in the Book column represents a faculty member without any book listings on Amazon for the 2013–2017 period. The publication counts in the Books worksheet (Book credit and HI book credit) follow the same format as those in the Articles worksheet.

The Individuals worksheet consolidates information from the Articles and Books worksheets so that each of the 2,132 individuals is represented by a single row. The worksheet also includes several categorical variables calculated or otherwise derived from the raw data, such as Years since PhD and its three corresponding binary variables. We suspect that many data users will be most interested in the Individuals worksheet.

The Departments worksheet collapses the individual data so that each of the 426 institutions (departments) is represented by a single row. Individual characteristics such as Female and Years since PhD are presented as percentages or averages, for instance % Female and Avg years since PhD. Each of the four productivity measures is represented by four statistics:

• a departmental total;
• an average (the total divided by the number of full and associate professors);
• a departmental standard deviation;
• a departmental median.
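The Articles, Individuals, and Departments worksheets are related by successive roll-ups. The sketch below shows one way to reproduce that chain for Article credit. It first sums credit per person, giving a row for everyone, including faculty with no Journal entry. It then collapses people into per-department totals, averages, standard deviations, and medians.

This is a minimal sketch, not the authors' compilation code, and it rests on several assumptions:

• Rows are plain Python dicts keyed by the column names given in this description (Inst, Indiv, Journal, Excluded, Article credit), obtained by some separate export of the Excel file.
• The function names and the toy rows are hypothetical.
• The population standard deviation is used. The description does not state which variant the Departments worksheet uses.

```python
from collections import defaultdict
from statistics import median, pstdev


def individual_article_credit(article_rows):
    """Sum 'Article credit' per (Inst, Indiv) person.

    Every person appears in the result, even if their only row has an
    empty Journal column (no SocINDEX articles in 2013-2017).
    """
    totals = {}
    for row in article_rows:
        key = (row["Inst"], row["Indiv"])
        totals.setdefault(key, 0.0)  # register the person with zero credit
        if not row.get("Journal"):
            continue  # placeholder row: faculty member with no articles
        if row.get("Excluded") == 1:
            continue  # editorials, letters, etc. receive no credit
        totals[key] += float(row.get("Article credit") or 0)
    return totals


def department_summary(person_credit):
    """Collapse per-person credit into per-department summary statistics."""
    by_inst = defaultdict(list)
    for (inst, _indiv), credit in person_credit.items():
        by_inst[inst].append(credit)
    summary = {}
    for inst, values in by_inst.items():
        total = sum(values)
        summary[inst] = {
            "total": total,
            # Average = total / number of full and associate professors.
            "average": total / len(values),
            # ASSUMPTION: population SD; the worksheet's variant is unstated.
            "sd": pstdev(values),
            "median": median(values),
        }
    return summary


# Hypothetical toy rows: one department (Inst 1001), three professors.
rows = [
    {"Inst": 1001, "Indiv": 1, "Journal": "Journal A", "Excluded": 0, "Article credit": 1.0},
    {"Inst": 1001, "Indiv": 1, "Journal": "Journal A", "Excluded": 1, "Article credit": 0.0},
    {"Inst": 1001, "Indiv": 2, "Journal": "Journal B", "Excluded": 0, "Article credit": 0.5},
    {"Inst": 1001, "Indiv": 3, "Journal": None},
]
person = individual_article_credit(rows)
dept = department_summary(person)
print(person)
print(dept)
```

The same two functions apply unchanged to HI article credit, Book credit, or HI book credit if the column name is swapped.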
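The general notes describe Inst as a four-digit code whose first digit is the nType value, with the remaining three digits ordering institutions alphabetically. The sketch below decodes such a code and sorts rows by (Inst, Indiv), which groups rows by institution type, then institution name, then individual name.

It is a minimal sketch under stated assumptions:

• Inst and Indiv values arrive as integers or digit strings.
• The function names are hypothetical.
• The example codes (4017, 1001, 1012, 2005) are made up for illustration.

```python
# nType -> iType labels, as listed in the general notes.
NTYPE_TO_ITYPE = {1: "TopR", 2: "R1", 3: "OD", 4: "M", 5: "TopLA", 6: "B"}


def decode_inst(inst):
    """Split a four-digit Inst code into (nType, iType, alphabetical index)."""
    code = int(inst)
    if not 1000 <= code <= 9999:
        raise ValueError(f"expected a four-digit Inst code, got {inst!r}")
    ntype, alpha_index = divmod(code, 1000)  # first digit, last three digits
    if ntype not in NTYPE_TO_ITYPE:
        raise ValueError(f"unknown nType digit {ntype} in Inst {inst!r}")
    return ntype, NTYPE_TO_ITYPE[ntype], alpha_index


def sort_key(row):
    """Order rows by type, then institution name, then individual name."""
    return int(row["Inst"]), int(row["Indiv"])


# Hypothetical example rows.
rows = [
    {"Inst": 2005, "Indiv": 3},
    {"Inst": "1012", "Indiv": 2},
    {"Inst": 1012, "Indiv": 1},
]
ordered = sorted(rows, key=sort_key)
print(decode_inst(4017))
print([(int(r["Inst"]), int(r["Indiv"])) for r in ordered])
```

Because the first digit of Inst already encodes nType, a plain numeric sort on Inst reproduces the type ordering without a separate nType column.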
Created:
2023-06-28



