
Data for "The myth of the ‘unreadable’ humanities and social sciences"

Source: Figshare. Updated: 2026-03-26. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Data_for_The_myth_of_the_unreadable_humanities_and_social_sciences_/31860601
Description:
Data supporting "The myth of the ‘unreadable’ humanities and social sciences" (https://communities.springernature.com/posts/the-myth-of-the-unreadable-humanities-and-social-sciences).

The CSV file has a header row, 485 data rows, and 9 columns. It contains aggregated counts or scores of high-level linguistic complexity indicators, calculated from Dimensions abstracts published between 1800 and 2024 and grouped by decade and by 22 high-level topics (see https://www.researchsquare.com/article/rs-6529718/latest for details about the topic classification). The scores for individual abstracts have been aggregated into time-topic segments by taking the arithmetic mean of the individual abstract scores in each segment. The attached file "Readability by year and FoR L1 topic.sql" shows the code used for processing the data.

The columns are:
decade: a numeric value for the decade the document was published.
for_l1: a string value indicating one of 22 high-level fields or topics.
n_documents: a numeric value for the number of documents falling into the time-topic segment.
mean_ari: a numeric value for the mean Automated Readability Index for the segment (https://en.wikipedia.org/wiki/Automated_readability_index).
mean_char: a numeric value for the mean number of characters for the segment.
mean_words: a numeric value for the mean number of words for the segment, operationalised as white-space delimited tokens.
mean_sentences: a numeric value for the mean number of sentences for the segment, operationalised as the number of sentence-final punctuation marks (.?!).
mean_ttr: a numeric value for the mean type-token ratio for the segment (https://www.sketchengine.eu/glossary/type-token-ratio-ttr/).
mean_ld: a numeric value for the mean lexical density for the segment, calculated by treating non-stopwords as lexical items, based on the NLTK English stopword list.

Competing interests
The data set was created, and shared, as part of regular work duties at Springer Nature.
Springer Nature is part-owned by the Holtzbrinck Group which also owns Digital Science, the provider of the Dimensions database used for this research and an owner of Figshare. This historical affiliation should not be taken to suggest any present or continuing connection with Springer Nature, or the Holtzbrinck Group, on the part of the author(s) beyond any conclusion of their employment.
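
Illustrative sketch of the indicators described above. This is not the attached SQL, which is not reproduced here; it is a minimal Python approximation of how the per-abstract scores and the segment means could be computed from the column definitions. Several details are assumptions: the names abstract_scores and segment_mean are hypothetical, characters are counted as non-whitespace characters, tokens are lowercased and stripped of surrounding punctuation before computing the type-token ratio and lexical density, and a tiny stand-in stopword set replaces the NLTK English list so that the block runs without extra dependencies.

```python
import re
from statistics import fmean

# Stand-in stopword set for illustration only (assumption); the dataset
# description states that the NLTK English stopword list was used.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}
PUNCT = ".,;:?!\"'()[]"


def abstract_scores(text):
    """Compute the per-abstract indicators for one abstract (hypothetical helper)."""
    words = text.split()                           # white-space delimited tokens
    n_words = len(words)
    n_sentences = len(re.findall(r"[.?!]", text))  # sentence-final marks
    n_chars = sum(len(w) for w in words)           # assumption: non-whitespace characters

    # Assumption: normalise tokens before counting types and stopwords.
    tokens = [t for t in (w.lower().strip(PUNCT) for w in words) if t]
    if n_sentences == 0 or not tokens:
        raise ValueError("need at least one word and one sentence-final mark")

    # Automated Readability Index (standard formula).
    ari = 4.71 * (n_chars / n_words) + 0.5 * (n_words / n_sentences) - 21.43
    ttr = len(set(tokens)) / len(tokens)           # type-token ratio
    ld = sum(t not in STOPWORDS for t in tokens) / len(tokens)  # lexical density

    return {"chars": n_chars, "words": n_words, "sentences": n_sentences,
            "ari": ari, "ttr": ttr, "ld": ld}


def segment_mean(scores, key):
    """Arithmetic mean of one indicator over all abstracts in a segment."""
    return fmean([s[key] for s in scores])


if __name__ == "__main__":
    segment = [abstract_scores("The cat sat. The dog ran!"),
               abstract_scores("Readability is measured in many ways.")]
    for key in ("ari", "chars", "words", "sentences", "ttr", "ld"):
        print(key, round(segment_mean(segment, key), 3))
```

Because sentences are counted as sentence-final punctuation marks, abbreviations such as "e.g." or decimal numbers inflate the sentence count and lower the ARI; this follows from the operationalisation given in the column descriptions and is not a correction to it.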
Created:
2026-03-26