Frequency of positive words in grant applications
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/6676561
Resource description:
The data was gathered to reproduce the methodology and findings of Lerchenmueller et al. (2019) using proposal texts submitted to different funding schemes offered by the Swiss National Science Foundation.
Data description
The data file (.xlsx) contains three sheets:
Career funding schemes (excluding fellowships): 1802 proposals included
Spark funding scheme: 612 proposals included
Project funding scheme (Projects): 5736 proposals included
Each sheet includes a data matrix in which each row is a specific grant proposal. The unit of analysis is the grant proposal, and the texts used are the titles and abstracts. The data is used in a project available on GitHub (https://github.com/snsf-data/positive_language). [Paper soon to be submitted]
The first 25 columns give the counts of the 25 positive words in the analysed text of each grant proposal. The positive words, which also serve as the column names, are the following: amazing, assuring, astonishing, bright, creative, encouraging, enormous, excellent, favorable/favourable, groundbreaking, hopeful, innovative, inspiring, inventive, novel, phenomenal, prominent, promising, reassuring, remarkable, robust, spectacular, supportive, unique, and unprecedented. These words were first proposed by Vinkers et al. (2015).
Additionally, the following columns are present in the data:
sum_pos: The sum of the number of positive words in the texts.
text_length and text_length100: the text length (count of words), and text length divided by 100.
ResponsibleApplicantGender: the gender of the corresponding applicant (m or f)
ResponsibleApplicantAge: the age of the corresponding applicant at submission (continuous)
NationalityIsoCode: the nationality of the corresponding applicant (CH or not CH)
IsApproved and IsFundable: binary variables indicating funding success, and whether the project would have been fundable, given its grade, under unlimited funding.
Decision and CallYear: the year the funding decision was taken, and the year of the call deadline.
ResearchInstitutionType: the type of institution the corresponding applicant is affiliated with (Cantonal University, ETH Domain, Other)
which_lang: the language the proposal was written in (all English)
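As a minimal sketch of how the derived columns relate to the raw ones, the Python snippet below recomputes sum_pos and text_length100 for a single row. The function name and the dictionary-based row layout are illustrative assumptions; only the column names sum_pos, text_length and text_length100 come from the description above.

```python
def derive_summary(word_counts, text_length):
    """Recompute the derived columns for one proposal.

    word_counts: hypothetical mapping from positive word to its count
                 (in the data these are the first 25 columns).
    text_length: number of words in the analysed text.
    """
    return {
        # sum_pos: total number of positive-word occurrences in the text
        "sum_pos": sum(word_counts.values()),
        "text_length": text_length,
        # text_length100: text length divided by 100
        "text_length100": text_length / 100,
    }


row = derive_summary({"novel": 2, "unique": 1, "robust": 0}, text_length=250)
print(row)
```

Such a check can be used to confirm that the stored sum_pos and text_length100 values are consistent with the word-count columns and text_length.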
Text processing
The text corpus used in this analysis was also used as the basis for additional analyses. It therefore underwent a thorough cleaning with the help of the R packages {tm} and {stringr}. After the pre-processing steps, the number of times each of the positive words (see above) occurred in the title and abstract of the respective proposal was computed using a simple keyword search.
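The keyword search can be illustrated with a short Python sketch. This is not the original R code: the function name is hypothetical, matching on whole whitespace-separated tokens is an assumption (a substring search would also count "novelty" under "novel"), and folding "favourable" into a single "favorable" count is an assumed reading of the combined favorable/favourable column.

```python
from collections import Counter

# The 25 positive words listed above (Vinkers et al., 2015).
POSITIVE_WORDS = [
    "amazing", "assuring", "astonishing", "bright", "creative",
    "encouraging", "enormous", "excellent", "favorable", "groundbreaking",
    "hopeful", "innovative", "inspiring", "inventive", "novel",
    "phenomenal", "prominent", "promising", "reassuring", "remarkable",
    "robust", "spectacular", "supportive", "unique", "unprecedented",
]

# Assumption: British and American spellings share one count.
SPELLING_VARIANTS = {"favourable": "favorable"}


def count_positive_words(clean_text):
    """Count each positive word in an already pre-processed text."""
    tokens = Counter(
        SPELLING_VARIANTS.get(token, token) for token in clean_text.split()
    )
    # Counter returns 0 for words that never occur.
    return {word: tokens[word] for word in POSITIVE_WORDS}


counts = count_positive_words("novel robust approach novel favourable outcome")
print({word: n for word, n in counts.items() if n > 0})
```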
Pre-processing steps:
Punctuation was removed.
All non-standard alphanumeric characters were removed.
All characters were converted to lowercase.
Extra white spaces were removed.
Internet formatting was removed: URLs, email addresses, twitter formatting (words starting with # and @).
Common English contractions were converted to their non-contracted form ("it's" --> "it is").
English language stopwords were removed.
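The pre-processing steps listed above can be sketched in Python as follows. This is an approximation under stated assumptions, not the original {tm}/{stringr} pipeline: the order of the steps, the small contraction and stopword lists, and the choice to replace removed characters with a space are all illustrative.

```python
import re

# Assumption: tiny illustrative lists; the original used the R packages'
# own contraction handling and English stopword list.
CONTRACTIONS = {"it's": "it is", "don't": "do not", "can't": "cannot"}
STOPWORDS = {"a", "an", "and", "are", "do", "for", "in", "is", "it",
             "not", "of", "or", "the", "this", "to", "we"}


def preprocess(text):
    """Clean one proposal text (title plus abstract)."""
    text = text.lower()                                  # lowercase
    text = text.replace("\u2019", "'")                   # curly to straight apostrophe
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"\S+@\S+", " ", text)                 # email addresses
    text = re.sub(r"[#@]\w+", " ", text)                 # hashtags and mentions
    for short, full in CONTRACTIONS.items():             # expand contractions
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # punctuation, other symbols
    tokens = [t for t in text.split() if t not in STOPWORDS]  # stopwords
    return " ".join(tokens)                              # collapses extra whitespace


print(preprocess("It's a NOVEL approach! Visit https://example.org #grant @snsf"))
```

In this sketch, internet formatting and contractions are handled before punctuation is stripped, because the patterns for URLs, email addresses and contractions rely on characters such as ".", "@" and "'".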
Created:
2024-02-26



