Data from "Predicting future grant amounts using topic-level features"
Source: Figshare. Updated 2025-12-01. Indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Data_from_Predicting_future_grant_amounts_using_topic-level_features_/30740195/1
Description:
Data from the research paper "Predicting future grant amounts using topic-level features", presented at the 29th STI-ENID conference in Bristol, UK, September 4, 2025.

The file contains 1106 rows (excluding the header) and 21 columns, in a tab-separated format (.tsv). The columns are listed below.

Note:
Each row corresponds to one (level 3) topic.
All observations are aggregated at this level.
The level 1 topics were used as a grouping variable in the analysis.
Lag: this is a lagged panel dataset. The variable total_lagged_eur was calculated for the time-period 2021 to 2023 (inclusive), while the other variables were calculated for the years 2015 to 2020 (inclusive).

Columns:
topic_label (str): a level 3 (ANZSRC FoR) category
for_l1 (str): a level 1 (ANZSRC FoR) category
size (int): number of publications in the level 3 topic
tif (float): topic impact factor
n_policy_doc_citations (int): number of citations from policy documents to papers in the topic
n_clinical_trial_citations (int): number of clinical trials linked to the topic
n_patent_citations (int): number of citations from patents to papers in the topic
total_lagged_eur (float): the fractional, lagged grant amount in euros for the topic
log_total_lagged_eur (float): natural logarithm of total_lagged_eur
log_size (float): natural logarithm of size
log_policy (float): natural logarithm of n_policy_doc_citations
log_trials (float): natural logarithm of n_clinical_trial_citations
log_patents (float): natural logarithm of n_patent_citations
n_l3_siblings (int): the number of other level 3 topics that belong to the same level 2 parent topic
total_authors (int): total author contributions, calculated by summing the total number of authors per paper in the topic
un_authors (int): the total number of unique authors publishing in the topic
author_ttr (float): un_authors divided by total_authors, in analogy with the type-token ratio in linguistics
preds (float): predicted log grant amounts, using a linear mixed-effects regression model
pred_obs_diff (float): difference between observed grant sum and the predicted sum
sd2diff (bool): boolean indicating whether pred_obs_diff is greater (true) than 2 standard deviations from the mean or not (false)
ou_funded (str): a string indicating if the topic is over-funded (labelled "Over") or under-funded (labelled "Under") when comparing the predicted funding (preds) with the actual funding amount. The majority of topics have predicted values very close to the observed funding values (labelled "Expected")

For details about the topic data, see: https://www.researchsquare.com/article/rs-6529718/v1. The other datapoints are calculated based on the Dimensions database.
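As a rough illustration of how the derived columns described above relate to the raw ones, the following Python sketch reads tab-separated rows and recomputes log_size, author_ttr, and an sd2diff-style flag. Several details are assumptions rather than facts from the dataset description: the helper names (read_topics, safe_log, author_ttr, flag_sd2) are hypothetical, the handling of zero counts before taking the logarithm is not documented (None is returned here), and the flag is assumed to use the absolute deviation from the mean with the sample standard deviation. The sample rows are made up and do not come from the dataset.

```python
import csv
import io
import math
import statistics


def read_topics(fileobj):
    """Read tab-separated rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj, delimiter="\t"))


def safe_log(value):
    """Natural log; returns None for non-positive input (an assumption)."""
    return math.log(value) if value > 0 else None


def author_ttr(un_authors, total_authors):
    """Unique authors divided by total author contributions."""
    return un_authors / total_authors


def flag_sd2(diffs):
    """True where a value lies more than 2 sample SDs from the mean (assumed rule)."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return [abs(d - mean) > 2 * sd for d in diffs]


# Made-up rows for illustration only; these are not values from the dataset.
SAMPLE = (
    "topic_label\tsize\ttotal_authors\tun_authors\n"
    "Topic A\t100\t400\t200\n"
    "Topic B\t50\t120\t90\n"
)

rows = read_topics(io.StringIO(SAMPLE))
for row in rows:
    # csv yields strings, so convert before computing derived values.
    row["log_size"] = safe_log(int(row["size"]))
    row["author_ttr"] = author_ttr(int(row["un_authors"]), int(row["total_authors"]))
```

To apply the same steps to the downloaded file, the in-memory sample could be replaced with a file handle opened with newline="" and UTF-8 encoding; the local file name depends on how the download is saved.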
Provided by:
Jenset, Gard
Created:
2025-12-01



