Data from "Predicting future grant amounts using topic-level features"
Source: Figshare. Updated: 2025-12-01. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Data_from_Predicting_future_grant_amounts_using_topic-level_features_/30740195
Description:
Data from the research paper "Predicting future grant amounts using topic-level features", presented at the 29th STI-ENID conference in Bristol, UK, September 4, 2025.

The file contains 1106 rows (excluding the header) and 21 columns, in a tab-separated format (.tsv). The columns are listed below.

Note:
Each row corresponds to one (level 3) topic.
All observations are aggregated at this level.
The level 1 topics were used as a grouping variable in the analysis.
Lag: this is a lagged panel dataset. The variable total_lagged_eur was calculated for the time-period 2021 to 2023 (inclusive), while the other variables were calculated for the years 2015 to 2020 (inclusive).

topic_label (str): a level 3 (ANZSRC FoR) category
for_l1 (str): a level 1 (ANZSRC FoR) category
size (int): number of publications in the level 3 topic
tif (float): topic impact factor
n_policy_doc_citations (int): number of citations from policy documents to papers in the topic
n_clinical_trial_citations (int): number of clinical trials linked to the topic
n_patent_citations (int): number of citations from patents to papers in the topic
total_lagged_eur (float): the fractional, lagged grant amount in euros for the topic
log_total_lagged_eur (float): natural logarithm of total_lagged_eur
log_size (float): natural logarithm of size
log_policy (float): natural logarithm of n_policy_doc_citations
log_trials (float): natural logarithm of n_clinical_trial_citations
log_patents (float): natural logarithm of n_patent_citations
n_l3_siblings (int): the number of other level 3 topics that belong to the same level 2 parent topic
total_authors (int): total author contributions, calculated by summing the total number of authors per paper in the topic
un_authors (int): the total number of unique authors publishing in the topic
author_ttr (float): un_authors divided by total_authors, in analogy with the type-token ratio in linguistics
preds (float): predicted log grant amounts, using a linear mixed-effects regression model
pred_obs_diff (float): difference between observed grant sum and the predicted sum
sd2diff (bool): boolean indicating whether pred_obs_diff is greater (true) than 2 standard deviations from the mean or not (false)
ou_funded (str): a string indicating if the topic is over-funded (labelled "Over") or under-funded (labelled "Under") when comparing the predicted funding (preds) with the actual funding amount. The majority of topics have predicted values very close to the observed funding values (labelled "Expected")

For details about the topic data, see: https://www.researchsquare.com/article/rs-6529718/v1. The other datapoints are calculated based on the Dimensions database.
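
Several of the columns described above are defined in terms of others (log_size from size, author_ttr from un_authors and total_authors, log_total_lagged_eur from total_lagged_eur). The following is a minimal sketch of how a reader might parse the tab-separated file and sanity-check those definitions with the Python standard library. The helper names (read_tsv, check_row), the tolerance, and the two toy rows are illustrative assumptions, not part of the dataset; the toy values are invented purely to exercise the checks. How zero counts are handled in the log columns is not stated in the description, so the sketch skips the log check for non-positive values rather than guessing.

```python
import csv
import io
import math


def read_tsv(handle):
    """Parse a tab-separated file object into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def check_row(row, tol=1e-4):
    """Return the derived columns in `row` that disagree with their definitions.

    `tol` is an assumed absolute tolerance to absorb rounding in the file.
    """
    problems = []

    # log_size and log_total_lagged_eur are described as natural logarithms.
    for source, derived in (("size", "log_size"),
                            ("total_lagged_eur", "log_total_lagged_eur")):
        value = float(row[source])
        if value > 0:  # zero handling is unspecified, so skip those rows
            if not math.isclose(float(row[derived]), math.log(value), abs_tol=tol):
                problems.append(derived)

    # author_ttr is described as un_authors divided by total_authors.
    total = float(row["total_authors"])
    if total > 0:
        ratio = float(row["un_authors"]) / total
        if not math.isclose(float(row["author_ttr"]), ratio, abs_tol=tol):
            problems.append("author_ttr")

    return problems


# Toy rows with invented values (NOT taken from the dataset); only the columns
# needed by check_row are included. The second row has a deliberately wrong
# author_ttr so the check has something to report.
TOY_TSV = (
    "topic_label\tsize\tlog_size\ttotal_lagged_eur\tlog_total_lagged_eur\t"
    "total_authors\tun_authors\tauthor_ttr\n"
    "toy topic A\t100\t4.605170\t1000.0\t6.907755\t120\t30\t0.25\n"
    "toy topic B\t1\t0.0\t1.0\t0.0\t100\t50\t0.6\n"
)

rows = read_tsv(io.StringIO(TOY_TSV))
for row in rows:
    print(row["topic_label"], check_row(row))

# For the real file, pass an open handle instead, e.g.
#   with open(path_to_tsv, newline="", encoding="utf-8") as fh:
#       rows = read_tsv(fh)
# where path_to_tsv is wherever the download was saved (hypothetical name).
```

Per the description, parsing the real file this way should yield 1106 rows with 21 keys each; a different count would suggest a delimiter or encoding problem on load.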
Created:
2025-12-01



