Data from: Who shares? Who doesn’t? Factors associated with openly archiving raw research data

DataONE | Updated 2011-05-26 | Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Summary:

Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn’t, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let’s learn from those with high rates of sharing to embrace the full potential of our research output.
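
The summary does not say which sensitivity value or which adjustment produced the roughly 45% estimate. The sketch below shows one simple way to relate the 30%-35% detected rate to that estimate. It assumes the automated method has no false positives, so that detected rate = true rate × sensitivity. This is not necessarily how the study computed its figure, and the function names are hypothetical.

```python
def corrected_share_rate(observed_rate: float, sensitivity: float) -> float:
    """Estimate the true sharing rate from the rate an automated method detects.

    Assumption (not from the source): the detector has perfect specificity,
    so detected = true * sensitivity and therefore true = detected / sensitivity.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0.0 <= observed_rate <= 1.0:
        raise ValueError("observed_rate must be in [0, 1]")
    # Cap at 1.0 because a proportion cannot exceed 100%.
    return min(observed_rate / sensitivity, 1.0)


def implied_sensitivity(observed_rate: float, true_rate: float) -> float:
    """Back out the sensitivity implied by a detected rate and an estimated true rate."""
    if not 0.0 < true_rate <= 1.0:
        raise ValueError("true_rate must be in (0, 1]")
    return observed_rate / true_rate


if __name__ == "__main__":
    # Figures quoted in the summary: 30%-35% detected for 2007-2009,
    # about 45% estimated after accounting for sensitivity.
    for observed in (0.30, 0.35):
        s = implied_sensitivity(observed, 0.45)
        print(f"observed {observed:.2f} -> implied sensitivity {s:.2f}")
```

Under this simplified model, the quoted figures imply a detection sensitivity of roughly 0.67 to 0.78. If the method also produced false positives, the correction would need a specificity term as well.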

Created:

2011-05-26