High dimensional surrogacy: computational aspects of an upscaled analysis

Mendeley Data. Updated 2024-06-25; indexed 2024-06-29.

Download link:

https://tandf.figshare.com/articles/High_dimensional_surrogacy_computational_aspects_of_an_upscaled_analysis/9746051/1

Description:

Identification of genomic biomarkers is an important area of research in the context of drug discovery experiments. These experiments typically consist of several high dimensional datasets that contain information about a set of drugs (compounds) under development. This type of data structure introduces the challenge of multi-source data integration. High-Performance Computing (HPC) has become an important tool for everyday research tasks. In the context of drug discovery, high dimensional multi-source data need to be analyzed to identify the biological pathways related to the new set of drugs under development. In order to process all the information contained in the datasets, HPC techniques are required. Even though R packages for parallel computing are available, they are not optimized for a specific setting and data structure. In this article, we propose a new data analysis framework for using R on a computer cluster. The proposed data analysis workflow is applied to a multi-source high dimensional drug discovery dataset and compared with a few existing R packages for parallel computing.

Created:

2023-06-28
