Supporting data for "GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets"

DataCite Commons · Updated 2025-05-26 · Indexed 2025-04-15
Download link:
http://gigadb.org/dataset/100810
Description:
The amount of data generated in large clinical and phenotyping studies that use single-cell cytometry is constantly growing. Recent technological advances make it easy to generate data with hundreds of millions of single-cell data points with more than 40 parameters, originating from thousands of individual samples. Analyzing that amount of high-dimensional data places heavy demands on both the hardware and the software of high-performance computing resources. Current software tools often do not scale to datasets of this size; users are thus forced to downsample the data to manageable sizes, in turn losing accuracy and the ability to detect many underlying complex phenomena.

We present GigaSOM.jl, a fast and scalable implementation of clustering and dimensionality reduction for flow and mass cytometry data. The implementation of GigaSOM.jl in the high-level, high-performance programming language Julia makes it accessible to the scientific community and allows for efficient handling and processing of datasets with billions of data points using distributed computing infrastructures. We describe the design of GigaSOM.jl, measure its performance and horizontal scaling capability, and showcase the functionality on a large dataset from a recent study.

GigaSOM.jl facilitates the use of commonly available high-performance computing resources to process the largest available datasets within minutes while producing results of the same quality as the current state-of-the-art software. Measurements indicate that the performance scales to much larger datasets. The example run on data from a massive mouse phenotyping effort confirms the applicability of GigaSOM.jl to huge-scale studies.
Provider:
GigaScience Database
Created:
2020-10-12