First results of online compression of HEP data using Baler
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/8325727
Description:
This internship project, carried out within the HiDA Trilateral Data Science Exchange Program, presents preliminary results on online compression using Baler. Compression performance was evaluated on several High Energy Physics (HEP) datasets of different sizes. All datasets are subsets of the jet data recorded by the CMS experiment at the LHC in 2012, released as open data under the Creative Commons CC0 waiver (see references). The data has been modified (flattened, truncated, formatted, etc.) and packaged so that others can easily reproduce the results. The files provided on this page include comprehensive instructions for replicating the project's results, along with the datasets and outcomes. Below is a brief overview of the project's folder structure, organized by the dataset used (small dataset, example CMS data, larger CMS data), the compression method (online or offline), and resource utilization, particularly GPU usage.
Presentations summarizing this project's results can be found here: https://zenodo.org/record/8326707.
Project folders:
Reproduction Instructions: This folder houses all files that offer detailed guidelines for replicating the project's presented results. These files serve as a reference for accessing relevant materials.
GPU with Example CMS Data.zip: This directory contains all files related to offline compression of the approximately 100MB example CMS dataset provided by Baler. GPU resources were employed in the model training process.
GPU with Larger CMS Data (1).zip: This archive contains files associated with the compression of a larger CMS dataset, approximately 1.4GB in size. It includes results of offline compression and of a 50/50 split of the dataset into training and testing halves, with results provided for various numbers of epochs.
GPU with Larger CMS Data (2): This folder holds the larger dataset, divided into two halves, with the first half's array values in one file and the second half's in another.
Offline/Online on Small Dataset: Here, you'll find files related to both offline and online compression of a small dataset, roughly 100KB in size, extracted from the example CMS dataset provided by Baler.
Modifications of Small Dataset: This section comprises variations of the small dataset, including both normalized and un-normalized datasets.
Materials: This folder includes fundamental papers and summaries to enhance your understanding of the project.
HiDA: Within this directory, you'll find a printed webpage from the HiDA program.
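As an illustration of the 50/50 split and the normalized dataset variants mentioned in the folder descriptions above, the following is a minimal sketch using NumPy. It is not taken from the project files: the function names `split_in_halves` and `min_max_normalize` are hypothetical, the array is synthetic stand-in data rather than CMS data, and per-column min-max scaling is only an assumed form of normalization, since the description does not say which one was used.

```python
import numpy as np


def split_in_halves(data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split the rows of `data` into a first half and a second half."""
    midpoint = data.shape[0] // 2
    return data[:midpoint], data[midpoint:]


def min_max_normalize(data: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1] (assumed normalization; constant columns map to 0)."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Avoid division by zero for columns where every value is identical.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (data - col_min) / safe_range


# Synthetic stand-in for a flattened table of jet variables (rows = jets).
rng = np.random.default_rng(seed=0)
events = rng.normal(size=(1000, 4))

train, test = split_in_halves(events)   # first half for training, second for testing
train_norm = min_max_normalize(train)   # normalized variant of the training half
```

In a sketch like this, each half could be written to its own file (for example with `np.save`), mirroring the two-file layout described for the larger dataset.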
Created:
2024-07-11



