Jagged arrays in ROOT TTree, Parquet, and Avro
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14538339
Description:
This is a synthetic dataset of random numbers in variable-length, nested data structures in three file formats: ROOT TTree, Parquet, and Avro. There are four levels of depth:
jagged0: not nested; just a flat array of numbers
jagged1: an array of lists of numbers
jagged2: an array of lists of lists of numbers
jagged3: an array of lists of lists of lists of numbers
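As a rough illustration of the four nesting levels, the sketch below builds tiny in-memory analogues with plain Python lists. The function names, the uniform random numbers, the list-length range, and the entry count are all assumptions made for illustration; this is not the script that produced the dataset, and it does not read or write any of the three file formats.

```python
import random


def random_item(depth, rng, max_len=3):
    """Return one entry: a number at depth 0, otherwise a
    variable-length list of entries one level shallower."""
    if depth == 0:
        return rng.random()
    # Lists may be empty, which is what makes the structure "jagged".
    return [random_item(depth - 1, rng, max_len)
            for _ in range(rng.randint(0, max_len))]


def make_jagged(depth, n_entries, seed=0):
    """Build an array of n_entries entries, each nested `depth` levels deep."""
    rng = random.Random(seed)
    return [random_item(depth, rng) for _ in range(n_entries)]


def has_depth(item, depth):
    """Check that every leaf sits exactly `depth` list levels below `item`."""
    if depth == 0:
        return isinstance(item, float)
    return isinstance(item, list) and all(has_depth(x, depth - 1) for x in item)


# Hypothetical miniature versions of jagged0 through jagged3.
samples = {f"jagged{d}": make_jagged(d, n_entries=4) for d in range(4)}

for name, array in samples.items():
    print(name, array)
```

Here jagged0 comes out as a flat list of floats, and each higher level wraps the entries in one more layer of variable-length lists.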
The TBasket sizes of the TTree files and the row group sizes of the Parquet files were made identical, so that performance can be meaningfully compared. All of the files are compressed with ZLIB level 9.
This dataset was first used in a performance study at CHEP 2019:
presentation page
published proceedings
But it has since been used in other studies, such as this one at CHEP 2021:
presentation page
published proceedings
and this one at ACAT 2022:
presentation page
preprint (will be published)
It has become a standard performance benchmark.
The scripts that were used to create this synthetic dataset are in this repository directory, PR #19.
Just one file, zlib9-jagged0.avro, had to be excluded to fit in this Zenodo record, but it is the easiest one to reconstruct from the others.
Created:
2024-12-21



