severo/pq_reproduction
Source: Hugging Face. Updated 2025-09-04; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/severo/pq_reproduction
Description:
This dataset reproduces the Parquet files used in the blog post "Parquet Content-Defined Chunking". Each Parquet example is available in 8 versions, one for every combination of three options: compression (none or snappy), content-defined chunking (CDC) enabled or disabled, and data page index present or absent. The dataset is organized along the same three axes: indexed or not indexed, CDC enabled or disabled, and compression method.
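The count of 8 versions follows from crossing three two-valued options (2 × 2 × 2). A minimal sketch that enumerates the combinations is shown here; the option labels and dictionary keys are illustrative assumptions and are not taken from the dataset's actual directory or file names.

```python
from itertools import product

# Hypothetical labels for the three options named in the description.
# The real directory and file names in the dataset may differ.
COMPRESSIONS = ("none", "snappy")
CDC_ENABLED = (False, True)
PAGE_INDEX = (False, True)


def variants():
    """Return every combination of the three options as a list of dicts."""
    return [
        {"compression": compression, "cdc": cdc, "page_index": page_index}
        for compression, cdc, page_index in product(
            COMPRESSIONS, CDC_ENABLED, PAGE_INDEX
        )
    ]


if __name__ == "__main__":
    for variant in variants():
        print(variant)
```

Listing the combinations this way makes it easy to check that a local copy contains all versions of a given example before comparing them.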
Provided by:
severo



