
brianmcgee/nix-cache-dataset

Source: Hugging Face. Updated 2026-02-01; indexed 2026-03-29.

Download link: https://hf-mirror.com/datasets/brianmcgee/nix-cache-dataset
---
license: mit
language:
- en
tags:
- nix
- nixos
size_categories:
- 100M<n<1B
- 1B<n<10B
pretty_name: Nix Cache Dataset
configs:
- config_name: inventory
  data_files: datasets/inventory-2026-01-06T01-13Z.parquet
- config_name: narinfos in the inventory
  data_files: datasets/narinfos-2026-01-06T01-13Z.parquet
- config_name: narinfos in the inventory - nixos images only
  data_files: datasets/narinfos-nixos-images-2026-01-06T01-13Z.parquet
- config_name: store paths affected by the removal of nixos images
  data_files: datasets/narinfos-nixos-images-dangling-refs-2026-01-06T01-13Z.parquet
- config_name: buildstepoutputs
  data_files: datasets/buildstepoutputs-2025-12-05-17:38:30Z.csv.zst
- config_name: narinfos in inventory but not in buildstepoutputs
  data_files: datasets/inventory-not-in-buildstepoutputs-2025-12-05-17:38:30Z.parquet
---

# Nix Cache Dataset

This repository contains several datasets relating to the contents of https://cache.nixos.org.

## Getting started

To make it easier to explore the datasets, a [Nix] devshell is provided in `shell.nix`.
To enter it, run:

```console
❯ nix develop -f shell.nix
[brian@saturn:~/Development/com/github/numtide/nix-cache-dataset]$
```

If you are a [Direnv] user, you can also run `direnv allow` to automatically load the devshell:

```console
❯ direnv allow
direnv: loading ~/Development/com/github/numtide/nix-cache-dataset/.envrc
direnv: using nix
direnv: nix-direnv: Using cached dev shell
direnv: export +AR +AS +CC +CONFIG_SHELL +CXX +HOST_PATH +IN_NIX_SHELL +LD +NIX_BINTOOLS +NIX_BINTOOLS_WRAPPER_TARGET_HOST_x86_64_unknown_linux_gnu +NIX_BUILD_CORES +NIX_CC +NIX_CC_WRAPPER_TARGET_HOST_x86_64_unknown_linux_gnu +NIX_CFLAGS_COMPILE +NIX_ENFORCE_NO_NATIVE +NIX_HARDENING_ENABLE +NIX_LDFLAGS +NIX_STORE +NM +OBJCOPY +OBJDUMP +RANLIB +READELF +SIZE +SOURCE_DATE_EPOCH +STRINGS +STRIP +__structuredAttrs +buildInputs +buildPhase +builder +cmakeFlags +configureFlags +depsBuildBuild +depsBuildBuildPropagated +depsBuildTarget +depsBuildTargetPropagated +depsHostHost +depsHostHostPropagated +depsTargetTarget +depsTargetTargetPropagated +doCheck +doInstallCheck +dontAddDisableDepTrack +mesonFlags +name +nativeBuildInputs +out +outputs +patches +phases +preferLocalBuild +propagatedBuildInputs +propagatedNativeBuildInputs +shell +shellHook +stdenv +strictDeps +system ~PATH ~XDG_DATA_DIRS

brian@saturn ~/..../nix-cache-dataset  main* ⇡
❯
```

The devshell contains a `duckdb` command which is pre-configured to load some helpful utility functions which reside in `./duckdb/init.sql`:

* `nixbase32`, which converts a fixed-size `byte[20]` column into nixbase32.
* `narURL`, which recovers the URL of a NAR file from its `file_hash` and `compression` columns.

Both are necessary when introspecting any of the narinfo-related datasets, as their schema was optimized to reduce file sizes.

## Contents

> [!NOTE]
> The primary datasets below were compiled with [Narwal].

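
For readers working outside the devshell, the two `duckdb` helpers described under Getting started can be approximated in plain Python. The sketch below is not the code in `./duckdb/init.sql`; it follows the nixbase32 encoding as implemented by Nix itself. The names `nixbase32_encode`, `nixbase32_decode` and `nar_url` are hypothetical, and the mapping from `compression` values to file extensions is an assumption based on Nix's usual naming.

```python
# Alphabet used by Nix's base32 variant (no e, o, t or u).
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"

# Assumption: extensions follow Nix's usual naming; the real narURL helper may differ.
NAR_EXTENSIONS = {"none": "", "xz": ".xz", "bzip2": ".bz2", "zstd": ".zst"}


def nixbase32_encode(data: bytes) -> str:
    """Encode raw bytes as nixbase32 (20 bytes -> 32 chars, 32 bytes -> 52 chars)."""
    length = (len(data) * 8 - 1) // 5 + 1 if data else 0
    chars = []
    # Nix emits 5-bit groups starting from the most significant end.
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32_ALPHABET[c & 0x1F])
    return "".join(chars)


def nixbase32_decode(text: str) -> bytes:
    """Inverse of nixbase32_encode; raises ValueError on invalid input."""
    size = len(text) * 5 // 8
    out = bytearray(size)
    for n, char in enumerate(reversed(text)):
        digit = NIX_BASE32_ALPHABET.index(char)  # ValueError if not in the alphabet
        i, j = divmod(n * 5, 8)
        value = digit << j
        low, high = value & 0xFF, value >> 8
        if i < size:
            out[i] |= low
        elif low:
            raise ValueError("non-zero padding bits")
        if high:
            if i + 1 < size:
                out[i + 1] |= high
            else:
                raise ValueError("non-zero padding bits")
    return bytes(out)


def nar_url(file_hash: bytes, compression: str) -> str:
    """Rebuild a relative NAR URL such as 'nar/<hash>.nar.xz' (hypothetical helper)."""
    return f"nar/{nixbase32_encode(file_hash)}.nar{NAR_EXTENSIONS[compression]}"
```

Decoding a store path hash and encoding it again should return the original 32-character string, which is a quick way to check the sketch against rows of the narinfo datasets.
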

### `datasets/inventory-2026-01-06T01-13Z.parquet`

This is a complete inventory of the [S3 Bucket] which backs the cache as of `2026-01-06T01:13Z`, as provided by the [S3 Inventory Service]. It has the following schema and is ordered by `key`:

```console
❯ duckdb -c "describe select * from 'datasets/inventory-2026-01-06T01-13Z.parquet'"
┌────────────────────┬──────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│    column_name     │       column_type        │  null   │   key   │ default │  extra  │
│      varchar       │         varchar          │ varchar │ varchar │ varchar │ varchar │
├────────────────────┼──────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ key                │ VARCHAR                  │ YES     │ NULL    │ NULL    │ NULL    │
│ size               │ BIGINT                   │ YES     │ NULL    │ NULL    │ NULL    │
│ last_modified_date │ TIMESTAMP WITH TIME ZONE │ YES     │ NULL    │ NULL    │ NULL    │
│ e_tag              │ VARCHAR                  │ YES     │ NULL    │ NULL    │ NULL    │
│ storage_class      │ VARCHAR                  │ YES     │ NULL    │ NULL    │ NULL    │
└────────────────────┴──────────────────────────┴─────────┴─────────┴─────────┴─────────┘
```

Here are the first 10 entries:

```console
❯ duckdb -c "select * from 'datasets/inventory-2026-01-06T01-13Z.parquet' limit 10"
┌──────────────────────────────────────────┬───────┬──────────────────────────┬──────────────────────────────────┬───────────────┐
│                   key                    │ size  │    last_modified_date    │              e_tag               │ storage_class │
│                 varchar                  │ int64 │ timestamp with time zone │             varchar              │    varchar    │
├──────────────────────────────────────────┼───────┼──────────────────────────┼──────────────────────────────────┼───────────────┤
│ .well-known/pki-validation/gsdv.txt      │   101 │ 2018-09-07 09:35:54+01   │ dcdfef6f1cdfeae2d2c9de7dab84292a │ STANDARD      │
│ 000003nzgismzlipaq0jnchpz65d65z7.ls      │  1228 │ 2024-06-21 17:17:36+01   │ bf47b9ab2ea7a795d15107d4dcf64697 │ STANDARD      │
│ 000003nzgismzlipaq0jnchpz65d65z7.narinfo │  1452 │ 2024-06-21 17:17:37+01   │ 34997acded08172170eb1a4cde9456d1 │ STANDARD      │
│ 00000c8rdrjqkdxjpm5wrhl6sspapbmn.ls      │    97 │ 2019-11-06 19:35:06+00   │ ae368bafcfb9e5632acc57acf4f2a325 │ STANDARD      │
│ 00000c8rdrjqkdxjpm5wrhl6sspapbmn.narinfo │   905 │ 2019-11-06 19:35:06+00   │ 0bf1c5c76cf15dd07c8c79eca27ebff4 │ STANDARD      │
│ 00000in9ndrbpshcibwfgsygx0xq069g.ls      │   329 │ 2025-11-07 18:40:26+00   │ f817f1b476fe0cb82ac2f5dbc19be620 │ STANDARD      │
│ 00000in9ndrbpshcibwfgsygx0xq069g.narinfo │   732 │ 2025-11-07 18:40:28+00   │ 4037fbf8ab5484847756f0839c2c7245 │ STANDARD      │
│ 00000jipsrr2nfgn36w04xg2z6hnr7y0.ls      │   323 │ 2024-01-06 07:48:26+00   │ 8a3bad8cb3795080ccd636ef212c1878 │ STANDARD      │
│ 00000jipsrr2nfgn36w04xg2z6hnr7y0.narinfo │   523 │ 2024-01-06 07:48:27+00   │ 997c5c9a6911bfadd7efcb9f845a4082 │ STANDARD      │
│ 00000lkv8yz6bsccrp03ppq71mzjdcpv.ls      │   396 │ 2022-05-10 03:33:54+01   │ 3284e9e9e5ae0a2562640e1f20af395d │ STANDARD      │
├──────────────────────────────────────────┴───────┴──────────────────────────┴──────────────────────────────────┴───────────────┤
│ 10 rows                                                                                                              5 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

### `datasets/narinfos-2026-01-06T01-13Z.parquet`

This is a complete download of all `.narinfo` files within the S3 bucket as of `2026-01-06T01:13Z`.
It has the following schema:

```console
❯ duckdb -c "describe select * from 'datasets/narinfos-2026-01-06T01-13Z.parquet'"
┌───────────────────────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│          column_name          │ column_type │  null   │   key   │ default │  extra  │
│            varchar            │   varchar   │ varchar │ varchar │ varchar │ varchar │
├───────────────────────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ hash                          │ BLOB        │ YES     │ NULL    │ NULL    │ NULL    │
│ pname                         │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ compression                   │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ file_hash                     │ BLOB        │ YES     │ NULL    │ NULL    │ NULL    │
│ file_size                     │ UBIGINT     │ YES     │ NULL    │ NULL    │ NULL    │
│ nar_hash                      │ BLOB        │ YES     │ NULL    │ NULL    │ NULL    │
│ nar_size                      │ UBIGINT     │ YES     │ NULL    │ NULL    │ NULL    │
│ references                    │ BLOB[]      │ YES     │ NULL    │ NULL    │ NULL    │
│ deriver                       │ BLOB        │ YES     │ NULL    │ NULL    │ NULL    │
│ deriver_pname                 │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ system                        │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ signature_domains             │ VARCHAR[]   │ YES     │ NULL    │ NULL    │ NULL    │
│ signature_values              │ BLOB[]      │ YES     │ NULL    │ NULL    │ NULL    │
│ ca_algo                       │ VARCHAR     │ YES     │ NULL    │ NULL    │ NULL    │
│ ca_hash                       │ BLOB        │ YES     │ NULL    │ NULL    │ NULL    │
│ quirk_references_out_of_order │ BOOLEAN     │ YES     │ NULL    │ NULL    │ NULL    │
├───────────────────────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┤
│ 16 rows                                                                   6 columns │
└─────────────────────────────────────────────────────────────────────────────────────┘
```

The schema is the same as the contents of a `.narinfo` file with some notable differences:

* `StorePath` has been split into `hash` and `pname`, with the `/nix/store/` prefix omitted.
* In addition, `hash` has been decoded from its original 32-character `nixbase32` format into a 20-byte fixed-size `byte[20]` column to save space. As mentioned in [Getting started](#getting-started), there is a `nixbase32` utility function for `duckdb` which can be used to convert it back into its original format.
* `URL` is not included. Instead, it can be recovered from the `file_hash` and `compression` columns using the `narURL` utility function. Again, this was to save space.
* `quirk_references_out_of_order` indicates whether the `references` column was not canonically sorted as is required by the narinfo format.

Here is an example of using both utility functions:

```console
❯ duckdb -c "
select nixbase32(hash) as hash, pname, narURL(file_hash, compression) as URL
from 'datasets/narinfos-2026-01-06T01-13Z.parquet'
where pname like 'nixos-%.iso'
limit 10"
┌──────────────────────────────────┬──────────────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────┐
│               hash               │                                pname                                 │                               URL                               │
│             varchar              │                               varchar                                │                             varchar                             │
├──────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────┤
│ 8s2i888nxl44dvrqkszyhm9wagx21cv8 │ nixos-21.11pre-git-aarch64-linux.iso                                 │ nar/01i2cax5rgfirdnqy3gk0fam1jxjqzydlm6fmydqcr2rppzfgpyc.nar.xz │
│ 8s2nl0q5l0qjwj4mdh47b32abd7dkb37 │ nixos-plasma5-23.11pre525972.ace5093e36ab-x86_64-linux.iso           │ nar/03wiqhgkgbwr6lz6jlhnq9d4swjahdw32ylixshmpy25wmnhkg83.nar.xz │
│ 8s308z464is6rbnn1wglws37bz5cxhi3 │ nixos-minimal-new-kernel-21.03pre253346.87e9b49fc78-x86_64-linux.iso │ nar/1ppngdicfkydg44hbyc6s77f40qk15a8r370a02yg06y4w7fscp3.nar.xz │
│ 8s37vnmdg68z51s81lp7pfclkp0ljl7v │ nixos-plasma5-21.11.335288.3ddd960a3b5-x86_64-linux.iso              │ nar/04pic41qkjib69bl1yk48w5bdnb0jj4jfzljmfxsx3c9sr966m71.nar.xz │
│ 8s4yq0kgvw7d34jzd6j256n8rhqdncag │ nixos-18.03pre-git-x86_64-linux.iso                                  │ nar/11m49i89vpm8mjq2khpiy9p4br3fkvk57nmi6narmxbih9lahg45.nar.xz │
│ 8s4zyq65nl0d5jjd28pvm8rih78khhlx │ nixos-21.03pre-git-x86_64-linux.iso                                  │ nar/00ivaz2lwdvqcg7c64l9ikgk0njcphad4h69z2hvih7xrrzvwbb0.nar.xz │
│ 8s5577gfk2makszs3bnpfzf8h477ywwd │ nixos-minimal-17.03.1730.dad3c2231c-i686-linux.iso                   │ nar/074d9323apli87919s8rr8rfa0javdqvk24r4siclq1ysg67gp38.nar.xz │
│ 8s580im104kqdf873sc1bqanp5x7mvjn │ nixos-18.09pre-git-x86_64-linux.iso                                  │ nar/0nw2zmxw3bhn9ixx66n9vfhk81qvr88yh1a3dxyv1qi5xbqv8fgz.nar.xz │
│ 8s5wj1h8hcvhiqxghlgblxnsi8lccrs7 │ nixos-graphical-25.11pre819493.4206c4cb5675-x86_64-linux.iso         │ nar/0mnwc0lpxf4hllv97gc58rd7m0rkph03cg2b35zmarq1kcd1ihq5.nar.xz │
│ 8s616arfxnpgz8grd3rggm9ahgk5wvn7 │ nixos-minimal-19.09pre174421.b306b4cf1a9-x86_64-linux.iso            │ nar/1qxcksrfa9nv10ln5kf2mcx2jndx4hyf34x594779f0lh4rlfw68.nar.xz │
├──────────────────────────────────┴──────────────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────────┤
│ 10 rows                                                                                                                                                         3 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

### `datasets/narinfos-nixos-images-2026-01-06T01-13Z.parquet`

This is a subset of the `narinfos-2026-01-06T01-13Z.parquet` dataset which attempts to identify any narinfos related to NixOS ISOs and other images which have been uploaded to the cache. These are candidates for removal from the cache as they are stored elsewhere.

It was compiled using `./duckdb/nixos-images.sql`.

### `datasets/narinfos-nixos-images-dangling-refs-2026-01-06T01-13Z.parquet`

This is a list of store paths which reference one or more of the store paths contained in `datasets/narinfos-nixos-images-2026-01-06T01-13Z.parquet`, and which will therefore be affected by their removal from https://cache.nixos.org.

It was compiled using `bin/dangling-references.py`.

### `datasets/buildstepoutputs-2025-12-05-17:38:30Z.csv.zst`

This is a dump of the `buildstepoutputs` table taken from [Hydra] on `2025-12-05T17:38:30Z`.
It can be thought of as a record of every output path that [Hydra] has ever built and looks like this:

```console
❯ head datasets/buildstepoutputs-2025-12-05-17:38:30Z.csv
build,stepnr,name,path
684,2,out,/nix/store/mnkbj3aziazk2ijadn23cpwarm6bfmh8-nixos-install
683,5,out,/nix/store/172yppi4bklbs1cyh91wqwpyyfbkvgds-system-path
684,3,out,/nix/store/dxi8ij2la8in64mclyjslgs04nylcsa2-upstart-manual
685,1,out,/nix/store/isaw8s240isvg06pimbr1wj12wxs9glv-hydra-build
684,4,out,/nix/store/qcpb0179capgcyij03xk6fqhlwbafjr0-upstart-jobs
686,1,out,/nix/store/07syvqz4w8n0z3alfm4kg5i46v3wbidx-checkinstall-1.6.1
686,2,out,/nix/store/1g5nks931nirya16hmw2l1lg0jwazmsw-e2fslibs_1.39+1.40-WIP-2006.11.14+dfsg-2etch1_amd64.deb
686,3,out,/nix/store/0dgvq0q33sniq77kq9150shj0jqllzh0-file_4.17-5etch3_amd64.deb
686,4,out,/nix/store/7f03xxka0l0h623ydvqsjvqf6w1vvhzv-heimdal-1.0.2.tar.gz
```

### `datasets/inventory-not-in-buildstepoutputs-2025-12-05-17:38:30Z.parquet`

This file contains a list of all the `.narinfo` files within `datasets/inventory-2026-01-06T01-13Z.parquet` (up to `2025-12-05-17:38:30Z`) which do not have a corresponding entry in `datasets/buildstepoutputs-2025-12-05-17:38:30Z.csv.zst`.

It was compiled using `./duckdb/inventory-not-in-buildstepoutputs.sql`.

Subsequently, graphs were generated using `bin/graph-inventory-not-in-buildstepoutputs.py` and placed in `./graphs`:

![Inventory not in Buildstepoutputs Daily](./graphs/inventory-not-in-buildstepoutputs-daily.png)
![Inventory not in Buildstepoutputs Monthly](./graphs/inventory-not-in-buildstepoutputs-monthly.png)
![Inventory not in Buildstepoutputs Yearly Comparison](./graphs/inventory-not-in-buildstepoutputs-yearly-comparison.png)

[Nix]: https://nixos.org/
[Direnv]: https://direnv.net/
[S3 Bucket]: https://aws.amazon.com/s3/
[S3 Inventory Service]: https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html
[Narwal]: https://github.com/numtide/narwal
[Hydra]: https://hydra.nixos.org/
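
The comparison behind `inventory-not-in-buildstepoutputs` can be pictured as an anti-join on the store path hash. The following is a minimal Python sketch of that idea, not the logic of `./duckdb/inventory-not-in-buildstepoutputs.sql`, which may match rows differently and also applies the date cutoff. All function names are hypothetical, and the sketch assumes that an inventory key of the form `<hash>.narinfo` corresponds to a `buildstepoutputs` path of the form `/nix/store/<hash>-<name>`.

```python
import csv
import io
from typing import Iterable, List, Optional, TextIO

STORE_PREFIX = "/nix/store/"
NARINFO_SUFFIX = ".narinfo"


def hash_from_store_path(path: str) -> str:
    """'/nix/store/<hash>-<name>' -> '<hash>'."""
    if not path.startswith(STORE_PREFIX):
        raise ValueError(f"not a store path: {path!r}")
    return path[len(STORE_PREFIX):].split("-", 1)[0]


def hash_from_inventory_key(key: str) -> Optional[str]:
    """'<hash>.narinfo' -> '<hash>'; any other key (e.g. '*.ls') -> None."""
    if key.endswith(NARINFO_SUFFIX):
        return key[: -len(NARINFO_SUFFIX)]
    return None


def narinfos_not_built(inventory_keys: Iterable[str], buildstepoutputs: TextIO) -> List[str]:
    """Return hashes of .narinfo keys with no matching buildstepoutputs row."""
    built = {hash_from_store_path(row["path"]) for row in csv.DictReader(buildstepoutputs)}
    missing = []
    for key in inventory_keys:
        store_hash = hash_from_inventory_key(key)
        if store_hash is not None and store_hash not in built:
            missing.append(store_hash)
    return missing


if __name__ == "__main__":
    # Illustrative input only: the first key is made up to pair with the CSV row.
    sample_csv = io.StringIO(
        "build,stepnr,name,path\n"
        "684,2,out,/nix/store/mnkbj3aziazk2ijadn23cpwarm6bfmh8-nixos-install\n"
    )
    sample_keys = [
        "mnkbj3aziazk2ijadn23cpwarm6bfmh8.narinfo",
        "000003nzgismzlipaq0jnchpz65d65z7.narinfo",
        "000003nzgismzlipaq0jnchpz65d65z7.ls",
    ]
    print(narinfos_not_built(sample_keys, sample_csv))
```

At the scale of the real datasets an in-memory set may be impractical, which is presumably one reason the published file was produced with `duckdb` instead.
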
Provider: brianmcgee