makneeeee/spacev1b

Source: Hugging Face. Updated 2026-02-20. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/makneeeee/spacev1b
Resource description:
---
license: other
task_categories:
- feature-extraction
tags:
- vector-search
- diskann
- nearest-neighbor
- benchmark
pretty_name: SpaceV1B - Sharded DiskANN Indices
size_categories:
- 1B<n<10B
---

# SpaceV1B - Sharded DiskANN Indices

Pre-built DiskANN indices for the SpaceV1B dataset, sharded for distributed vector search.

## Dataset Info

- **Source**: [Microsoft SPTAG / SpaceV1B](https://github.com/microsoft/SPTAG/tree/main/datasets/SPACEV1B)
- **Vectors**: 1,000,000,000 (1 billion)
- **Dimensions**: 100
- **Data type**: int8
- **Queries**: 29,316
- **Distance**: L2

## DiskANN Parameters

- **R** (graph degree): 64
- **L** (build beam width): 100
- **PQ bytes**: 32

## Shard Configurations

- **shard_2**: 2 shards x 500,000,000 vectors
- **shard_3**: 3 shards x ~333,333,333 vectors
- **shard_5**: 5 shards x 200,000,000 vectors
- **shard_7**: 7 shards x ~142,857,142 vectors
- **shard_10**: 10 shards x 100,000,000 vectors

## File Structure

```
fbin/
  base.i8bin                 # Base vectors (1B x 100 int8)
  queries.i8bin              # Query vectors (29K x 100 int8)
diskann/
  gt_100.bin                 # Ground truth (100-NN)
  shard_N/                   # N-shard configuration
    spacev1b_64_100_32.shardX_disk.index                      # DiskANN disk index
    spacev1b_64_100_32.shardX_disk.index_512_none.indices     # MinIO graph indices
    spacev1b_64_100_32.shardX_disk.index_base_none.vectors    # MinIO vector data
    spacev1b_base.shardX.fbin                                 # Shard base data
```

### Chunked Files

Files larger than 49 GB are split into chunks for upload:

- `*_512_none.indices.part00`, `.part01`, etc.
- `*_base_none.vectors.part00`, etc. (if applicable)
- `fbin/base.i8bin.part00`, etc.

To reassemble: `cat file.part00 file.part01 ... > file`

## Usage

### Download with huggingface_hub

```python
from huggingface_hub import hf_hub_download

# Download a specific shard file
index = hf_hub_download(
    repo_id="maknee/spacev1b",
    filename="diskann/shard_10/spacev1b_64_100_32.shard0_disk.index",
    repo_type="dataset"
)
```

### Download with git-lfs

```bash
git lfs install
git clone https://huggingface.co/datasets/maknee/spacev1b
```

## License

Same as source dataset (Microsoft SPTAG).
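
## Example: Reassembling Chunked Files in Python

The `cat` command under "Chunked Files" can be mirrored in Python on systems without a POSIX shell. The sketch below is not part of the original card: the `reassemble` helper is a hypothetical name, and it assumes every chunk sits in one directory and ends in a two-digit counter (`.part00`, `.part01`, ...), as the card's examples suggest.

```python
import glob
import shutil
from pathlib import Path


def reassemble(first_part: Path, output: Path) -> int:
    """Concatenate file.part00, file.part01, ... into output; return the part count."""
    # Drop the two-digit counter so "x.part00" yields the prefix "x.part".
    prefix = first_part.name[:-2]
    # Zero-padded two-digit suffixes sort lexicographically in numeric order.
    pattern = glob.escape(prefix) + "[0-9][0-9]"
    parts = sorted(first_part.parent.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no parts matching {prefix}NN in {first_part.parent}")
    with open(output, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in fixed-size blocks so multi-GB parts never sit in memory.
                shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)
    return len(parts)
```

For instance, `reassemble(Path("fbin/base.i8bin.part00"), Path("fbin/base.i8bin"))` would write the joined file next to its chunks, provided all chunks have been downloaded first.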
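
## Example: Shard Size Arithmetic

The approximate counts listed under "Shard Configurations" follow from dividing 1,000,000,000 vectors by the shard count. The sketch below reproduces that arithmetic. How the indices actually assign the leftover vectors when the division is not exact is not stated in the card; giving the first shards one extra vector each is an assumption made only for illustration, and `shard_sizes` is a hypothetical helper.

```python
TOTAL_VECTORS = 1_000_000_000  # vector count from the dataset card


def shard_sizes(total: int, num_shards: int) -> list[int]:
    """Split total into num_shards near-equal sizes that sum to total."""
    base, remainder = divmod(total, num_shards)
    # Assumption: the first `remainder` shards take one extra vector each.
    return [base + 1 if i < remainder else base for i in range(num_shards)]


if __name__ == "__main__":
    for n in (2, 3, 5, 7, 10):
        sizes = shard_sizes(TOTAL_VECTORS, n)
        print(f"shard_{n}: min={min(sizes):,} max={max(sizes):,}")
```

For 3 and 7 shards the division leaves a remainder (1 and 6 vectors respectively), which is why the card marks those sizes with `~`.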
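
## Example: Checking a Downloaded `.i8bin` File

After downloading or reassembling a vector file, it can help to confirm that its size matches its header. The card does not describe the on-disk layout, so the sketch below assumes the binary convention commonly used by DiskANN tooling: an 8-byte header holding two little-endian 32-bit integers (vector count, then dimension), followed by row-major data. If the files in this repository use a different layout, the check will raise. `read_bin_header` is a hypothetical helper.

```python
import os
import struct


def read_bin_header(path: str, bytes_per_value: int = 1) -> tuple[int, int]:
    """Return (num_vectors, dim) from an assumed 8-byte little-endian header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError("file is too short to contain a header")
    # Assumed layout: uint32 vector count, uint32 dimension, then row-major data.
    num_vectors, dim = struct.unpack("<II", header)
    expected = 8 + num_vectors * dim * bytes_per_value
    actual = os.path.getsize(path)
    if actual != expected:
        raise ValueError(f"size mismatch: expected {expected} bytes, found {actual}")
    return num_vectors, dim
```

Under that assumption, `read_bin_header("fbin/queries.i8bin")` should report the 29,316 queries and 100 dimensions listed in "Dataset Info"; a size mismatch usually points to a missing or truncated chunk.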
Provider:
makneeeee