
AI71ai/Arctic-Wiki-English-5M

Source: Hugging Face. Updated 2026-01-29; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/AI71ai/Arctic-Wiki-English-5M
Description:
Arctic-Wiki-English-5M is a VDBBench-compatible vector benchmark case published as a Hugging Face dataset repository. The dataset is derived from English Wikipedia articles, embedded with the Snowflake Arctic Embed L v2.0 model, and filtered by text length to ensure quality. It contains 5,000,000 training vectors and 1,000 test query vectors. It is designed for benchmarking vector databases and includes optional shuffled training data and ground truth for nearest neighbor queries. The README also gives instructions for downloading and using the dataset with VDBBench and the Hugging Face datasets library.
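
To illustrate what "ground truth for nearest neighbor queries" means in a benchmark like this, here is a minimal sketch that computes exact top-k neighbors by brute force and scores a result list against them. It runs on small random vectors, not on the dataset itself. The cosine metric, the vector dimension, and the function names `exact_top_k` and `recall_at_k` are assumptions made for illustration; the listing does not state how the published ground truth was produced.

```python
import numpy as np


def exact_top_k(train, queries, k):
    """Indices of the k most cosine-similar training vectors for each query."""
    # Normalize rows so that a dot product equals cosine similarity (assumed metric).
    train_n = train / np.linalg.norm(train, axis=1, keepdims=True)
    query_n = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = query_n @ train_n.T
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-sims, axis=1, kind="stable")[:, :k]


def recall_at_k(predicted, ground_truth):
    """Fraction of ground-truth neighbors that also appear in the predictions."""
    hits = sum(len(set(p) & set(g)) for p, g in zip(predicted, ground_truth))
    return hits / ground_truth.size


# Toy stand-ins for the training and test splits (sizes are hypothetical).
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 16)).astype(np.float32)
queries = rng.normal(size=(5, 16)).astype(np.float32)

ground_truth = exact_top_k(train, queries, k=10)
```

A vector database under test would return its own top-k lists for the same queries; passing those lists to `recall_at_k` together with `ground_truth` gives the recall figure that such benchmarks typically report.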
Provider:
AI71ai