
maknee/bioasq_bier_2048_1m

Source: Hugging Face | Updated: 2026-03-24 | Added to catalog: 2026-03-29
Download link:
https://hf-mirror.com/datasets/maknee/bioasq_bier_2048_1m
Resource description:
---
license: mit
task_categories:
- feature-extraction
- text-retrieval
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: "parquet/base.parquet"
---

# BioASQ-BEIR Vector Database Dataset (2048d, 1M)

A dataset of precomputed text embeddings for training and evaluating vector databases.

## Dataset Summary

This dataset contains 1,000,000 text samples from the BioASQ-BEIR dataset, each paired with a 2048-dimensional embedding generated by Qwen/Qwen3-Embedding-8B.

## Dataset Structure

- **Base dataset**: 1,000,000 samples with embeddings
- **Embedding dimension**: 2048

## Repository Structure

### parquet/

- `base.parquet`: main dataset with text and embeddings

## Usage

```python
import numpy as np
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub; it has a single "train" split
dataset = load_dataset("maknee/bioasq_bier_2048_1m")
base_data = dataset["train"]

# Stack the "embedding" column into a (1,000,000, 2048) array.
# float32 halves memory relative to NumPy's float64 default.
embeddings = np.array(base_data["embedding"], dtype=np.float32)
texts = base_data["text"]
```

## Dataset Information

- **Source**: BioASQ-BEIR
- **Size**: 1,000,000 samples
- **Dimension**: 2048
- **Format**: Parquet

## Citation

```bibtex
@dataset{huggingface_embeddings_maknee_bioasq_bier_2048_1m,
  title={BioASQ-BEIR Vector Database Embeddings Dataset},
  author={Henry Zhu},
  year={2026},
  url={https://huggingface.co/datasets/maknee/bioasq_bier_2048_1m}
}
```

## License

MIT License.
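The card describes this data as intended for training and evaluating vector databases. A common evaluation step is to compute exact (brute-force) nearest neighbors as ground truth and then measure recall@k of an approximate index against it. The sketch below is one way to do that, under stated assumptions: embeddings are held in a NumPy array as in the Usage snippet, similarity is cosine (the card does not name an intended metric), the card ships only a base split so the choice of queries is up to you, and the helper names `normalize`, `exact_knn`, and `recall_at_k` are hypothetical. At float32 the full base occupies 1,000,000 × 2,048 × 4 bytes ≈ 8.2 GB, so queries are processed in batches; small synthetic data stands in for the real dataset here.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against all-zero rows


def exact_knn(base: np.ndarray, queries: np.ndarray, k: int,
              batch_size: int = 1024) -> np.ndarray:
    """Return ids of the k most cosine-similar base rows for each query.

    Queries are processed in batches so the (batch, n_base) similarity
    matrix stays bounded; the normalized base itself is one full copy.
    """
    base_n = normalize(base.astype(np.float32))
    out = np.empty((len(queries), k), dtype=np.int64)
    for start in range(0, len(queries), batch_size):
        q = normalize(queries[start:start + batch_size].astype(np.float32))
        sims = q @ base_n.T  # shape (batch, n_base)
        # argpartition gathers the k largest similarities without a full sort...
        top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        # ...then only those k candidates are sorted, most similar first
        order = np.argsort(-np.take_along_axis(sims, top, axis=1), axis=1)
        out[start:start + batch_size] = np.take_along_axis(top, order, axis=1)
    return out


def recall_at_k(approx: np.ndarray, truth: np.ndarray, k: int) -> float:
    """Mean fraction of the exact top-k ids that an approximate search returned."""
    hits = sum(len(set(a[:k]) & set(t[:k])) for a, t in zip(approx, truth))
    return hits / (k * len(truth))


# Synthetic stand-in for the real data: 1,000 random 2048-d "base" vectors
# and 5 queries that are slightly perturbed copies of base rows 0..4.
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 2048)).astype(np.float32)
queries = base[:5] + 0.01 * rng.standard_normal((5, 2048)).astype(np.float32)

truth = exact_knn(base, queries, k=10)
print(truth[:, 0])  # each query's nearest neighbor is its source row: [0 1 2 3 4]
print(recall_at_k(truth, truth, k=10))  # 1.0
```

On the real dataset, `base` would be the `embeddings` array from the Usage snippet, and `truth` would be compared against the ids returned by the index under test. Cosine similarity is a choice made for this sketch; swap in inner product or L2 distance if the index being evaluated uses them.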
Provided by:
maknee