maknee/bioasq_bier_2048_1m
Source: Hugging Face. Updated: 2026-03-24. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/maknee/bioasq_bier_2048_1m
Resource description:
---
license: mit
task_categories:
- feature-extraction
- text-retrieval
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: "parquet/base.parquet"
---
# BioASQ-BEIR Vector Database Dataset (2048d, 1M)
Precomputed text embeddings for training and evaluating vector databases.
## Dataset Summary
This dataset contains 1,000,000 text samples with vector embeddings (2048 dimensions)
generated from the BioASQ-BEIR dataset using Qwen/Qwen3-Embedding-8B.
## Dataset Structure
- **Base dataset**: 1,000,000 samples with embeddings
- **Embedding dimension**: 2048
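As a rough sizing aid, the counts above imply the raw footprint of the embedding matrix. The following is only a sketch: the card does not state the stored element type, so the float32 and float16 widths are assumptions, and `embedding_footprint_bytes` is a hypothetical helper introduced for illustration.

```python
def embedding_footprint_bytes(num_vectors: int, dim: int, bytes_per_value: int) -> int:
    """Raw size of a dense num_vectors x dim matrix, ignoring container overhead."""
    return num_vectors * dim * bytes_per_value


NUM_VECTORS = 1_000_000  # from the dataset card
DIM = 2048               # from the dataset card

# Assumed element widths; the card does not state the stored dtype.
for name, width in [("float32", 4), ("float16", 2)]:
    size = embedding_footprint_bytes(NUM_VECTORS, DIM, width)
    print(f"{name}: {size / 1e9:.2f} GB")
```

Under these assumptions the matrix alone takes roughly 8.19 GB as float32 or 4.10 GB as float16, which is worth keeping in mind before loading everything into memory at once.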
## Repository Structure
### parquet/
- `base.parquet` - Main dataset with text and embeddings
## Usage
```python
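import numpy as np
from datasets import load_dataset

# Load the default config; its "train" split maps to parquet/base.parquet.
dataset = load_dataset("maknee/bioasq_bier_2048_1m")
base_data = dataset["train"]

# Materialize all embeddings as one NumPy array (1,000,000 x 2048 values,
# so this needs a large amount of RAM).
embeddings = np.array(base_data["embedding"])
texts = base_data["text"]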
```
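Once the embeddings are in a NumPy array, an exact (brute-force) nearest-neighbor search can serve as a ground-truth baseline when evaluating a vector database. The sketch below is illustrative only: `top_k_cosine` is a hypothetical helper, cosine similarity is an assumed metric (the card does not specify one), and the data is a small random stand-in rather than the real dataset.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `embeddings` most similar to `query` (cosine)."""
    # Normalize both sides so that a dot product equals cosine similarity.
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = emb_norm @ q_norm
    # argpartition selects the k best in linear time; then sort only those k.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


# Synthetic stand-in for the real data: 1,000 random vectors of dimension 2048.
rng = np.random.default_rng(0)
fake_embeddings = rng.standard_normal((1000, 2048)).astype(np.float32)
query = fake_embeddings[42]  # taken from the set, so row 42 should rank first

neighbors = top_k_cosine(query, fake_embeddings, k=5)
print(neighbors)
```

With the real data, `fake_embeddings` would be replaced by the array built from the `embedding` column. Comparing a vector database's results against this exact search is one common way to estimate recall.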
## Dataset Information
- **Source**: BioASQ-BEIR
- **Size**: 1,000,000 samples
- **Dimension**: 2048
- **Format**: Parquet
## Citation
```bibtex
@dataset{huggingface_embeddings_maknee_bioasq_bier_2048_1m,
title={BioASQ-BEIR Vector Database Embeddings Dataset},
author={Henry Zhu},
year={2026},
url={https://huggingface.co/datasets/maknee/bioasq_bier_2048_1m}
}
```
## License
MIT License.
Provider:
maknee



