maknee/bioasq_bier_4096_100k
Source: Hugging Face. Updated 2026-02-23. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/maknee/bioasq_bier_4096_100k
Description:
---
license: mit
task_categories:
- feature-extraction
- text-retrieval
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: "parquet/base.parquet"
---
# BioASQ-BEIR Vector Database Dataset (4096d, 100k)
Precomputed text embeddings for training and evaluating vector databases.
## Dataset Summary
This dataset contains 100,000 text samples with vector embeddings (4096 dimensions)
generated from the BioASQ-BEIR dataset using Qwen/Qwen3-Embedding-8B.
## Dataset Structure
- **Base dataset**: 100,000 samples with embeddings
- **Embedding dimension**: 4096
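Before loading everything into memory, it can help to estimate how large the dense embedding matrix will be. The sketch below is only arithmetic on the sample count and dimension stated above; the storage dtype is not given in this card, so the three dtypes shown are assumptions for comparison, and `embedding_bytes` is a hypothetical helper.

```python
import numpy as np

N_SAMPLES = 100_000  # sample count stated in this card
DIM = 4096           # embedding dimension stated in this card


def embedding_bytes(n_samples: int, dim: int, dtype) -> int:
    """Bytes needed for a dense (n_samples, dim) array of the given dtype."""
    return n_samples * dim * np.dtype(dtype).itemsize


# The on-disk dtype is an assumption; compare a few common choices.
for dtype in (np.float16, np.float32, np.float64):
    size_gib = embedding_bytes(N_SAMPLES, DIM, dtype) / 2**30
    print(f"{np.dtype(dtype).name}: {size_gib:.2f} GiB")
```

Note that converting Python lists of floats with `np.array` typically yields `float64`, so passing an explicit `dtype=np.float32` may halve the footprint if that precision is acceptable.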
## Repository Structure
### parquet/
- `base.parquet` - Main dataset with text and embeddings
## Usage
### Loading with HuggingFace Datasets
```python
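import numpy as np
from datasets import load_dataset

# Load the default config; "train" is the only split it defines
dataset = load_dataset("maknee/bioasq_bier_4096_100k")
base_data = dataset["train"]

# Stack the embeddings into one array with one row per sample
embeddings = np.array(base_data["embedding"])
texts = base_data["text"]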
```
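Once an `embeddings` array like the one above is in memory, an exact brute-force search gives a ground-truth baseline against which a vector database's results can be compared. The following is a minimal sketch, not part of the dataset itself: `top_k_cosine` is a hypothetical helper, cosine similarity is an assumed choice of metric, and the random stand-in data only keeps the example runnable without downloading anything.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of `embeddings` most similar to `query`."""
    # Normalize rows and query so a dot product equals cosine similarity
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = emb_norm @ q_norm
    # argpartition selects the top k without a full sort; then order those k
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]


# Stand-in data (assumption): replace with the real `embeddings` array
rng = np.random.default_rng(0)
demo_embeddings = rng.standard_normal((1000, 64)).astype(np.float32)
demo_query = demo_embeddings[42]  # query with a vector already in the set

nearest = top_k_cosine(demo_query, demo_embeddings, k=5)
print(nearest)  # the first index should be the query's own row
```

With the real data, `texts[i]` for each returned index `i` gives the matching passage. Exact search scans every row on each query, so it is best suited to producing reference results rather than serving traffic.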
## Dataset Information
- **Source**: BioASQ-BEIR
- **Size**: 100,000 samples
- **Dimension**: 4096
- **Format**: Parquet
## Citation
```bibtex
@dataset{huggingface_embeddings_maknee_bioasq_bier_4096_100k,
title={BioASQ-BEIR Vector Database Embeddings Dataset},
author={Henry Zhu},
year={2026},
url={https://huggingface.co/datasets/maknee/bioasq_bier_4096_100k}
}
```
## License
MIT License.
Provider:
maknee



