prnshv/teleembed-bench-clean
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/prnshv/teleembed-bench-clean
Resource description:
---
language:
- en
pretty_name: TeleEmbed Benchmark (Clean)
tags:
- retrieval
- telecommunications
- benchmarking
license: apache-2.0
size_categories:
- 1K<n<10K
---
# TeleEmbed Benchmark — **Clean** track
**Standalone dataset:** this repository is everything you need for the **clean** QA splits. Under **`TeleEmbed-Clean/`** you get both **`benchmark_*.json`** and the **passage corpora** (`chunks/<512|1024|2048>/chunks.json`) for O-RAN, 3GPP, and srsRAN. You do **not** need the Main dataset to run evaluation.
**Companion dataset:** the **Main** track (different benchmark JSON, same underlying passages) is published separately; link it here when the URL is set, e.g. `https://huggingface.co/datasets/<your_org>/<your_main_dataset>`.
---
## What you must specify: the embedding model
Use [Sentence Transformers](https://www.sbert.net/) via **`--model`** (Hub id or local path). The reference script encodes corpus + queries with the same encoder, L2-normalizes, and computes MRR / Recall@K. Always record which `--model` you used.
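To make the scoring step concrete, here is a minimal sketch of L2-normalization followed by MRR and Recall@K. It is not the code in `evaluate_retrieval.py`: the function names are hypothetical, the vectors are toy stand-ins for real model embeddings, and it assumes exactly one gold passage per query, which the benchmark files may or may not follow.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_passages(query_vec, corpus_vecs):
    """Return passage indices sorted by cosine similarity, best first."""
    q = l2_normalize(query_vec)
    scores = [
        sum(a * b for a, b in zip(q, l2_normalize(p)))  # dot of unit vectors
        for p in corpus_vecs
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def mrr_and_recall(rankings, gold, k):
    """Assumption: one gold passage index per query."""
    rr_sum, hits = 0.0, 0
    for ranked, gold_idx in zip(rankings, gold):
        position = ranked.index(gold_idx)  # 0-based rank of the gold passage
        rr_sum += 1.0 / (position + 1)
        hits += position < k
    n = len(gold)
    return rr_sum / n, hits / n

# Toy data: three "passages" and two "queries" in a 2-D embedding space.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[2.0, 0.1], [0.1, 3.0]]
gold = [0, 2]

rankings = [rank_passages(q, corpus) for q in queries]
mrr, recall_at_1 = mrr_and_recall(rankings, gold, k=1)
print(f"MRR={mrr:.2f} Recall@1={recall_at_1:.2f}")
```

Because both sides are unit-length, the dot product equals cosine similarity, so the ranking does not depend on the raw embedding magnitudes.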
---
## Layout (this repo)
```
TeleEmbed-Clean/
  oran/chunks/<512|1024|2048>/chunks.json
  oran/benchmark_*.json
  3gpp/chunks/...
  3gpp/benchmark_*.json
  srsran/chunks/...
  srsran/benchmark_*.json
scripts/
  evaluate_retrieval.py
  paths.py
requirements.txt
.gitattributes
```
Clone root = the folder that contains `TeleEmbed-Clean/` and `scripts/`. Run eval with **`--track clean`**.
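As an illustration of how the layout maps to file paths, the sketch below builds the location of a passage corpus and lists the benchmark files for one corpus. The helper names are hypothetical and this is not the contents of `scripts/paths.py`; it only assumes the directory structure shown above.

```python
from pathlib import Path

CORPORA = {"oran", "3gpp", "srsran"}
CHUNK_SIZES = {512, 1024, 2048}

def chunks_path(clone_root, corpus, chunk_size):
    """Path to chunks.json for one corpus and chunk size (hypothetical helper)."""
    if corpus not in CORPORA:
        raise ValueError(f"unknown corpus: {corpus!r}")
    if chunk_size not in CHUNK_SIZES:
        raise ValueError(f"unsupported chunk size: {chunk_size!r}")
    return (
        Path(clone_root) / "TeleEmbed-Clean" / corpus
        / "chunks" / str(chunk_size) / "chunks.json"
    )

def benchmark_files(clone_root, corpus):
    """All benchmark_*.json files for one corpus, sorted by name."""
    if corpus not in CORPORA:
        raise ValueError(f"unknown corpus: {corpus!r}")
    base = Path(clone_root) / "TeleEmbed-Clean" / corpus
    return sorted(base.glob("benchmark_*.json"))

# Example: run from the clone root.
print(chunks_path(".", "oran", 512).as_posix())
```

Validating the corpus name and chunk size up front turns a typo into an immediate error instead of a missing-file failure later in the run.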
---
## Quick start (scoring)
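```bash
# From the clone root: create a virtual environment and install dependencies
python -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt

# Score the O-RAN corpus on the clean track at chunk size 512
cd scripts
python evaluate_retrieval.py --corpus oran --track clean --chunk-size 512 \
  --model intfloat/e5-base-v2
```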
---
## Hugging Face download
```bash
git clone https://huggingface.co/datasets/<YOUR_USER>/<THIS_REPO>
cd <THIS_REPO>
```
---
## Citation
Cite this dataset URL/DOI and the Main benchmark dataset if both are used.
Provider:
prnshv



