prnshv/teleembed-bench-clean

Source: Hugging Face | Updated: 2026-04-08 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/prnshv/teleembed-bench-clean
Description:
---
language:
- en
pretty_name: TeleEmbed Benchmark (Clean)
tags:
- retrieval
- telecommunications
- benchmarking
license: apache-2.0
size_categories:
- 1K<n<10K
---

# TeleEmbed Benchmark: **Clean** track

**Standalone dataset:** this repository contains everything you need for the **clean** QA splits. Under **`TeleEmbed-Clean/`**, each corpus folder (O-RAN, 3GPP, and srsRAN) holds both the **`benchmark_*.json`** files and the **passage corpora** (`chunks/<512|1024|2048>/chunks.json`). You do **not** need the Main dataset to run evaluation.

**Companion dataset:** the **Main** track (different benchmark JSON, same underlying passages) is published separately; link it here once the URL is set, e.g. `https://huggingface.co/datasets/<your_org>/<your_main_dataset>`.

---

## What you must specify: the embedding model

Use [Sentence Transformers](https://www.sbert.net/) via **`--model`** (a Hub model id or a local path). The reference script encodes the corpus and the queries with the same encoder, L2-normalizes the embeddings, and computes MRR and Recall@K. Always record which `--model` you used when reporting results.

---

## Layout (this repo)

```
TeleEmbed-Clean/
  oran/chunks/<512|1024|2048>/chunks.json
  oran/benchmark_*.json
  3gpp/chunks/...
  3gpp/benchmark_*.json
  srsran/chunks/...
  srsran/benchmark_*.json
scripts/
  evaluate_retrieval.py
  paths.py
requirements.txt
.gitattributes
```

The clone root is the folder that contains `TeleEmbed-Clean/` and `scripts/`. Run evaluation with **`--track clean`**.

---

## Quick start (scoring)

```bash
python -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt
cd scripts
python evaluate_retrieval.py --corpus oran --track clean --chunk-size 512 \
  --model intfloat/e5-base-v2
```

---

## Hugging Face download

```bash
git clone https://huggingface.co/datasets/<YOUR_USER>/<THIS_REPO>
cd <THIS_REPO>
```

---

## Citation

If you use this dataset, cite its URL/DOI; if you also use the Main benchmark dataset, cite that as well.
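---

## How the metrics are computed (sketch)

The scoring step described above (same encoder for corpus and queries, L2 normalization, then MRR and Recall@K) can be sketched in plain Python as follows. This is **not** `evaluate_retrieval.py`; it is a minimal illustration under stated assumptions: embeddings are already computed, every passage in the corpus is ranked by dot product (which equals cosine similarity after L2 normalization), and each query has exactly one gold passage. The function names `l2_normalize`, `rank_passages`, and `mrr_and_recall` and the default cutoffs `ks=(1, 5, 10)` are hypothetical; the reference script may use other K values or handle ties and multiple gold passages differently.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_passages(query: Sequence[float],
                  passages: Sequence[Sequence[float]]) -> List[int]:
    """Return passage indices ordered from most to least similar.

    Both sides are L2-normalized, so the dot product equals cosine similarity.
    """
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(p))) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


def mrr_and_recall(rankings: List[List[int]], gold: List[int],
                   ks: Tuple[int, ...] = (1, 5, 10)) -> Dict[str, float]:
    """Compute MRR and Recall@K, assuming exactly one gold passage per query."""
    n = len(gold)
    rr_total = 0.0
    hits = {k: 0 for k in ks}
    for ranking, g in zip(rankings, gold):
        rank = ranking.index(g) + 1  # 1-based position of the gold passage
        rr_total += 1.0 / rank
        for k in ks:
            if rank <= k:
                hits[k] += 1
    metrics = {"MRR": rr_total / n}
    for k in ks:
        metrics[f"Recall@{k}"] = hits[k] / n
    return metrics


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from the --model encoder.
    passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    queries = [[0.9, 0.1], [0.1, 0.9]]
    gold = [0, 2]  # hypothetical gold passage index for each query
    rankings = [rank_passages(q, passages) for q in queries]
    print(mrr_and_recall(rankings, gold))
    # -> {'MRR': 0.75, 'Recall@1': 0.5, 'Recall@5': 1.0, 'Recall@10': 1.0}
```

In the toy run, the first query's gold passage ranks 1st and the second query's ranks 2nd, so MRR is (1 + 1/2) / 2 = 0.75 and Recall@1 is 0.5. Because the whole corpus is ranked, the gold passage always appears somewhere in the list; a real run over `chunks.json` would only change where the vectors come from.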
Provider:
prnshv