prnshv/teleembed-bench-clean

Name: prnshv/teleembed-bench-clean
Creator: prnshv
Published: 2026-04-08 02:25:27
License: apache-2.0 (as declared in the dataset card metadata)

Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/prnshv/teleembed-bench-clean


Resource description:

---
language:
- en
pretty_name: TeleEmbed Benchmark (Clean)
tags:
- retrieval
- telecommunications
- benchmarking
license: apache-2.0
size_categories:
- 1K<n<10K
---

# TeleEmbed Benchmark: **Clean** track

**Standalone dataset:** this repository is everything you need for the **clean** QA splits. Under **`TeleEmbed-Clean/`** you get both **`benchmark_*.json`** and the **passage corpora** (`chunks/<512|1024|2048>/chunks.json`) for O-RAN, 3GPP, and srsRAN. You do **not** need the Main dataset to run evaluation.

**Companion dataset:** the **Main** track (different benchmark JSON, same underlying passages) is published separately; link it here when the URL is set, e.g. `https://huggingface.co/datasets/<your_org>/<your_main_dataset>`.

---

## What you must specify: the embedding model

Use [Sentence Transformers](https://www.sbert.net/) via **`--model`** (Hub id or local path). The reference script encodes corpus + queries with the same encoder, L2-normalizes, and computes MRR / Recall@K. Always record which `--model` you used.

---

## Layout (this repo)

```
TeleEmbed-Clean/
  oran/chunks/<512|1024|2048>/chunks.json
  oran/benchmark_*.json
  3gpp/chunks/...
  3gpp/benchmark_*.json
  srsran/chunks/...
  srsran/benchmark_*.json
scripts/
  evaluate_retrieval.py
  paths.py
requirements.txt
.gitattributes
```

Clone root = the folder that contains `TeleEmbed-Clean/` and `scripts/`. Run eval with **`--track clean`**.

---

## Quick start (scoring)

```bash
python -m venv .venv && source .venv/bin/activate
pip install -U pip && pip install -r requirements.txt
cd scripts
python evaluate_retrieval.py --corpus oran --track clean --chunk-size 512 \
  --model intfloat/e5-base-v2
```

---

## Hugging Face download

```bash
git clone https://huggingface.co/datasets/<YOUR_USER>/<THIS_REPO>
cd <THIS_REPO>
```

---

## Citation

Cite this dataset URL/DOI and the Main benchmark dataset if both are used.
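---

## Sketch: how MRR and Recall@K are computed

The card says the reference script L2-normalizes embeddings and reports MRR / Recall@K. The snippet below is a minimal sketch of that scoring idea on toy vectors; it is not the code in `evaluate_retrieval.py`. The function names (`l2_normalize`, `rank_of_gold`, `mrr_and_recall`), the 2-D vectors, and the assumption of exactly one gold passage per query are all hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_of_gold(query_vec, corpus_vecs, gold_index):
    """Return the 1-based rank of the gold passage for one query.

    After L2 normalization, the dot product equals cosine similarity,
    so passages are sorted by dot product in descending order.
    """
    q = l2_normalize(query_vec)
    scores = [
        sum(a * b for a, b in zip(q, l2_normalize(p)))
        for p in corpus_vecs
    ]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(gold_index) + 1

def mrr_and_recall(ranks, k):
    """Mean reciprocal rank and Recall@K, assuming one gold passage per query."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    recall = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, recall

# Toy stand-ins for encoder output (hypothetical values, not real embeddings).
corpus = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[2.0, 0.1], [0.1, 1.0], [0.0, 1.0]]
gold = [0, 1, 2]  # index of the gold passage for each query

ranks = [rank_of_gold(q, corpus, g) for q, g in zip(queries, gold)]
mrr, recall_at_1 = mrr_and_recall(ranks, k=1)
print(ranks)  # → [1, 1, 2]
print(mrr, recall_at_1)
```

In a real run the vectors would come from the encoder named by `--model`, and the ranking would be done over the whole `chunks.json` corpus for the chosen chunk size.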

Provider:

prnshv
