odyn-network/odyn-benchmarks
收藏Hugging Face2026-03-23 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/odyn-network/odyn-benchmarks
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- benchmark
- inference
- vllm
- llm
- throughput
- latency
- odyn
size_categories:
- 1K<n<10K
---
# Odyn Benchmarks
Inference benchmark datasets and results for the [Odyn Network](https://github.com/Odyn-Network/phase2) — a distributed, OpenAI-compatible AI inference platform built on vLLM, Ray Serve, and FastAPI.
## Dataset Structure
### Prompt Profiles (`data/`)
Four load profiles covering the full input/output token distribution space, sourced from real Odyn traffic and augmented with [ShareGPT Vicuna Unfiltered](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered):
| Profile | Description | Input tokens | Output tokens | Rows |
|---------|-------------|-------------|---------------|------|
| **A** | Short input, Long output | avg 102 (1–498) | avg 457 (256–1941) | 500 |
| **B** | Long input, Short output | avg 1124 (512–19113) | avg 130 (1–255) | 500 |
| **C** | Long input, Long output | avg 1057 (512–20438) | avg 563 (256–2223) | 500 |
| **D** | Short input, Short output | avg 96 (1–509) | avg 144 (1–255) | 500 |
Each CSV has the schema:
```
id, profile, input_tokens, output_tokens, input, output
```
The first 250 rows per profile come from original Odyn benchmark traffic; rows 251–500 are sourced from ShareGPT Vicuna Unfiltered, classified by token count using the cl100k_base tokenizer.
### Benchmark Results (`results/`)
Raw latency and throughput measurements from two model deployments:
| Model | Hardware | Concurrency levels |
|-------|----------|--------------------|
| `facebook/opt-125m` | RTX 3090 | 1, 2, 4, 8, 16, 32 |
| `Qwen/Qwen2.5-7B-Instruct` | DGX Spark (Blackwell) | 4, 8, 16, 32, 64, 128, 192, 250 |
Each model directory contains:
- `benchmark_{A,B,C,D}.json` — per-profile results with chat streaming, chat non-streaming, embeddings, and batch metrics
- `chat_benchmarks.csv` — concurrency sweep: TTFT, TPOT, e2e latency (avg/p50/p95/p99), throughput (tok/s, req/s)
- `batch_benchmarks.csv` — async batch job throughput by batch size
- `embeddings_benchmarks.csv` — embeddings throughput by concurrency
## Key Metrics
Each benchmark entry records:
| Metric | Description |
|--------|-------------|
| `ttft_ms` | Time to first token (avg, p50, p95, p99) |
| `tpot_ms` | Time per output token |
| `e2e_ms` | End-to-end latency |
| `throughput_tok_s` | Output tokens per second |
| `throughput_req_s` | Requests per second |
## System Architecture
Odyn Phase 2 is a queue-worker system with three independent pillars:
1. **Real-time chat completions** — streaming + non-streaming via OpenAI-compatible `/v1/chat/completions`
2. **Offline batch inference** — async job queue via `/v1/batch` + `/v1/job/{id}`
3. **Vector embeddings** — high-throughput generation via `/v1/embeddings`
The stack: **vLLM** (inference engine) + **Ray Serve** (orchestration) + **FastAPI** (API gateway), monitored with Prometheus and Grafana.
## Usage
```python
from datasets import load_dataset
# Load a prompt profile
ds = load_dataset("odyn-network/odyn-benchmarks", data_files="data/benchmark_profile_A.csv", split="train")
# Load Qwen benchmark results
import pandas as pd
df = pd.read_csv("hf://datasets/odyn-network/odyn-benchmarks/results/qwen_results/chat_benchmarks.csv")
```
## License
Apache 2.0. ShareGPT-sourced rows (251–500 per profile) are also under Apache 2.0 per the upstream dataset license.
提供机构:
odyn-network



