
odyn-network/odyn-benchmarks

Hugging Face · Updated 2026-03-23 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/odyn-network/odyn-benchmarks
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- benchmark
- inference
- vllm
- llm
- throughput
- latency
- odyn
size_categories:
- 1K<n<10K
---

# Odyn Benchmarks

Inference benchmark datasets and results for the [Odyn Network](https://github.com/Odyn-Network/phase2), a distributed, OpenAI-compatible AI inference platform built on vLLM, Ray Serve, and FastAPI.

## Dataset Structure

### Prompt Profiles (`data/`)

Four load profiles covering the full input/output token distribution space, sourced from real Odyn traffic and augmented with [ShareGPT Vicuna Unfiltered](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered):

| Profile | Description | Input tokens | Output tokens | Rows |
|---------|-------------|-------------|---------------|------|
| **A** | Short input, Long output | avg 102 (1–498) | avg 457 (256–1941) | 500 |
| **B** | Long input, Short output | avg 1124 (512–19113) | avg 130 (1–255) | 500 |
| **C** | Long input, Long output | avg 1057 (512–20438) | avg 563 (256–2223) | 500 |
| **D** | Short input, Short output | avg 96 (1–509) | avg 144 (1–255) | 500 |

Each CSV has the schema:

```
id, profile, input_tokens, output_tokens, input, output
```

The first 250 rows per profile come from original Odyn benchmark traffic; rows 251–500 are sourced from ShareGPT Vicuna Unfiltered, classified by token count using the cl100k_base tokenizer.
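The profile table suggests a simple two-threshold rule for assigning a row to a profile. The sketch below is one way to express it and to sanity-check rows against the documented schema. The cutoffs (512 input tokens, 256 output tokens) are assumptions inferred from the ranges in the table, not values stated by the dataset authors, and the sample rows are synthetic placeholders rather than real dataset content.

```python
import csv
import io

# Assumed cutoffs, inferred from the ranges in the profile table:
# short inputs top out at 509 and long inputs start at 512;
# short outputs top out at 255 and long outputs start at 256.
SHORT_INPUT_MAX = 511
SHORT_OUTPUT_MAX = 255


def classify_profile(input_tokens: int, output_tokens: int) -> str:
    """Map token counts to profile A, B, C, or D under the assumed cutoffs."""
    long_input = input_tokens > SHORT_INPUT_MAX
    long_output = output_tokens > SHORT_OUTPUT_MAX
    if long_input:
        return "C" if long_output else "B"
    return "A" if long_output else "D"


# Synthetic rows in the documented schema (not taken from the dataset).
SAMPLE_CSV = """id,profile,input_tokens,output_tokens,input,output
1,A,102,457,short prompt,long answer
2,B,1124,130,long prompt,short answer
3,C,1057,563,long prompt,long answer
4,D,96,144,short prompt,short answer
"""


def find_mismatches(csv_text: str) -> list:
    """Return ids whose stored profile disagrees with the token-count rule."""
    mismatches = []
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        expected = classify_profile(
            int(row["input_tokens"]), int(row["output_tokens"])
        )
        if row["profile"] != expected:
            mismatches.append(row["id"])
    return mismatches
```

Running `find_mismatches` over a downloaded profile CSV would show whether the assumed cutoffs match how the rows were actually classified.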

### Benchmark Results (`results/`)

Raw latency and throughput measurements from two model deployments:

| Model | Hardware | Concurrency levels |
|-------|----------|--------------------|
| `facebook/opt-125m` | RTX 3090 | 1, 2, 4, 8, 16, 32 |
| `Qwen/Qwen2.5-7B-Instruct` | DGX Spark (Blackwell) | 4, 8, 16, 32, 64, 128, 192, 250 |

Each model directory contains:

- `benchmark_{A,B,C,D}.json`: per-profile results with chat streaming, chat non-streaming, embeddings, and batch metrics
- `chat_benchmarks.csv`: concurrency sweep with TTFT, TPOT, e2e latency (avg/p50/p95/p99), and throughput (tok/s, req/s)
- `batch_benchmarks.csv`: async batch job throughput by batch size
- `embeddings_benchmarks.csv`: embeddings throughput by concurrency

## Key Metrics

Each benchmark entry records:

| Metric | Description |
|--------|-------------|
| `ttft_ms` | Time to first token (avg, p50, p95, p99) |
| `tpot_ms` | Time per output token |
| `e2e_ms` | End-to-end latency |
| `throughput_tok_s` | Output tokens per second |
| `throughput_req_s` | Requests per second |

## System Architecture

Odyn Phase 2 is a queue-worker system with three independent pillars:

1. **Real-time chat completions**: streaming + non-streaming via OpenAI-compatible `/v1/chat/completions`
2. **Offline batch inference**: async job queue via `/v1/batch` + `/v1/job/{id}`
3. **Vector embeddings**: high-throughput generation via `/v1/embeddings`

The stack: **vLLM** (inference engine) + **Ray Serve** (orchestration) + **FastAPI** (API gateway), monitored with Prometheus and Grafana.

## Usage

```python
import pandas as pd
from datasets import load_dataset

# Load a prompt profile
ds = load_dataset(
    "odyn-network/odyn-benchmarks",
    data_files="data/benchmark_profile_A.csv",
    split="train",
)

# Load Qwen benchmark results
df = pd.read_csv(
    "hf://datasets/odyn-network/odyn-benchmarks/results/qwen_results/chat_benchmarks.csv"
)
```

## License

Apache 2.0.
ShareGPT-sourced rows (251–500 per profile) are also under Apache 2.0 per the upstream dataset license.
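
As a supplement to the Key Metrics table, here is a minimal sketch of how such metrics are commonly derived from per-request timestamps. It is not the Odyn benchmark harness: the `RequestTiming` record and its field names are hypothetical, TPOT is assumed to mean decode time divided by the tokens after the first, and percentiles use the nearest-rank method, which may differ from what the published CSVs use.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record; field names are illustrative."""
    start_s: float        # time the request was sent
    first_token_s: float  # time the first output token arrived
    end_s: float          # time the last output token arrived
    output_tokens: int    # number of generated tokens


def ttft_ms(t: RequestTiming) -> float:
    """Time to first token, in milliseconds."""
    return (t.first_token_s - t.start_s) * 1000.0


def e2e_ms(t: RequestTiming) -> float:
    """End-to-end latency, in milliseconds."""
    return (t.end_s - t.start_s) * 1000.0


def tpot_ms(t: RequestTiming) -> float:
    """Assumed definition: decode time spread over tokens after the first."""
    if t.output_tokens <= 1:
        return 0.0
    return (t.end_s - t.first_token_s) * 1000.0 / (t.output_tokens - 1)


def percentile(values, p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(timings) -> dict:
    """Aggregate a batch of requests into summary metrics."""
    ttfts = [ttft_ms(t) for t in timings]
    # Wall-clock span from the first request sent to the last token received.
    wall_s = max(t.end_s for t in timings) - min(t.start_s for t in timings)
    total_tokens = sum(t.output_tokens for t in timings)
    return {
        "ttft_ms_avg": sum(ttfts) / len(ttfts),
        "ttft_ms_p50": percentile(ttfts, 50),
        "ttft_ms_p95": percentile(ttfts, 95),
        "ttft_ms_p99": percentile(ttfts, 99),
        "throughput_tok_s": total_tokens / wall_s,
        "throughput_req_s": len(timings) / wall_s,
    }
```

Throughput here is computed over the wall-clock span of the whole batch, so it reflects concurrent load rather than the speed of a single request.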
Provider:
odyn-network