Dovud-Asadov/uzbek-embedding-dataset

Name: Dovud-Asadov/uzbek-embedding-dataset
Creator: Dovud-Asadov
Published: 2026-03-27 05:10:17
License: apache-2.0

Source: Hugging Face (updated 2026-03-27, indexed 2026-03-29)

Download link:

https://hf-mirror.com/datasets/Dovud-Asadov/uzbek-embedding-dataset

Description:

---
language:
- uz
tags:
- embedding
- sentence-similarity
- information-retrieval
- uzbek
- e5
license: apache-2.0
task_categories:
- sentence-similarity
- feature-extraction
size_categories:
- 10K<n<100K
---

# Uzbek Embedding Dataset

40,338 Uzbek query-passage triplets for training text embedding and retrieval models. Generated from Uzbek news articles (kun.uz and other sources).

## Dataset Structure

| Split | Rows |
|:------|:-----|
| train | 38,321 |
| test | 2,017 |

### Fields

| Field | Description |
|:------|:------------|
| `query` | Uzbek question/search query |
| `positive` | Relevant passage that answers the query |
| `negative_1` | Hard negative passage (topically similar but not relevant) |
| `negative_2` | Hard negative passage (may be empty) |
| `negative_3` | Hard negative passage (may be empty) |
| `source_url` | Source article URL |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("Dovud-Asadov/uzbek-embedding-dataset")
print(ds["train"][0])
```

## Intended Use

Training and evaluating Uzbek text embedding models for semantic search and information retrieval. Used to fine-tune [Dovud-Asadov/e5-uz-v3](https://huggingface.co/Dovud-Asadov/e5-uz-v3).
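
Because `negative_2` and `negative_3` may be empty, a training loop usually needs to skip missing negatives when turning a row into (query, positive, negative) triplets. The following is a minimal sketch of one way to do that, not the author's actual preprocessing. The helper name `expand_triplets`, the optional prefix parameters, and the example row are hypothetical. The `"query: "` and `"passage: "` prefixes follow the general E5 convention and are only an assumption about how this dataset was used.

```python
from typing import Dict, Iterator, Optional, Tuple

# Negative columns as listed in the Fields table; the last two may be empty.
NEGATIVE_FIELDS = ("negative_1", "negative_2", "negative_3")


def expand_triplets(
    row: Dict[str, Optional[str]],
    query_prefix: str = "",
    passage_prefix: str = "",
) -> Iterator[Tuple[str, str, str]]:
    """Yield one (query, positive, negative) triplet per non-empty negative.

    Hypothetical helper: the prefixes default to empty strings and are
    only applied if the caller passes them.
    """
    query = query_prefix + (row["query"] or "")
    positive = passage_prefix + (row["positive"] or "")
    for field in NEGATIVE_FIELDS:
        # Treat missing keys, None, and whitespace-only strings as empty.
        negative = (row.get(field) or "").strip()
        if negative:
            yield query, positive, passage_prefix + negative


# Hypothetical row with placeholder text, not taken from the dataset.
example_row = {
    "query": "example query",
    "positive": "relevant passage",
    "negative_1": "hard negative A",
    "negative_2": "",
    "negative_3": "hard negative B",
    "source_url": "https://example.com/article",
}

triplets = list(expand_triplets(example_row, "query: ", "passage: "))
print(len(triplets))  # 2, because negative_2 is empty and gets skipped
```

To apply this to the real data, iterate over `ds["train"]` from the Usage snippet and pass each row to `expand_triplets`. Whether to include the prefixes depends on the model being fine-tuned.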

Provider:

Dovud-Asadov
