
Dovud-Asadov/uzbek-embedding-dataset

Source: Hugging Face. Updated 2026-03-27; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Dovud-Asadov/uzbek-embedding-dataset
Resource description:
---
language:
- uz
tags:
- embedding
- sentence-similarity
- information-retrieval
- uzbek
- e5
license: apache-2.0
task_categories:
- sentence-similarity
- feature-extraction
size_categories:
- 10K<n<100K
---

# Uzbek Embedding Dataset

40,338 Uzbek query-passage triplets for training text embedding / retrieval models. Generated from Uzbek news articles (kun.uz and other sources).

## Dataset Structure

| Split | Rows |
|:------|:-----|
| train | 38,321 |
| test | 2,017 |

### Fields

| Field | Description |
|:------|:------------|
| `query` | Uzbek question/search query |
| `positive` | Relevant passage that answers the query |
| `negative_1` | Hard negative passage (topically similar but not relevant) |
| `negative_2` | Hard negative passage (may be empty) |
| `negative_3` | Hard negative passage (may be empty) |
| `source_url` | Source article URL |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("Dovud-Asadov/uzbek-embedding-dataset")
print(ds["train"][0])
```

## Intended Use

Training and evaluating Uzbek text embedding models for semantic search and information retrieval. Used to fine-tune [Dovud-Asadov/e5-uz-v3](https://huggingface.co/Dovud-Asadov/e5-uz-v3).
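
Because `negative_2` and `negative_3` may be empty, a training pipeline will usually need to skip those slots when building (query, positive, negative) triplets. The following is a minimal sketch of one way to do that, not part of the dataset card: the helper name `row_to_triplets` and the sample row are hypothetical, and it assumes an empty negative is stored as either an empty string or `None`.

```python
from typing import Dict, Iterator, Optional, Tuple

# Field names taken from the dataset card's "Fields" table.
NEGATIVE_FIELDS = ("negative_1", "negative_2", "negative_3")


def row_to_triplets(
    row: Dict[str, Optional[str]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield one (query, positive, negative) triplet per non-empty negative.

    Hypothetical helper: assumes empty negatives are "" or None.
    """
    query = row["query"]
    positive = row["positive"]
    for field in NEGATIVE_FIELDS:
        negative = (row.get(field) or "").strip()
        if negative:  # skip empty or whitespace-only negatives
            yield query, positive, negative


# Invented sample row with the card's schema (not real dataset content).
example_row = {
    "query": "sample query",
    "positive": "passage that answers the query",
    "negative_1": "topically similar but irrelevant passage",
    "negative_2": "",
    "negative_3": None,
    "source_url": "https://example.com/article",
}

triplets = list(row_to_triplets(example_row))
print(len(triplets))
```

To apply this to the real data, iterate over `ds["train"]` from the Usage snippet and pass each row to the helper. Each row then contributes between one and three triplets, depending on how many negatives are filled in.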
Provider:
Dovud-Asadov