
klei1/bleta-sq-dataset-v1

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/klei1/bleta-sq-dataset-v1
Description:
---
language:
- sq
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
tags:
- albanian
- alpaca
- instruction-tuning
- bleta
size_categories:
- 10K<n<100K
---

# Bleta SQ Instruct v1

Cleaned instruction-following dataset for Albanian language fine-tuning, used to train the **Bleta** AI assistant.

## Dataset Details

- **Total rows:** 39,873
- **Language:** Albanian (sq)
- **Format:** Alpaca (instruction / input / output)

## Composition

| Split | Rows | Description |
|---|---|---|
| Albanian Alpaca | 38,480 | Cleaned from saillab/alpaca-albanian-cleaned (removed ~12K Afrikaans rows) |
| Bleta Identity | 1,393 | Grammatically correct Albanian identity Q&A for the Bleta assistant |

## Cleaning

- Removed C1 control characters corrupting Albanian ë/ç
- Language-filtered: kept only Albanian (sq) rows, removed ~12,300 Afrikaans rows
- Deduplicated on (instruction, output)
- Bleta identity uses correct feminine grammar throughout

## Usage

```python
from datasets import load_dataset

ds = load_dataset("klei1/bleta-sq-instruct-v1")
```

## Source

- Base: [saillab/alpaca-albanian-cleaned](https://huggingface.co/datasets/saillab/alpaca-albanian-cleaned)
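
Two of the cleaning steps listed above, C1 control-character removal and deduplication on (instruction, output), can be illustrated with a minimal sketch. This is not the dataset author's actual pipeline. It assumes that "C1 control characters" means the code points U+0080 through U+009F and that rows are plain dictionaries with the Alpaca keys. The helper names `strip_c1` and `clean_rows` are hypothetical, and the language-filtering step is omitted because the card does not say which detector was used.

```python
import re

# Assumption: "C1 control characters" are the code points U+0080..U+009F.
C1_CONTROLS = re.compile(r"[\u0080-\u009f]")


def strip_c1(text: str) -> str:
    """Remove C1 control characters; letters such as ë (U+00EB) are untouched."""
    return C1_CONTROLS.sub("", text)


def clean_rows(rows):
    """Strip C1 controls from each Alpaca field, then drop rows whose
    (instruction, output) pair has already been seen."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = {
            field: strip_c1(row.get(field, ""))
            for field in ("instruction", "input", "output")
        }
        key = (fixed["instruction"], fixed["output"])
        if key in seen:
            continue  # duplicate after cleaning, keep the first occurrence
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


# Hypothetical sample rows: the first carries stray C1 characters and
# becomes identical to the second once they are removed.
rows = [
    {"instruction": "Përshëndetje\u0085", "input": "", "output": "Ç\u0091kemi"},
    {"instruction": "Përshëndetje", "input": "", "output": "Çkemi"},
]
print(clean_rows(rows))
```

Stripping before deduplicating matters here: rows that differ only by stray control characters collapse into one entry, which would not happen if the order were reversed.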
Provider:
klei1