lapa-llm/open-thoughts-114K-uk

Name: lapa-llm/open-thoughts-114K-uk
Creator: lapa-llm
Published: 2025-11-02 11:49:59
License: 暂无描述

Hugging Face2025-11-02 更新2026-01-03 收录

下载链接：

https://hf-mirror.com/datasets/lapa-llm/open-thoughts-114K-uk

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: system dtype: string - name: conversations list: - name: from dtype: string - name: value dtype: string - name: original list: - name: from dtype: string - name: value dtype: string splits: - name: train num_bytes: 6605487157 num_examples: 113941 download_size: 2605537935 dataset_size: 6605487157 configs: - config_name: default data_files: - split: train path: data/train-* license: cc-by-sa-4.0 task_categories: - text-generation - question-answering language: - uk tags: - math - science - code - puzzles - lapa - synthetic pretty_name: Ukrainian OpenThoughts 114K --- # Dataset Card for Ukrainian OpenThoughts 114K ## Dataset Description **Dataset Summary** The translated version of [OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) to Ukrainian using [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it).  **Languages** - Ukrainian (uk)   **Data Fields** - `system`: original system prompt - `conversation`: list of messages in a dialog (array of objects) - `from`: normalized sender role — `user` or `assistant` (system messages are removed) - `value`: message text - `original`: original conversations from [OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) ## Dataset Creation **Source Data** - Base dataset: [OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) (Apache-2.0). - Translation: inference with [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it). ## Considerations for Using the Data **Intended Uses** - Instruction/chat LLM training in Ukrainian - Research on reasoning-heavy tasks **Social Impact** This dataset was created to support Ukrainian language AI development and improve language technology accessibility for Ukrainian speakers.  ## Citation TBD  ## Contact  For questions or feedback, please open an issue on the dataset repository. ## License CC-BY-SA-4.0 --- *This dataset is part of the "Lapa" - Ukrainian LLM initiative to advance natural language processing for the Ukrainian language.*

提供机构：

lapa-llm

5,000+

优质数据集

54 个

任务类型

进入经典数据集