openthoughts_no_think
收藏魔搭社区2025-11-27 更新2025-11-08 收录
下载链接:
https://modelscope.cn/datasets/lapa-llm/openthoughts_no_think
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Card for Ukrainian OpenThoughts 114K
## Dataset Description
**Dataset Summary**
The translated version of [OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) to Ukrainian using [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it). We restructured this dataset for instruction tuning by removing reasoning traces.
<!--[Provide a brief overview of your dataset - what it contains, its purpose, and why it was created. Example: "This dataset contains X examples of Ukrainian text collected from Y sources, designed to support the development of Ukrainian language models."] -->
**Languages**
- Ukrainian (uk)
<!-- **Dataset Structure** -->
<!-- The dataset is organized into the following splits:
| Split | Examples |
|-------|----------|
| Train | [number] |
| Validation | [number] |
| Test | [number] | -->
**Data Fields**
- `system`: original system prompt
- `conversation`: list of messages in a dialog (array of objects)
- `from`: normalized sender role — `user` or `assistant` (system messages are removed)
- `value`: message text
- `original`: original conversations from [OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)
## Dataset Creation
**Source Data**
- Base dataset: [OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) (Apache-2.0).
- Translation: inference with [google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it).
**Processing / “Fixed” Changes**
* **Removed reasoning traces**: dropped all reasoning traces for `assistant` from `conversations`.
## Considerations for Using the Data
**Intended Uses**
- Instruction/chat LLM training in Ukrainian
- Research on reasoning-heavy tasks (without reasoning traces)
**Social Impact**
This dataset was created to support Ukrainian language AI development and improve language technology accessibility for Ukrainian speakers.
<!-- **Bias and Limitations**
[Discuss any known biases, limitations, or potential issues with the dataset. Be transparent about what the dataset may not be suitable for.] -->
## Citation
TBD
<!--
**BibTeX**
```bibtex
@dataset
{dataset_name,
author = {[Your Name/Organization]},
title = {[Dataset Name]},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/[your-org]/[dataset-name]}
}
```
-->
## Contact
<!-- For questions or feedback, please contact [your contact information] or open an issue on the dataset repository. -->
For questions or feedback, please open an issue on the dataset repository.
## License
CC-BY-SA-4.0
---
*This dataset is part of the "Lapa" - Ukrainian LLM initiative to advance natural language processing for the Ukrainian language.*
# 乌克兰语OpenThoughts 114K数据集卡片
## 数据集描述
**数据集概述**
本数据集是[OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)的乌克兰语翻译版本,翻译工作通过[google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it)模型完成。我们移除了原数据中的推理痕迹,对数据集进行重构以适配指令微调任务。
**语言**
- 乌克兰语(uk)
**数据字段**
- `system`: 原始系统提示词
- `conversation`: 对话消息列表(对象数组)
- `from`: 归一化的发送者角色——`user`(用户)或`assistant`(助手)(已移除系统消息)
- `value`: 消息文本
- `original`: 源自[OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)的原始对话
## 数据集构建
**源数据**
- 基础数据集:[OpenThoughts 114K](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k)(采用Apache-2.0开源协议)。
- 翻译方式:通过[google/gemma-3-27b-it](https://huggingface.co/google/gemma-3-27b-it)模型执行推理生成翻译结果。
**处理/“固化”修改**
* **移除推理痕迹**:删除了`conversations`字段中助手角色的所有推理痕迹。
## 数据集使用注意事项
**预期用途**
- 乌克兰语大语言模型(Large Language Model, LLM)的指令/对话微调训练
- 面向无推理痕迹场景的重推理任务研究
**社会影响**
本数据集旨在支持乌克兰语人工智能产业发展,提升乌克兰语使用者的语言技术可及性。
## 引用信息
待补充
## 联系方式
如有疑问或反馈,请在数据集仓库提交Issue。
## 许可协议
CC-BY-SA-4.0
---
*本数据集隶属于“Lapa”——乌克兰语大语言模型倡议项目,旨在推动乌克兰语自然语言处理技术进步。*
提供机构:
maas
创建时间:
2025-10-28



