concretejungles/T-Wix-instag
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/concretejungles/T-Wix-instag
Description:
T-Wix is a Russian supervised fine-tuning (SFT) dataset designed to strengthen core model capabilities in algorithmic and mathematical problem-solving, dialogue, logical thinking, and reasoning. It is divided into two parts: General (468,614 samples), covering Math, Science, Coding, General Knowledge, Instruction Following, Roleplay, and other topics; and Reasoning (30,984 samples), focusing on advanced math and science problems with detailed reasoning traces. It also includes long-context samples (e.g., summarization and long-form QA pairs) and an English corpus. The total size is 499,598 Russian samples (468,614 + 30,984). Data preparation involves multi-stage filtering and quality control to ensure diversity and high quality. The dataset is licensed under ODC-BY-1.0 and is intended for research and development use.
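The stated sample counts can be checked with a few lines of Python, and the same snippet sketches how the dataset might be loaded. This is a minimal sketch, not an official recipe: the repository id is taken from the link above, while the split name `"train"`, the helper names `subset_shares` and `load_twix`, and the assumption that the repository loads without a configuration name are all hypothetical and should be verified against the dataset page.

```python
# Sample counts as stated in the description above.
GENERAL_SAMPLES = 468_614
REASONING_SAMPLES = 30_984


def subset_shares(counts):
    """Return the total sample count and each subset's fraction of it."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


def load_twix(split="train"):
    """Load the dataset from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed, and the split name
    "train" exists for this repository (not confirmed by the listing).
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset("concretejungles/T-Wix-instag", split=split)


total, shares = subset_shares(
    {"general": GENERAL_SAMPLES, "reasoning": REASONING_SAMPLES}
)
print(total)  # 499598
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # general: 93.8%, reasoning: 6.2%
```

The arithmetic confirms that the two parts sum exactly to the stated total, with the Reasoning part making up roughly 6% of the samples. `load_twix` is deliberately not called here, since it requires network access and its split name is a guess.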
Provider:
concretejungles



