
typhoon-t1-3b-research-preview-data

ModelScope Community. Updated 2025-11-27. Indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/scb10x/typhoon-t1-3b-research-preview-data
Description:
# Typhoon T1 3B Research Preview Data

## Overview

This is a dataset used to train our first open reasoning model, **Typhoon T1 (Research Preview)**: [llama-3.2-typhoon-t1-3b-research-preview](https://huggingface.co/scb10x/llama-3.2-typhoon-t1-3b-research-preview). It's available in Alpaca format (`{instruction, input, output}`), although `input` for all records is null.

We acknowledge the owners of the original data sources. Please visit our [technical blog](https://blog.opentyphoon.ai/introducing-typhoon-t1-a-family-of-open-reasoning-models-research-preview-22daacc88662) for more details on the original data sources.

## Data Splits

This dataset consists of 55,677 records for SFT training with the following distribution:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/615313b0793ef66b3324da1f/xi6q1nydpQnzKNUGo2ITx.png)

## Attributes

- `instruction`: an instruction
- `input`: all inputs are null in this dataset, but included for compatibility with trainers
- `output`: long thought generated using the approach described in our technical blog

## Citation

```
@misc{taveekitworachai2025typhoont1openthai,
  title={Typhoon T1: An Open Thai Reasoning Model},
  author={Pittawat Taveekitworachai and Potsawee Manakul and Kasima Tharnpipitchai and Kunat Pipatanakul},
  year={2025},
  eprint={2502.09042},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.09042},
}
```
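The Alpaca layout listed under Attributes (`instruction`, `input`, `output`, with `input` always null here) can be illustrated with a small sketch. It assumes records are already available as Python dicts; the helper name `to_prompt_pair`, the `AlpacaRecord` type, and the sample record are hypothetical, and the way the prompt is assembled is an illustrative choice, not necessarily what the authors' training pipeline does.

```python
from typing import Optional, Tuple, TypedDict


class AlpacaRecord(TypedDict):
    """One record in Alpaca format, as described in the dataset card."""
    instruction: str
    input: Optional[str]  # null (None) for every record in this dataset
    output: str


def to_prompt_pair(record: AlpacaRecord) -> Tuple[str, str]:
    """Return (prompt, response) for one record.

    The optional `input` is appended to the instruction only when it is
    non-empty, so null inputs leave the instruction unchanged.
    """
    prompt = record["instruction"].strip()
    extra = record.get("input")
    if extra:
        prompt = f"{prompt}\n\n{extra.strip()}"
    return prompt, record["output"]


# Hypothetical record, not taken from the dataset.
sample: AlpacaRecord = {
    "instruction": "What is 2 + 2?",
    "input": None,
    "output": "<hypothetical long thought>",
}
prompt, response = to_prompt_pair(sample)
```

Keeping the `input` branch, even though it never fires on this dataset, lets the same helper handle other Alpaca-format data where the field is populated.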

Provider:
maas
Created:
2025-05-23