
MultiSynt/MT-Reasoning

Source: Hugging Face. Updated: 2026-03-02. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/MultiSynt/MT-Reasoning
Resource description:
---
configs:
- config_name: all
  default: true
  data_files:
  - split: train
    path:
    - data/deu_Latn/*.parquet
    - data/eng_Latn/*.parquet
    - data/fra_Latn/*.parquet
- config_name: deu_Latn
  data_files:
  - split: train
    path:
    - data/deu_Latn/*.parquet
- config_name: eng_Latn
  data_files:
  - split: train
    path:
    - data/eng_Latn/*.parquet
- config_name: fra_Latn
  data_files:
  - split: train
    path:
    - data/fra_Latn/*.parquet
language:
- de
- en
- fr
license: apache-2.0
task_categories:
- text-generation
tags:
- reasoning
- multilingual
size_categories:
- 10M<n<100M
---

<p align="center">
  <img src="multisynt-logo.png" alt="multisynt-logo" width=600>
</p>

# MultiSynt

MultiSynt is an **open multilingual synthetic dataset**.

The **MT Reasoning** subset of MultiSynt consists of **automatic translations into 2 languages** (German and French) of the [Glaive AI reasoning dataset](https://huggingface.co/datasets/glaiveai/reasoning-v1-20m), which contains 22 million+ general reasoning questions, reasoning traces and responses.

|lang|rows|prompt_tokens|reasoning_tokens|response_tokens|total_tokens|
|---|---|---|---|---|---|
|deu_Latn|17_354_716|1_873_153_732|26_010_932_738|14_862_651_336|42_746_737_806|
|fra_Latn|17_354_716|1_802_885_115|25_224_272_259|14_090_573_840|41_117_731_214|
|original eng_Latn|17_354_716|1_106_247_763|17_169_550_671|8_619_895_937|26_895_694_371|

### License

This synthetic dataset is made available under the **Creative Commons CC0 license ("no rights reserved")**. The full text of the license is available [here](https://creativecommons.org/share-your-work/public-domain/cc0/). This license applies to the dataset as a database (selection and arrangement of records).

### Citation

When using this dataset, please **cite this repository and the original [Glaive AI](https://huggingface.co/datasets/glaiveai/reasoning-v1-20m) dataset**.
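
### Example: loading a config and comparing token counts

The config names in the metadata and the token table above can be tied together with a short Python sketch. It is illustrative only: the numbers are copied from the table, the helper names (`total_tokens`, `expansion_ratio`, `stream_rows`) are hypothetical, and `stream_rows` assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the id `MultiSynt/MT-Reasoning`. Column names are not listed in the card, so the sketch does not assume any.

```python
# Token statistics copied from the table in the dataset card.
STATS = {
    "deu_Latn": {"rows": 17_354_716, "prompt": 1_873_153_732,
                 "reasoning": 26_010_932_738, "response": 14_862_651_336,
                 "total": 42_746_737_806},
    "fra_Latn": {"rows": 17_354_716, "prompt": 1_802_885_115,
                 "reasoning": 25_224_272_259, "response": 14_090_573_840,
                 "total": 41_117_731_214},
    "eng_Latn": {"rows": 17_354_716, "prompt": 1_106_247_763,
                 "reasoning": 17_169_550_671, "response": 8_619_895_937,
                 "total": 26_895_694_371},
}


def total_tokens(lang: str) -> int:
    """Recompute total_tokens as prompt + reasoning + response."""
    s = STATS[lang]
    return s["prompt"] + s["reasoning"] + s["response"]


def expansion_ratio(lang: str, base: str = "eng_Latn") -> float:
    """Token count of a translated config relative to the English original."""
    return STATS[lang]["total"] / STATS[base]["total"]


def stream_rows(config: str = "deu_Latn", limit: int = 3) -> list:
    """Hypothetical helper: stream a few rows of one config.

    Assumes the `datasets` library and network access; streaming avoids
    downloading the full set of parquet files up front.
    """
    from datasets import load_dataset  # imported lazily on purpose

    ds = load_dataset("MultiSynt/MT-Reasoning", config,
                      split="train", streaming=True)
    return list(ds.take(limit))


if __name__ == "__main__":
    for lang in STATS:
        print(lang, total_tokens(lang), round(expansion_ratio(lang), 2))
```

With the table's figures, the per-column sums match the reported totals, and the German and French translations come out at roughly 1.59 and 1.53 times the English token count. The card does not say which tokenizer produced these counts, so the ratios are only meaningful relative to each other.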

## Acknowledgements

* We acknowledge the EuroHPC Joint Undertaking for supporting this project through access to the EuroHPC supercomputer LEONARDO, hosted by CINECA (Italy) and the LEONARDO consortium, through a EuroHPC AI Factory Large Scale Access call.
* This project is supported by the OpenEuroLLM project, co-funded by the Digital Europe Programme under GA no. 101195233. For more information see [openeurollm.eu](https://openeurollm.eu).
* ellamind is supported by the German Federal Ministry for Economic Affairs and Energy (BMWE) under the soofi (Sovereign Open Source Foundation Models for European Intelligence) project.

<img src="eu_cofunding.png" alt="EU cofunding logo" width="300" style="vertical-align: middle;">
Provider:
MultiSynt