manan-u/updesh_updated

Source: Hugging Face. Updated 2025-10-12. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/manan-u/updesh_updated
Description:
Updesh is a large-scale synthetic dataset designed to advance the post-training of LLMs for Indic languages. It combines translated reasoning data with synthesized open-domain generative content to support culturally grounded multilingual adaptation of LLMs. Despite rapid progress in instruction-tuned LLMs, most existing datasets focus on English, leaving a gap in high-quality, culturally grounded resources for Indic languages. Such resources are essential for enabling Small Language Models (SLMs) to serve India's diverse linguistic landscape. Updesh aims to fill this gap by providing rich, multilingual instruction-tuning data grounded in Indian languages and contexts.
Provider:
manan-u