BadarHossain/Bangla-TextBook
Source: Hugging Face | Updated: 2025-12-12 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/BadarHossain/Bangla-TextBook
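To work with the corpus directly, here is a minimal Python sketch that uses the Hugging Face `datasets` library. It makes three assumptions. First, `datasets` is installed. Second, the repository exposes a `train` split that `load_dataset` can resolve on its own, which this page does not confirm. Third, the helper name `load_bangla_textbook` is hypothetical. The optional mirror toggle sets the `HF_ENDPOINT` environment variable to the hf-mirror.com host from the link above. The Hugging Face client reads that variable when it is first imported.

```python
import os

# Repository id taken from the download link above.
REPO_ID = "BadarHossain/Bangla-TextBook"
# Mirror host from the download link; used only when use_mirror=True.
MIRROR_ENDPOINT = "https://hf-mirror.com"


def load_bangla_textbook(use_mirror: bool = False, split: str = "train"):
    """Hypothetical helper: load the corpus with the `datasets` library.

    Assumption: the repository has a split named `split` ("train" by default)
    that load_dataset can resolve automatically; this is not confirmed here.
    """
    if use_mirror:
        # setdefault keeps any HF_ENDPOINT the user already exported.
        # The variable only takes effect if the Hugging Face client has not
        # been imported yet in this process.
        os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    # Imported lazily so HF_ENDPOINT is already set when the client loads.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

As a usage example, call `ds = load_bangla_textbook(use_mirror=True)` and then `print(ds.column_names)`. This shows which fields the corpus actually provides, which is worth checking before writing any preprocessing, since the field names are not documented on this page.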
Resource summary:
TigerLLM is a family of Bangla large language models designed to address the resource gap in Bangla natural language processing. The models are built on two high-quality datasets: the Bangla-TextBook corpus (the dataset listed here; approximately 9.9 million tokens from 163 textbooks) and the Bangla-Instruct dataset (100,000 native Bangla instruction-response pairs). Through continual pretraining and model distillation, TigerLLM achieves strong results on multiple Bangla-specific benchmarks, outperforming existing open-source and proprietary models.
Provided by:
BadarHossain



