BadarHossain/Bangla-TextBook

Name: BadarHossain/Bangla-TextBook
Creator: BadarHossain
Published: 2025-12-12 06:09:08
License: No description available

Hugging Face, updated 2025-12-12, indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/BadarHossain/Bangla-TextBook

Resource overview:

TigerLLM is a family of Bangla Large Language Models designed to address the resource gap in Bangla natural language processing. The models are built using two high-quality datasets: the Bangla-TextBook corpus (approximately 9.9 million tokens from 163 textbooks) and the Bangla-Instruct dataset (100,000 native Bangla instruction-response pairs). Through continual pretraining and model distillation, TigerLLM achieves superior performance on multiple Bangla-specific benchmarks, outperforming existing open-source and proprietary models.

Provider:

BadarHossain
