five

TableInstruct

收藏
魔搭社区2025-12-26 更新2025-07-05 收录
下载链接:
https://modelscope.cn/datasets/osunlp/TableInstruct
下载链接
链接失效反馈
官方服务:
资源简介:
--- # TableLlama: Towards Open Large Generalist Models for Tables Project Page: [https://osu-nlp-group.github.io/TableLlama/](https://osu-nlp-group.github.io/TableLlama/) Paper: [https://arxiv.org/abs/2311.09206](https://arxiv.org/abs/2311.09206) Model: [https://huggingface.co/osunlp/TableLlama/](https://huggingface.co/osunlp/TableLlama/) Code: [https://osu-nlp-group.github.io/TableLlama/](https://osu-nlp-group.github.io/TableLlama/) ## Introduction We introduce TableLlama, an open-source large generalist model specifically tailored for various table-based tasks. The TableLlama model is trained on TableInstruct Dataset, a meticulously curated instruction tuning dataset for tables. TableLlama is tuned on 2.6 million table-based task data, and can handle up to 8K context! ## Model 🤗 [TableLlama-7B](https://huggingface.co/osunlp/TableLlama/) ## Data The models are trained on the 🤗 [TableInstruct Dataset](https://huggingface.co/datasets/osunlp/TableInstruct), which includes a comprehensive table-based instruction tuning dataset that covers a variety of real-world tables and realistic tasks. We include 14 datasets of 11 tasks in total. Check out the dataset card for more details. ## Training Procedure The models are fine-tuned with the TableInstruct dataset using LongLoRA (7B), fully fine-tuning version as the base model, which replaces the vanilla attention mechanism of the original Llama-2 (7B) with shift short attention. The training takes 9 days on a 48*A100 cluster. Check out our paper for more details. ## Evaluation The models are evaluated on 8 in-domain datasets of 8 tasks and 6 out-of-domain datasets of 4 tasks. ## Usage You can use the models through Huggingface's Transformers library. Check our Github repo for more advanced use: [https://osu-nlp-group.github.io/TableLlama/](https://osu-nlp-group.github.io/TableLlama/) ## Prompt Format ``` Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {instruction} ### Input: {input} ### Question: {question} ### Response: ``` ## Citation If you use the models, data, or code from this project, please cite the original paper: ``` @misc{zhang2023tablellama, title={TableLlama: Towards Open Large Generalist Models for Tables}, author={Tianshu Zhang and Xiang Yue and Yifei Li and Huan Sun}, year={2023}, eprint={2311.09206}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```

# TableLlama:面向表格任务的开源通用大语言模型(Large Language Model) ## 项目主页 [https://osu-nlp-group.github.io/TableLlama/](https://osu-nlp-group.github.io/TableLlama/) ## 论文 [https://arxiv.org/abs/2311.09206](https://arxiv.org/abs/2311.09206) ## 模型 [https://huggingface.co/osunlp/TableLlama/](https://huggingface.co/osunlp/TableLlama/) ## 代码 [https://osu-nlp-group.github.io/TableLlama/](https://osu-nlp-group.github.io/TableLlama/) ## 简介 我们推出了TableLlama,一款专为各类表格任务定制的开源通用大语言模型。TableLlama基于TableInstruct数据集(TableInstruct Dataset)进行微调,该数据集是一套经过精心整理的表格类指令微调数据集。TableLlama在260万条表格任务数据上完成微调,支持最长8K的上下文处理! ## 模型 🤗 [TableLlama-7B](https://huggingface.co/osunlp/TableLlama/) ## 数据集 本模型基于🤗 [TableInstruct数据集(TableInstruct Dataset)](https://huggingface.co/datasets/osunlp/TableInstruct) 训练,该数据集是一套覆盖广泛的表格类指令微调数据集,涵盖多种真实场景表格与贴合实际的任务。本数据集共包含11类任务对应的14个子数据集,更多细节可查阅数据集卡片。 ## 训练流程 本模型以完整微调版本的LongLoRA(7B)作为基础模型,将原始Llama-2(7B)的标准注意力机制替换为移位短注意力(shift short attention),并基于TableInstruct数据集进行微调。训练过程在搭载48块A100的GPU集群上耗时9天,更多细节可参阅原论文。 ## 评估 本模型在8类任务对应的8个域内数据集,以及4类任务对应的6个域外数据集上完成了评估。 ## 使用方法 你可以通过Huggingface的Transformers库调用本模型。更多高级使用方式可查阅我们的GitHub仓库:[https://osu-nlp-group.github.io/TableLlama/](https://osu-nlp-group.github.io/TableLlama/) ## 提示词格式 以下是一则描述任务的指令,搭配提供额外上下文的输入,请生成一段恰当的响应以完成该请求。 ### 指令: {instruction} ### 输入: {input} ### 问题: {question} ### 响应: ## 引用 若你使用本项目的模型、数据集或代码,请引用原论文: @misc{zhang2023tablellama, title={TableLlama: Towards Open Large Generalist Models for Tables}, author={Tianshu Zhang and Xiang Yue and Yifei Li and Huan Sun}, year={2023}, eprint={2311.09206}, archivePrefix={arXiv}, primaryClass={cs.CL} }
提供机构:
maas
创建时间:
2025-07-04
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作