LongCite-45k
Source: ModelScope Community. Updated: 2026-01-06. Indexed: 2024-09-14.
Download link:
https://modelscope.cn/datasets/ZhipuAI/LongCite-45k
Description:
# LongCite-45k
<p align="center">
🤗 <a href="https://huggingface.co/datasets/THUDM/LongCite-45k" target="_blank">[LongCite Dataset] </a> • 💻 <a href="https://github.com/THUDM/LongCite" target="_blank">[Github Repo]</a> • 📃 <a href="https://arxiv.org/abs/2409.02897" target="_blank">[LongCite Paper]</a>
</p>
The **LongCite-45k** dataset contains 44,600 long-context QA instances paired with sentence-level citations, in both English and Chinese, with up to 128,000 words per instance. It can be used to train long-context LLMs to generate both a response and fine-grained citations within a single output.
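Generating the response and its citations "within a single output" means the training target interleaves answer text with citation markers. The sketch below illustrates that idea under a hypothetical markup: context sentences are tagged `<C0>`, `<C1>`, … in the prompt, and each answer statement is written as `<statement>…<cite>[start-end]</cite></statement>`. The function names, the prompt wording, and the markup itself are assumptions for illustration, not the LongCite repository's actual code, so check the released data for the exact format.

```python
from typing import List, Tuple

# Hypothetical markup (an assumption, not confirmed against the released files):
# context sentences are prefixed with <C{i}>, and every answer statement carries
# its citations as inclusive sentence-index ranges [start-end].
Span = Tuple[int, int]


def format_context(sentences: List[str]) -> str:
    """Prefix every context sentence with its index tag, e.g. <C0>, <C1>, ..."""
    return "".join(f"<C{i}>{s}" for i, s in enumerate(sentences))


def format_answer(statements: List[Tuple[str, List[Span]]]) -> str:
    """Serialize statements and their cited ranges into one output string."""
    parts = []
    for text, spans in statements:
        cites = "".join(f"[{a}-{b}]" for a, b in spans)
        parts.append(f"<statement>{text}<cite>{cites}</cite></statement>")
    return "".join(parts)


def build_training_pair(
    instruction: str,
    sentences: List[str],
    query: str,
    statements: List[Tuple[str, List[Span]]],
) -> Tuple[str, str]:
    """Return (prompt, target); the target holds the answer and citations together."""
    n = len(sentences)
    # Reject citation ranges that do not point into the numbered context.
    for _, spans in statements:
        for start, end in spans:
            if not 0 <= start <= end < n:
                raise ValueError(f"citation [{start}-{end}] outside context of {n} sentences")
    prompt = f"{instruction}\n\n{format_context(sentences)}\n\nQuestion: {query}"
    return prompt, format_answer(statements)


if __name__ == "__main__":
    sentences = ["Paris is the capital of France.", "It hosts the Louvre.", "Lyon is in France."]
    prompt, target = build_training_pair(
        "Answer the question using the context and cite supporting sentences.",
        sentences,
        "What is the capital of France?",
        [("Paris is the capital of France.", [(0, 0)])],
    )
    print(target)  # <statement>Paris is the capital of France.<cite>[0-0]</cite></statement>
```

Keeping the citations inside the same target string, rather than in a separate label field, is what lets an ordinary next-token objective learn to emit answer and citations together.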
## Data Example
Each instance in LongCite-45k consists of an instruction, a long context (divided into sentences), a user query, and an answer with sentence-level citations.
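To make the four-part structure concrete, here is a minimal Python sketch of one instance and of resolving an answer's citations back to the context sentences they point to. The `Instance` field names, the `parse_answer`/`cited_sentences` helpers, and the `<statement>…<cite>[start-end]</cite></statement>` markup are all assumptions for illustration; the released files may use a different schema and citation format.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical markup (an assumption, not confirmed against the released files):
# <statement>text<cite>[start-end][start-end]</cite></statement>
STATEMENT_RE = re.compile(r"<statement>(.*?)<cite>(.*?)</cite></statement>", re.DOTALL)
SPAN_RE = re.compile(r"\[(\d+)-(\d+)\]")


@dataclass
class CitedStatement:
    text: str
    spans: List[Tuple[int, int]]  # inclusive sentence-index ranges


@dataclass
class Instance:
    # Field names are illustrative; the real dataset schema may differ.
    instruction: str
    sentences: List[str]  # the long context, already split into sentences
    query: str
    answer: str  # answer text with inline citation markup


def parse_answer(answer: str) -> List[CitedStatement]:
    """Split a cited answer into statements and their cited sentence ranges."""
    result = []
    for text, cite_block in STATEMENT_RE.findall(answer):
        spans = [(int(a), int(b)) for a, b in SPAN_RE.findall(cite_block)]
        result.append(CitedStatement(text.strip(), spans))
    return result


def cited_sentences(inst: Instance) -> List[List[str]]:
    """For each statement, collect the context sentences its citations point to.

    Out-of-range indices are silently clipped by list slicing.
    """
    out = []
    for stmt in parse_answer(inst.answer):
        sents: List[str] = []
        for start, end in stmt.spans:
            sents.extend(inst.sentences[start:end + 1])
        out.append(sents)
    return out


if __name__ == "__main__":
    inst = Instance(
        instruction="Answer with citations.",
        sentences=["Paris is the capital of France.", "It hosts the Louvre.", "Lyon is in France."],
        query="What is the capital of France, and which museum is there?",
        answer="<statement>Paris is France's capital<cite>[0-0]</cite></statement>"
        "<statement>and home to the Louvre.<cite>[0-1]</cite></statement>",
    )
    print(cited_sentences(inst))
```

Storing citations as sentence-index ranges rather than copied text keeps the answer compact and lets each claim be checked directly against the numbered context.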
<p align="center"><img width="50%" alt="data_instance" src="https://cdn-uploads.huggingface.co/production/uploads/66cdd285c51a915bd5f2d017/IF-dQk_aUocEozEwAn6b_.png"></p>
## All Models
We open-sourced the following two models trained on LongCite-45k:
|Model|Huggingface Repo|Description|
|---|---|---|
|**LongCite-glm4-9b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongCite-glm4-9b) | **GLM-4-9B** with enhanced citation generation ability |
|**LongCite-llama3.1-8b**| [🤗 Huggingface Repo](https://huggingface.co/THUDM/LongCite-llama3.1-8b) | **Llama-3.1-8B** with enhanced citation generation ability |
## Citation
If you find our work useful, please consider citing LongCite:
```
@article{zhang2024longcite,
title={LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA},
author={Jiajie Zhang and Yushi Bai and Xin Lv and Wanjun Gu and Danqing Liu and Minhao Zou and Shulin Cao and Lei Hou and Yuxiao Dong and Ling Feng and Juanzi Li},
journal={arXiv preprint arXiv:2409.02897},
year={2024}
}
```
Provided by:
maas
Created:
2025-07-30



