
cerebras/TAT-QA-Arithmetic-CoT

Source: Hugging Face. Updated: 2024-08-19. Indexed: 2025-04-08.
Download link:
https://hf-mirror.com/datasets/cerebras/TAT-QA-Arithmetic-CoT
Description:

---
license: cc-by-nc-4.0
---

# Dataset Information

A Chain of Thought (CoT) version of the TAT-QA arithmetic dataset (hosted at https://huggingface.co/datasets/nvidia/ChatQA-Training-Data). The dataset was synthetically generated by prompting Llama 3 70B Instruct.

The dataset was created as part of our work on Cerebras DocChat, a document-based conversational Q&A model. We observed that initial iterations of our model frequently made errors on arithmetic tasks (such as ConvFinQA) because it was trained on datasets such as TAT-QA, where the model must produce a final equation in a single shot. We found that adding this dataset led to a substantial boost in accuracy (+10 on ConvFinQA).

# Acknowledgement

This dataset is a variation of the TAT-QA dataset and was synthetically generated using Llama 3 70B Instruct.

```
@inproceedings{zhu2021tat,
  title={TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance},
  author={Zhu, Fengbin and Lei, Wenqiang and Huang, Youcheng and Wang, Chao and Zhang, Shuo and Lv, Jiancheng and Feng, Fuli and Chua, Tat-Seng},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```
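
The contrast the card draws between producing a final equation in a single shot and reasoning step by step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the input numbers, and the wording of the steps are hypothetical and are not taken from the dataset or its schema.

```python
from typing import List, Tuple

# Hypothetical illustration only: the numbers and the step wording below are
# invented and do NOT reflect the actual records or format of the dataset.


def single_shot_answer(revenue_new: float, revenue_old: float) -> float:
    """Single-shot style: the whole computation is one final equation."""
    return (revenue_new - revenue_old) / revenue_old * 100


def chain_of_thought_answer(
    revenue_new: float, revenue_old: float
) -> Tuple[List[str], float]:
    """CoT style: each intermediate quantity is computed and stated in turn."""
    steps: List[str] = []

    # Step 1: absolute change between the two periods.
    change = revenue_new - revenue_old
    steps.append(f"Change: {revenue_new} - {revenue_old} = {change}")

    # Step 2: change relative to the earlier period.
    ratio = change / revenue_old
    steps.append(f"Relative change: {change} / {revenue_old} = {ratio}")

    # Step 3: express the ratio as a percentage.
    percent = ratio * 100
    steps.append(f"Percentage: {ratio} * 100 = {percent}")

    return steps, percent


if __name__ == "__main__":
    reasoning, result = chain_of_thought_answer(120.0, 100.0)
    for line in reasoning:
        print(line)
    print("Final answer:", result)
```

Both functions arrive at the same value. The step-by-step variant also exposes each intermediate result, which is the kind of supervision a CoT-style training target would provide.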

Provider:
cerebras