apple/CLaRa_multi_stage
Source: Hugging Face | Updated: 2025-12-16 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/apple/CLaRa_multi_stage
Description:
This is the official dataset for the CLaRa paper. It contains the training and evaluation data for the CLaRa model, organized into three main categories: pretraining, instruction tuning, and end-to-end tuning. All three are stored as JSONL. The pretraining data is used for compressor learning and includes fields such as data_type, question, answers, and docs. The instruction tuning data is used for answering questions from compressed document representations and includes fields such as question, docs, gold_answer, and answer. The end-to-end tuning data provides training and evaluation sets in both the normal and oracle settings and includes fields such as question, answer, docs, and pos_index. The dataset supports question-answering and text-generation tasks, is in English, and is between 10G and 100G in size.
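As a minimal sketch of how the JSONL records described above might be checked, the Python below parses JSONL lines and reports which of the listed fields a record lacks for a given stage. The stage keys (`pretraining`, `instruction_tuning`, `end_to_end`), the helper names, and the sample record are assumptions made for illustration. The description only lists fields "such as" these, so the field sets may be incomplete, and the value types shown are guesses.

```python
import json
from typing import Iterable, Iterator

# Field names per stage as listed in the dataset description.
# The description says "fields such as ...", so these sets may be incomplete.
# The stage keys themselves are hypothetical labels, not dataset identifiers.
EXPECTED_FIELDS = {
    "pretraining": {"data_type", "question", "answers", "docs"},
    "instruction_tuning": {"question", "docs", "gold_answer", "answer"},
    "end_to_end": {"question", "answer", "docs", "pos_index"},
}


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def missing_fields(record: dict, stage: str) -> set:
    """Return the listed fields for `stage` that `record` does not contain."""
    return EXPECTED_FIELDS[stage] - set(record)


# Made-up record for illustration only; not taken from the dataset,
# and the value types (list of strings, list of ints) are assumptions.
sample = '{"question": "q?", "answer": "a", "docs": ["d0", "d1"], "pos_index": [1]}\n'
records = list(iter_jsonl(sample.splitlines()))

print(missing_fields(records[0], "end_to_end"))                  # set()
print(sorted(missing_fields(records[0], "instruction_tuning")))  # ['gold_answer']
```

The same `iter_jsonl` helper accepts an open file handle, so a downloaded split can be streamed line by line without loading the whole file into memory, which matters at this dataset's size.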
Provided by:
apple



