apple/CLaRa_multi_stage

Name: apple/CLaRa_multi_stage
Creator: apple
Published: 2025-12-16 19:34:29
License: no description available

Hugging Face. Updated 2025-12-16; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/apple/CLaRa_multi_stage

Resource description:

This is the official dataset for the CLaRa paper. It contains training and evaluation data for the CLaRa model, organized into three main categories: pretraining, instruction tuning, and end-to-end tuning. The pretraining data is used for compressor learning and is formatted as JSONL with fields such as data_type, question, answers, and docs. The instruction tuning data is used for answering questions based on compressed document representations and is formatted as JSONL with fields such as question, docs, gold_answer, and answer. The end-to-end tuning data includes training and evaluation sets in both the normal and oracle settings, formatted as JSONL with fields such as question, answer, docs, and pos_index. The dataset supports question-answering and text-generation tasks, is in English, and falls in the size range between 10G and 100G.
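
As an illustration of the JSONL layout described above, the following is a minimal Python sketch for reading one of the files and checking which of the listed fields a record carries. The field names come from the description. The stage keys, the helper names read_jsonl and missing_fields, and the assumption that each line holds a single JSON object with these keys at the top level are hypothetical and not confirmed by the dataset itself.

```python
import json
from typing import Iterator

# Field names per stage, copied from the description above. The stage keys
# are made up for this sketch, and the released files may contain more keys.
EXPECTED_FIELDS = {
    "pretraining": {"data_type", "question", "answers", "docs"},
    "instruction_tuning": {"question", "docs", "gold_answer", "answer"},
    "end_to_end": {"question", "answer", "docs", "pos_index"},
}


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)


def missing_fields(record: dict, stage: str) -> set:
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS[stage] - set(record)
```

Because read_jsonl is a generator, it streams records one at a time, which matters for a dataset of this size. A typical use would be to loop over read_jsonl(path) for a locally downloaded file and report any record for which missing_fields returns a non-empty set; the actual file names in the repository are not given in this listing.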

Provider:

apple
