eve-esa/synth
Hugging Face | Updated 2026-04-16 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/eve-esa/synth
Description:
EVE-Synth is a synthetic corpus generated from the original EVE-corpus for four tasks: question answering (QA), long-form question answering (Long QA), refusal QA, and summarization. Each record has four features: input (the prompt or question given to the model or used to generate the output), output (the generated or expected response to that input), context (a single text chunk or a list of chunks), and file_path (a unique identifier for the source document in the corpus). The dataset has four splits: long_qa (10,000 examples), refusal (5,000 examples), summarization (5,000 examples), and qa (5,000 examples). During synthetic generation, each document is passed to a large language model (LLM) together with detailed instructions, and the model produces the final output.
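The record layout described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the dataset's official tooling: the helper names `context_chunks` and `missing_features` and the sample record are hypothetical, and only the feature names and split sizes come from the description.

```python
from typing import Any, Dict, List

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {
    "long_qa": 10_000,
    "refusal": 5_000,
    "summarization": 5_000,
    "qa": 5_000,
}

# Feature names as stated in the dataset description.
FEATURES = ("input", "output", "context", "file_path")


def context_chunks(record: Dict[str, Any]) -> List[str]:
    """Return the context as a list of chunks.

    The description says context is either one chunk or a list of chunks,
    so a single string is wrapped in a list (hypothetical helper).
    """
    context = record["context"]
    if isinstance(context, str):
        return [context]
    return list(context)


def missing_features(record: Dict[str, Any]) -> List[str]:
    """List the described features that are absent from a record (hypothetical helper)."""
    return [name for name in FEATURES if name not in record]


# Hypothetical record, invented purely to exercise the helpers.
example = {
    "input": "What does the document describe?",
    "output": "A placeholder answer.",
    "context": "A single placeholder chunk.",
    "file_path": "placeholder/document-id",
}

total_examples = sum(SPLIT_SIZES.values())
print(total_examples)             # 25000
print(context_chunks(example))    # ['A single placeholder chunk.']
print(missing_features(example))  # []
```

Normalizing context to a list lets downstream code treat QA records with one chunk and records with several chunks the same way.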
Provider:
eve-esa



