
huaXiaKyrie/deltalora-memory-multisource-seed42

Hugging Face | Updated 2026-03-26 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/huaXiaKyrie/deltalora-memory-multisource-seed42
Resource description:
---
pretty_name: Delta-LoRA Memory Multisource Seed42
license: other
language:
- en
tags:
- long-context
- question-answering
- dialogue
- multi-hop
- temporal-reasoning
size_categories:
- 100K<n<1M
---

# Delta-LoRA Memory Multisource Seed42

This repository contains a mixed memory-training corpus built for Delta-LoRA long-context and online-memory experiments.

Files:

- `deltalora_memory_multisource_large_seed42.jsonl`: full mixed corpus
- `deltalora_memory_multisource_large_seed42.jsonl.summary.json`: summary for the full corpus
- `deltalora_memory_multisource_6k_seed42.jsonl`: stratified 6k subset for quick training
- `deltalora_memory_multisource_6k_seed42.jsonl.summary.json`: summary for the 6k subset

Source mixture in the full corpus:

- `convomem`: 16,000
- `hotpotqa`: 90,447
- `2wikimultihopqa`: 167,454
- `musique`: 39,876
- `quality`: 2,523
- `timeqa`: 28,989
- `squad_v2`: 130,319
- `quac`: 11,567
- `doc2dial`: 3,474

Total examples: `490,649`

The 6k subset is stratified as:

- `convomem`: 600
- `hotpotqa`: 600
- `2wikimultihopqa`: 1,200
- `musique`: 600
- `quality`: 600
- `timeqa`: 600
- `squad_v2`: 600
- `quac`: 600
- `doc2dial`: 600

Notes:

- `qasper` was intentionally excluded from this release.
- The dataset is normalized into a chat-style `messages` format for episode-style memory training.
- Licenses differ across original sources; please check the upstream datasets before reuse or redistribution.

Upstream sources used in this release:

- `Salesforce/ConvoMem`
- `hotpotqa/hotpot_qa`
- `ohjoonhee/2WikiMultihopQA`
- `voidful/MuSiQue`
- `emozilla/quality`
- `hugosousa/TimeQA`
- `rajpurkar/squad_v2`
- `allenai/quac`
- `IBM/doc2dial`
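
The per-source counts listed in the card can be checked against the stated totals, and a fixed-quota stratified draw can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to build the released 6k subset: the card does not say how the subset was sampled or which field records each example's origin, so the `source` key, the `read_jsonl` helper, and the `stratified_sample` function below are all hypothetical.

```python
import json
import random
from collections import defaultdict

# Per-source counts copied from the dataset card.
FULL_COUNTS = {
    "convomem": 16_000, "hotpotqa": 90_447, "2wikimultihopqa": 167_454,
    "musique": 39_876, "quality": 2_523, "timeqa": 28_989,
    "squad_v2": 130_319, "quac": 11_567, "doc2dial": 3_474,
}
SUBSET_QUOTAS = {name: 600 for name in FULL_COUNTS}
SUBSET_QUOTAS["2wikimultihopqa"] = 1_200

# The listed counts add up to the stated totals.
assert sum(FULL_COUNTS.values()) == 490_649
assert sum(SUBSET_QUOTAS.values()) == 6_000


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def stratified_sample(records, quotas, source_key="source", seed=42):
    """Draw quotas[name] records per source without replacement.

    ASSUMPTION: `source_key` is a hypothetical field name; the card does
    not document where the source label is stored.
    """
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec[source_key]].append(rec)
    rng = random.Random(seed)
    sample = []
    for name, k in quotas.items():
        # random.sample raises ValueError if a source has fewer than k records.
        sample.extend(rng.sample(by_source[name], k))
    return sample
```

Under these assumptions, `stratified_sample(read_jsonl("deltalora_memory_multisource_large_seed42.jsonl"), SUBSET_QUOTAS)` would return 6,000 records; inspect the actual record schema first and adjust `source_key` accordingly.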
Provider:
huaXiaKyrie