
McGill-NLP/llm2vec-gen-tulu

Source: Hugging Face. Updated 2026-03-03. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/McGill-NLP/llm2vec-gen-tulu
Resource description:
---
license: mit
configs:
- config_name: default
  data_files:
  - split: original
    path: data/original-*
  - split: Qwen3_06B
    path: data/Qwen3_06B-*
  - split: Qwen3_17B
    path: data/Qwen3_17B-*
  - split: Qwen3_4B
    path: data/Qwen3_4B-*
  - split: Qwen3_8B
    path: data/Qwen3_8B-*
  - split: Qwen25_05B_Instruct
    path: data/Qwen25_05B_Instruct-*
  - split: Qwen25_15B_Instruct
    path: data/Qwen25_15B_Instruct-*
  - split: Qwen25_3B_Instruct
    path: data/Qwen25_3B_Instruct-*
  - split: Qwen25_7B_Instruct
    path: data/Qwen25_7B_Instruct-*
  - split: Llama_32_1B_Instruct
    path: data/Llama_32_1B_Instruct-*
  - split: Llama_32_3B_Instruct
    path: data/Llama_32_3B_Instruct-*
  - split: Llama_31_8B_Instruct
    path: data/Llama_31_8B_Instruct-*
  - split: Gemini
    path: data/Gemini-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: original
    num_bytes: 1890392308
    num_examples: 806413
  - name: Qwen3_06B
    num_bytes: 1595501945
    num_examples: 806413
  - name: Qwen3_17B
    num_bytes: 1782760153
    num_examples: 806413
  - name: Qwen3_4B
    num_bytes: 1771856766
    num_examples: 806413
  - name: Qwen3_8B
    num_bytes: 1782711169
    num_examples: 806413
  - name: Qwen25_05B_Instruct
    num_bytes: 1764258687
    num_examples: 806413
  - name: Qwen25_15B_Instruct
    num_bytes: 1712059082
    num_examples: 806413
  - name: Qwen25_3B_Instruct
    num_bytes: 1782428614
    num_examples: 806413
  - name: Qwen25_7B_Instruct
    num_bytes: 1794035769
    num_examples: 806413
  - name: Llama_32_1B_Instruct
    num_bytes: 1744159741
    num_examples: 806413
  - name: Llama_32_3B_Instruct
    num_bytes: 1731369502
    num_examples: 806413
  - name: Llama_31_8B_Instruct
    num_bytes: 1753929388
    num_examples: 806413
  - name: Gemini
    num_bytes: 2468679692
    num_examples: 803801
  download_size: 14197931355
  dataset_size: 23574142816
---

# LLM2Vec-Gen

The dataset consists of generations based on the Tulu-3 SFT data ([allenai/tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture)).
These generations are intended for training LLM2Vec-Gen models, where they serve as the target output for queries.

The dataset is organized into splits. Each split contains the responses generated by one specific LLM, e.g., Qwen3-4B. The "original" split contains the original Tulu-3 responses.

Each instance in split `M` typically includes:

- `id`: The original id.
- `question`: The original query.
- `answer`: The text generated by the model `M`.

## Usage

You can load the dataset using the Hugging Face `datasets` library.

```python
from datasets import load_dataset

dataset = load_dataset("McGill-NLP/llm2vec-gen-tulu", split="Qwen3_4B")
```
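Because every split carries the same `id`, `question`, and `answer` fields, the responses of two models can be lined up by `id`. The metadata lists 803,801 examples for the `Gemini` split against 806,413 for the others, so matching by `id` is likely safer than matching by row position. The following is a minimal sketch under the assumption that `id` values are shared across splits; the helper name `pair_answers_by_id` and the toy rows are hypothetical and are not part of the dataset or of any library.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def pair_answers_by_id(
    reference: Iterable[Mapping[str, str]],
    generated: Iterable[Mapping[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (question, reference_answer, generated_answer) for ids found in both inputs.

    Hypothetical helper: assumes each row has `id`, `question`, and `answer` keys
    and that the same `id` denotes the same query in both inputs.
    """
    # Index the generated rows by id so the lookup does not depend on row order.
    generated_by_id: Dict[str, str] = {row["id"]: row["answer"] for row in generated}

    pairs: List[Tuple[str, str, str]] = []
    for row in reference:
        answer = generated_by_id.get(row["id"])
        if answer is None:
            # Skip ids that are missing from the generated rows.
            continue
        pairs.append((row["question"], row["answer"], answer))
    return pairs


# Toy rows standing in for two splits (illustrative data, not taken from the dataset).
original_rows = [
    {"id": "a", "question": "What is 2 + 2?", "answer": "4"},
    {"id": "b", "question": "Name a prime number.", "answer": "7"},
]
model_rows = [
    {"id": "a", "question": "What is 2 + 2?", "answer": "2 + 2 equals 4."},
]

pairs = pair_answers_by_id(original_rows, model_rows)
print(pairs)
```

With the real data, the two arguments could be the objects returned by `load_dataset(..., split="original")` and `load_dataset(..., split="Gemini")`, since iterating over a split yields one dictionary per row. Note that this sketch builds an in-memory index over one full split, which may be costly at roughly 800,000 rows.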
Provided by:
McGill-NLP