McGill-NLP/llm2vec-gen-tulu

Name: McGill-NLP/llm2vec-gen-tulu
Creator: McGill-NLP
Published: 2026-03-03 03:59:25
License: MIT

Source: Hugging Face | Updated: 2026-03-03 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/McGill-NLP/llm2vec-gen-tulu


Resource description:

---
license: mit
configs:
- config_name: default
  data_files:
  - split: original
    path: data/original-*
  - split: Qwen3_06B
    path: data/Qwen3_06B-*
  - split: Qwen3_17B
    path: data/Qwen3_17B-*
  - split: Qwen3_4B
    path: data/Qwen3_4B-*
  - split: Qwen3_8B
    path: data/Qwen3_8B-*
  - split: Qwen25_05B_Instruct
    path: data/Qwen25_05B_Instruct-*
  - split: Qwen25_15B_Instruct
    path: data/Qwen25_15B_Instruct-*
  - split: Qwen25_3B_Instruct
    path: data/Qwen25_3B_Instruct-*
  - split: Qwen25_7B_Instruct
    path: data/Qwen25_7B_Instruct-*
  - split: Llama_32_1B_Instruct
    path: data/Llama_32_1B_Instruct-*
  - split: Llama_32_3B_Instruct
    path: data/Llama_32_3B_Instruct-*
  - split: Llama_31_8B_Instruct
    path: data/Llama_31_8B_Instruct-*
  - split: Gemini
    path: data/Gemini-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: original
    num_bytes: 1890392308
    num_examples: 806413
  - name: Qwen3_06B
    num_bytes: 1595501945
    num_examples: 806413
  - name: Qwen3_17B
    num_bytes: 1782760153
    num_examples: 806413
  - name: Qwen3_4B
    num_bytes: 1771856766
    num_examples: 806413
  - name: Qwen3_8B
    num_bytes: 1782711169
    num_examples: 806413
  - name: Qwen25_05B_Instruct
    num_bytes: 1764258687
    num_examples: 806413
  - name: Qwen25_15B_Instruct
    num_bytes: 1712059082
    num_examples: 806413
  - name: Qwen25_3B_Instruct
    num_bytes: 1782428614
    num_examples: 806413
  - name: Qwen25_7B_Instruct
    num_bytes: 1794035769
    num_examples: 806413
  - name: Llama_32_1B_Instruct
    num_bytes: 1744159741
    num_examples: 806413
  - name: Llama_32_3B_Instruct
    num_bytes: 1731369502
    num_examples: 806413
  - name: Llama_31_8B_Instruct
    num_bytes: 1753929388
    num_examples: 806413
  - name: Gemini
    num_bytes: 2468679692
    num_examples: 803801
  download_size: 14197931355
  dataset_size: 23574142816
---

# LLM2Vec-Gen

The dataset consists of generations based on the Tulu-3 SFT data ([allenai/tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture)).
These generations are intended for training LLM2Vec-Gen models, where they serve as the target outputs for the queries. The dataset is divided into splits, each corresponding to responses generated by a specific LLM (e.g., Qwen3-4B). The `original` split contains the original Tulu-3 responses. Every split has 806,413 examples except `Gemini`, which has 803,801.

Each instance in split `M` typically includes:

- `id`: The original Tulu-3 id.
- `question`: The original query.
- `answer`: The text generated by model `M`.

## Usage

You can load the dataset with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("McGill-NLP/llm2vec-gen-tulu", split="Qwen3_4B")
```
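Because all splits share the `id`, `question`, `answer` schema, responses from different models can be compared by joining on `id`. The sketch below is a minimal illustration that assumes `id` identifies the same Tulu-3 example in every split (the card describes it as the original id). The helper `align_by_id` and the toy records are hypothetical and are not part of the dataset or of the `datasets` library.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy rows shaped like the card's schema (id, question, answer).
# Real rows would come from load_dataset(...); these values are made up.
original = [
    {"id": "a1", "question": "What is 2 + 2?", "answer": "4"},
    {"id": "a2", "question": "Name a prime number.", "answer": "7"},
]
generated = [
    {"id": "a1", "question": "What is 2 + 2?", "answer": "2 + 2 equals 4."},
]


def align_by_id(
    reference: Iterable[Dict[str, str]],
    model_split: Iterable[Dict[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (question, reference_answer, model_answer) triples.

    Assumes `id` refers to the same example in both splits. Ids that are
    missing from `model_split` are skipped, since not every split has the
    same number of rows (e.g. Gemini has fewer).
    """
    # Index the model's answers by id for constant-time lookup.
    answers_by_id = {row["id"]: row["answer"] for row in model_split}
    triples = []
    for row in reference:
        model_answer = answers_by_id.get(row["id"])
        if model_answer is not None:
            triples.append((row["question"], row["answer"], model_answer))
    return triples


pairs = align_by_id(original, generated)
print(pairs)  # [('What is 2 + 2?', '4', '2 + 2 equals 4.')]
```

Iterating over a `datasets.Dataset` yields one dict per row, so the same function should also accept splits returned by `load_dataset`. For full splits, however, it keeps every answer of `model_split` in an in-memory dictionary, which is on the order of gigabytes.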
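The split metadata can be cross-checked with a little arithmetic: the per-split `num_bytes` values add up to the declared `dataset_size`, and comparing row counts shows how many examples the `Gemini` split lacks relative to the others. The sketch copies the numbers from the metadata by hand; the `SPLITS` dictionary is only a local illustration, not something the library provides.

```python
# (num_bytes, num_examples) per split, copied from dataset_info.splits.
SPLITS = {
    "original": (1890392308, 806413),
    "Qwen3_06B": (1595501945, 806413),
    "Qwen3_17B": (1782760153, 806413),
    "Qwen3_4B": (1771856766, 806413),
    "Qwen3_8B": (1782711169, 806413),
    "Qwen25_05B_Instruct": (1764258687, 806413),
    "Qwen25_15B_Instruct": (1712059082, 806413),
    "Qwen25_3B_Instruct": (1782428614, 806413),
    "Qwen25_7B_Instruct": (1794035769, 806413),
    "Llama_32_1B_Instruct": (1744159741, 806413),
    "Llama_32_3B_Instruct": (1731369502, 806413),
    "Llama_31_8B_Instruct": (1753929388, 806413),
    "Gemini": (2468679692, 803801),
}
DATASET_SIZE = 23574142816  # dataset_size from the metadata

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_rows = sum(num_rows for _, num_rows in SPLITS.values())

# Splits whose row count differs from the original split, with the shortfall.
reference_rows = SPLITS["original"][1]
short_splits = {
    name: reference_rows - rows
    for name, (_, rows) in SPLITS.items()
    if rows != reference_rows
}

print(total_bytes == DATASET_SIZE)  # True
print(total_rows)                   # 10480757
print(short_splits)                 # {'Gemini': 2612}
```

In other words, the 13 splits hold about 10.5 million rows in total, and the `Gemini` split is 2,612 examples short of the others, which matters when aligning it with another split by `id`.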

Provided by:

McGill-NLP
