McGill-NLP/llm2vec-gen-tulu-w-hard-negative

Source: Hugging Face. Updated 2026-03-02; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/McGill-NLP/llm2vec-gen-tulu-w-hard-negative
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: negative_question
    dtype: string
  - name: negative_answer
    dtype: string
  splits:
  - name: original
    num_bytes: 2034528644
    num_examples: 805467
  - name: Qwen3_17B
    num_bytes: 3660909440
    num_examples: 805467
  - name: Qwen3_4B
    num_bytes: 3696200913
    num_examples: 805467
  - name: Qwen3_8B
    num_bytes: 3696485153
    num_examples: 805467
  download_size: 7133393300
  dataset_size: 13088124150
configs:
- config_name: default
  data_files:
  - split: original
    path: data/original-*
  - split: Qwen3_17B
    path: data/Qwen3_17B-*
  - split: Qwen3_4B
    path: data/Qwen3_4B-*
  - split: Qwen3_8B
    path: data/Qwen3_8B-*
---

# LLM2Vec-Gen

The dataset consists of generations based on the Tulu-3 SFT data ([allenai/tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture)). These generations are intended for training LLM2Vec-Gen models, where they serve as the target output for queries. The `negative_question` entries in this dataset are generated by Gemini.

The dataset is organized into splits. Each split corresponds to responses generated by a specific LLM, e.g., Qwen3-4B. The "original" split contains the original Tulu-3 responses.

Each instance in split `M` typically includes:

- `id`: The original id.
- `question`: The original query.
- `answer`: The text generated by the model `M`.
- `negative_question`: The negative query generated by Gemini.
- `negative_answer`: The text generated by the model `M`.

## Usage

You can load the dataset using the Hugging Face `datasets` library.

```python
from datasets import load_dataset

# Load one split (here, the responses generated by Qwen3-4B).
dataset = load_dataset("McGill-NLP/llm2vec-gen-tulu-w-hard-negative", split="Qwen3_4B")
```
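
The metadata lists a download size of about 7.1 GB, so it can be convenient to stream a few records before committing to a full download. The sketch below is one possible way to do that and to arrange each record as a (query, positive, hard negative) triplet. It is not the LLM2Vec-Gen training code: `to_triplet` and `stream_triplets` are hypothetical helpers, and treating `negative_answer` as the hard negative for `question` is an assumption about how the fields might be used.

```python
from itertools import islice
from typing import Dict, Iterator, Tuple

# Field names as listed in the dataset card.
EXPECTED_FIELDS = ("id", "question", "answer", "negative_question", "negative_answer")


def to_triplet(record: Dict[str, str]) -> Tuple[str, str, str]:
    """Hypothetical helper: map one record to (query, positive, hard negative).

    Assumption: `negative_answer` acts as the hard negative for `question`.
    """
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return record["question"], record["answer"], record["negative_answer"]


def stream_triplets(split: str = "Qwen3_4B", limit: int = 3) -> Iterator[Tuple[str, str, str]]:
    """Yield the first `limit` triplets of a split without a full download."""
    # Imported lazily so to_triplet can be used without the library installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches records on demand.
    stream = load_dataset(
        "McGill-NLP/llm2vec-gen-tulu-w-hard-negative",
        split=split,
        streaming=True,
    )
    for record in islice(stream, limit):
        yield to_triplet(record)
```

Calling `list(stream_triplets("original", limit=2))` would fetch two triplets from the original Tulu-3 responses; this requires network access and the `datasets` package.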
Provider:
McGill-NLP