tantara/Nemotron-Personas-Korea-Qwen3-0.6B-embeddings

Name: tantara/Nemotron-Personas-Korea-Qwen3-0.6B-embeddings
Creator: tantara
Published: 2026-04-27 04:39:59
License: no description available

Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/tantara/Nemotron-Personas-Korea-Qwen3-0.6B-embeddings

Overview:

This dataset contains embeddings of the Nemotron-Personas-Korea dataset, computed with the Qwen3-Embedding-0.6B model. It covers the first 1 million rows of the source dataset (nvidia/Nemotron-Personas-Korea), and each embedding has 1024 dimensions. The embedded columns include professional_persona, sports_persona, arts_persona, travel_persona, culinary_persona, family_persona, persona, cultural_background, skills_and_expertise, hobbies_and_interests, career_goals_and_ambitions, sex, age, marital_status, military_status, family_type, housing_type, education_level, bachelors_field, occupation, district, province, and country. The dataset has two columns: uuid (the primary key, used to join back to the source dataset) and embedding (a list of float32 values holding the 1024-dimensional vector). It is suited to NLP tasks such as semantic search and similarity computation.
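As a rough illustration of the two uses described above (similarity search over the embedding column, then joining hits back to the source rows by uuid), here is a minimal pure-Python sketch. The rows, the 4-dimensional vectors, the `occupation` values, and the helper names `cosine` and `top_k` are all hypothetical stand-ins chosen for readability; the real vectors have 1024 dimensions, and how the rows are loaded is left open.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, rows, k=2):
    """Return the k (uuid, score) pairs most similar to `query`, best first."""
    scored = [(row["uuid"], cosine(query, row["embedding"])) for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical stand-in rows with the dataset's two columns.
# Real embeddings are 1024-dimensional; 4 dimensions keep the example readable.
embedding_rows = [
    {"uuid": "a", "embedding": [1.0, 0.0, 0.0, 0.0]},
    {"uuid": "b", "embedding": [0.9, 0.1, 0.0, 0.0]},
    {"uuid": "c", "embedding": [0.0, 0.0, 1.0, 0.0]},
]

# Hypothetical source rows keyed by uuid (the values are made up).
source_by_uuid = {
    "a": {"occupation": "chef"},
    "b": {"occupation": "baker"},
    "c": {"occupation": "pilot"},
}

# A query vector would normally come from the same embedding model.
query = [1.0, 0.0, 0.0, 0.0]
hits = top_k(query, embedding_rows, k=2)

# Join each hit back to its source row through the shared uuid key.
joined = [(uuid, round(score, 3), source_by_uuid[uuid]) for uuid, score in hits]
```

A linear scan like this is only practical for small samples. At the scale of 1 million rows, a vectorized or approximate nearest-neighbor index would be the more realistic choice, with the uuid join unchanged.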

Provided by:

tantara
