nvidia/Nemotron-Personas-Japan

Source: Hugging Face. Updated: 2025-12-16. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/nvidia/Nemotron-Personas-Japan
Description:
Nemotron-Personas-Japan is a synthetic dataset grounded in real-world demographic, geographic, and personality trait distributions in Japan to capture the diversity and richness of the population. It includes 1 million records with 6 personas per record, totaling 6 million personas. The dataset contains 22 fields, including 6 persona fields and 16 contextual fields based on official demographic and labor statistics. Created by NVIDIA Corporation, it was generated using the NeMo Data Designer system, leveraging a proprietary Probabilistic Graphical Model, the GPT-OSS-120B model, and a set of validators and evaluators. The dataset supports commercial use and is licensed under CC BY 4.0. It is designed to support the development of Sovereign AI systems for training language models and improving the diversity of synthetic data.
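The counts in the description can be checked with simple arithmetic, and a single record can be inspected without downloading the whole dataset. The following is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` package is installed, that the Hub is reachable, and that the split is named `train` (the description does not state the split name). The helper names `expected_totals` and `peek_first_record` are hypothetical.

```python
def expected_totals(records=1_000_000, personas_per_record=6,
                    persona_fields=6, context_fields=16):
    """Recompute the totals stated in the dataset description."""
    return {
        "total_personas": records * personas_per_record,
        "total_fields": persona_fields + context_fields,
    }


def peek_first_record(repo_id="nvidia/Nemotron-Personas-Japan", split="train"):
    """Stream one record instead of downloading the full dataset.

    Assumptions: `datasets` is installed, the Hub is reachable, and the
    split name "train" is correct (not confirmed by the description).
    """
    from datasets import load_dataset  # lazy import; optional dependency

    stream = load_dataset(repo_id, split=split, streaming=True)
    return next(iter(stream))


if __name__ == "__main__":
    # Only the arithmetic runs by default; it needs no network access.
    print(expected_totals())
```

Calling `peek_first_record()` and listing the keys of the returned record is one way to confirm the 22 fields for yourself, since the description does not name them.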
Provider:
nvidia