skysight-inc/synthetic-humans-1m
Source: Hugging Face. Updated: 2025-04-15. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/skysight-inc/synthetic-humans-1m
Description:
This dataset contains 1 million synthetic Americans, sampled based on actual US demographics. It is primarily intended to seed diverse responses from large language models, and it can also be used for analysis. The qualitative_descriptions column contains approximately 2.4 billion tokens generated by the Qwen/QwQ-32B model, with full reasoning traces included. Each record includes a unique identifier, demographic information (such as age, gender, location, occupation category, and median annual wage), LLM-generated descriptions, and structured fields extracted from those descriptions.
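As a rough illustration of the "response seeding" use described above, the sketch below turns one record into a persona prompt. Only the qualitative_descriptions column name comes from the description; the other field names (age, gender, location, occupation_category), the helper build_seed_prompt, and the sample record are hypothetical and should be checked against the actual dataset schema.

```python
from typing import Mapping


def build_seed_prompt(record: Mapping[str, object], question: str) -> str:
    """Combine one synthetic-person record with a question into an LLM prompt."""
    # `qualitative_descriptions` is the column named in the dataset description;
    # every other key used here is an assumed (hypothetical) field name.
    description = str(record.get("qualitative_descriptions", "")).strip()
    demographics = ", ".join(
        f"{key}: {record[key]}"
        for key in ("age", "gender", "location", "occupation_category")
        if key in record
    )
    return (
        "Answer as the person described below.\n"
        f"Demographics: {demographics}\n"
        f"Description: {description}\n"
        f"Question: {question}"
    )


# Hypothetical record, invented purely for illustration (not taken from the dataset).
example_record = {
    "age": 41,
    "gender": "female",
    "location": "Ohio",
    "occupation_category": "Education",
    "qualitative_descriptions": "Teaches middle-school science and coaches track.",
}

print(build_seed_prompt(example_record, "How do you usually spend your weekends?"))
```

Sampling a different record for each request is what would make the responses vary. The records themselves could presumably be read with the Hugging Face datasets library using the repository id skysight-inc/synthetic-humans-1m, though the split and configuration names are not stated here.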
Provided by:
skysight-inc



