the-economist-baby-names | Baby names dataset | Social statistics dataset
Dataset overview
Dataset name
What's in a name?
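Dataset contents
- R scripts that analyse baby-name trends in the United States and the United Kingdom over the past 143 years.
- Analysis of how the popularity, diversity, and specific connotations of names have evolved.
- Visualisations and statistical indicators that illustrate cultural change.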
Data sources
- SSA Baby Names (US data)
- ONS Data (UK data)
- ChatGPT4 (used for the connotation analysis)
- Word2Vec (used to map connotations onto dimensions)
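Data files
US data
- File path: output-data/us_names_with_popularity_and_connotations.csv
- Time range: 1880–2023
- Column descriptions:
  - name: the name
  - sex: sex (M for male, F for female)
  - n: number of occurrences of the name in that year
  - year: year of the record
  - per_year: total number of births in that year
  - percent_per_year: occurrences of the name as a percentage of total births in that year
  - nchar: number of characters in the name
  - connotation_1 to connotation_5: the five main connotations of the name
  - flag: Boolean indicating whether any connotation data are missing
  - connotation_raw: the raw connotation text
  - intelligence to tradition: Boolean columns indicating whether the name is associated with a given connotation category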
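As a rough illustration of how the popularity columns relate to one another, the sketch below recomputes percent_per_year and nchar from n, per_year, and name. It assumes that percent_per_year is simply 100 * n / per_year, which matches the column description but is not confirmed against the original R scripts. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample rows in the documented column layout.
# The values are invented for illustration, not taken from the dataset.
SAMPLE = """name,sex,n,year,per_year
Mary,F,50,1900,200
John,M,100,1900,200
Anna,F,50,1900,200
"""


def add_derived_columns(rows):
    """Return copies of the rows with nchar and percent_per_year added.

    Assumed definitions (not confirmed by the source):
      nchar            = number of characters in the name
      percent_per_year = 100 * n / per_year
    """
    out = []
    for row in rows:
        n = int(row["n"])
        total = int(row["per_year"])
        out.append({
            **row,
            "nchar": len(row["name"]),
            "percent_per_year": 100.0 * n / total,
        })
    return out


rows = add_derived_columns(csv.DictReader(io.StringIO(SAMPLE)))
for r in rows:
    print(r["name"], r["year"], r["nchar"], r["percent_per_year"])
```

To run the same check against the real file, replace io.StringIO(SAMPLE) with an open handle to output-data/us_names_with_popularity_and_connotations.csv and compare the recomputed values with the stored columns.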
UK data
- File path: output-data/uk_names_with_popularity_and_connotations.csv
- Time range: 1996–2023
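Connotation analysis
- The five main connotations of each name were obtained from ChatGPT4 through OpenAI's API.
- Connotation categories were defined by identifying synonyms both manually and with an LLM.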
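One way to picture how synonym-based categories could turn the five connotations into Boolean columns is sketched below. The synonym sets, the function names, and the matching rule (case-insensitive exact match) are all hypothetical; the actual category definitions and matching logic are not given here.

```python
# Hypothetical synonym sets for two of the category columns.
# The real definitions used by the authors are not given in this description.
CATEGORY_SYNONYMS = {
    "intelligence": {"intelligent", "wise", "clever"},
    "tradition": {"traditional", "classic", "old-fashioned"},
}


def category_flags(connotations, synonyms=CATEGORY_SYNONYMS):
    """Map a name's connotations to one Boolean per category.

    A category is True when any connotation matches one of its synonyms
    (case-insensitive exact match, an assumption made for this sketch).
    """
    words = {c.strip().lower() for c in connotations if c and c.strip()}
    return {category: bool(words & syns) for category, syns in synonyms.items()}


def has_missing(connotations, expected=5):
    """Mimic the documented flag column: True if any connotation is missing."""
    present = [c for c in connotations if c and c.strip()]
    return len(present) < expected


example = ["Wise", "strong", "kind", "classic", "noble"]
print(category_flags(example))
print(has_missing(example))
```

Exact matching keeps the sketch simple. A fuller approach might match on word stems or embedding similarity, which is closer to what the Word2Vec data source suggests, but that is speculation about the original method.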
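Notes
- The US data include only names that occur five or more times in a given year.
- The UK data include only names that occur three or more times in a given year.
- The connotations of a name may change over time, so interpret them with caution.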
Contact
- Sondre Solstad: sondresolstad@economist.com
Suggested citation
The Economist and Solstad, S. (corresponding author), 2025. What's in a name? [online] The Economist. Available at: www.economist.com/interactive/culture/2025/03/20/what-is-in-a-name. First published in the article "The importance of being Earnest", The Economist, March 20th, 2025.
