electricsheepafrica/africa-commuting-zones
Source: Hugging Face · Updated: 2026-04-16 · Indexed: 2026-04-26
Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-commuting-zones
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- economics
- social-media-data
- afg
- alb
- dza
- asm
- and
pretty_name: "Facebook Commuting Zones"
dataset_info:
  splits:
  - name: train
    num_examples: 5232
  - name: test
    num_examples: 1308
---
# Facebook Commuting Zones
**Publisher:** AI for Good at Meta · **Source:** [HDX](https://data.humdata.org/dataset/commuting-zones) · **License:** `cc-by` · **Updated:** 2026-03-26
---
## Abstract
Commuting zones are geographic areas where people live and work and are useful for understanding local economies, as well as how they differ from traditional boundaries. Learn more here: https://ai.meta.com/ai-for-good/datasets/commuting-zones/
Each row represents one first-level administrative unit observation. Temporal coverage is indicated by the `cz_gen_ds` column. Geographic scope: **AFG, ALB, DZA, ASM, AND, AGO, AIA, ATG, and 208 others**.
*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*
---
## Dataset Characteristics
| | |
|---|---|
| **Domain** | Demographics and population |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 6,541 |
| **Columns** | 12 (4 numeric, 7 categorical, 1 datetime) |
| **Train split** | 5,232 rows |
| **Test split** | 1,308 rows |
| **Geographic scope** | AFG, ALB, DZA, ASM, AND, AGO, AIA, ATG, and 208 others |
| **Publisher** | AI for Good at Meta |
| **HDX last updated** | 2026-03-26 |
---
## Variables
**Geographic** — `region` (Europe, North, Asia), `win_population` (range 1924.466–4442659.362), `country` (United States, Australia, France), `geography` (WKT `POLYGON` or `MULTIPOLYGON` strings, e.g. `POLYGON ((-73.171737 48.196908, -73.836874 48.251112, ...))`).
**Identifier / Metadata** — `fbcz_id` (North1271, North743, North722), `name` (springfield, greenville, columbus), `fbcz_id_num` (range 100242.0–600967.0), `esa_source` (HDX), `esa_processed` (2026-04-16).
**Other** — `cz_gen_ds`, `win_roads_km` (range 377.3358–60370392.52), `area` (range 1.3241–254948932.0).
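
The `geography` column holds WKT strings, so a quick sanity check is to compute a bounding box per zone. The following is a minimal sketch using only the standard library; the helper name `wkt_bbox` is hypothetical, and the regex simply collects every `lon lat` pair while ignoring ring and part structure. For real geometry work a dedicated library such as Shapely is likely the better choice.

```python
import re

# Matches "lon lat" pairs such as "-73.171737 48.196908" inside a WKT string.
_COORD = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def wkt_bbox(wkt: str) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a POLYGON or MULTIPOLYGON string."""
    pairs = [(float(x), float(y)) for x, y in _COORD.findall(wkt)]
    if not pairs:
        raise ValueError("no coordinates found in WKT string")
    lons, lats = zip(*pairs)
    return min(lons), min(lats), max(lons), max(lats)


# Shortened, hypothetical polygon built from the first vertices of a sample value.
sample = (
    "POLYGON ((-73.171737 48.196908, -73.836874 48.251112, "
    "-74.101395 48.434243, -73.171737 48.196908))"
)
print(wkt_bbox(sample))
```
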
---
## Quick Start
```python
from datasets import load_dataset

# Download both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-commuting-zones")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```
---
## Schema
| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `region` | object | 0.0% | Europe, North, Asia |
| `fbcz_id` | object | 0.0% | North1271, North743, North722 |
| `name` | object | 0.0% | springfield, greenville, columbus |
| `fbcz_id_num` | int64 | 0.0% | 100242.0 – 600967.0 (mean 317469.9882) |
| `cz_gen_ds` | datetime64[ns] | 0.0% | |
| `win_population` | float64 | 0.0% | 1924.466 – 4442659.362 (mean 672256.6153) |
| `win_roads_km` | float64 | 0.0% | 377.3358 – 60370392.52 (mean 159459.6279) |
| `area` | float64 | 0.0% | 1.3241 – 254948932.0 (mean 54115.3705) |
| `country` | object | 0.0% | United States, Australia, France |
| `geography` | object | 0.0% | WKT strings, e.g. `POLYGON ((-73.171737 48.196908, -73.836874 48.251112, ...))`; `MULTIPOLYGON` values also occur |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-16 |
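
After loading a split, it can be useful to confirm that the frame matches the schema table. The sketch below is one possible check, not part of the published pipeline: the helper `check_schema` and the three-way grouping into text, numeric, and datetime kinds are assumptions, and the one-row demo frame is assembled from the sample values above (with a placeholder date for `cz_gen_ds`) rather than taken from an actual row.

```python
import pandas as pd
from pandas.api import types as ptypes

# Column kinds derived from the schema table; the grouping into kinds is an assumption.
EXPECTED_KINDS = {
    "region": "text",
    "fbcz_id": "text",
    "name": "text",
    "fbcz_id_num": "numeric",
    "cz_gen_ds": "datetime",
    "win_population": "numeric",
    "win_roads_km": "numeric",
    "area": "numeric",
    "country": "text",
    "geography": "text",
    "esa_source": "text",
    "esa_processed": "text",
}


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the frame matches."""
    problems = []
    for col, kind in EXPECTED_KINDS.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        dtype = df[col].dtype
        if kind == "numeric":
            ok = ptypes.is_numeric_dtype(dtype)
        elif kind == "datetime":
            ok = ptypes.is_datetime64_any_dtype(dtype)
        else:
            ok = ptypes.is_object_dtype(dtype) or ptypes.is_string_dtype(dtype)
        if not ok:
            problems.append(f"{col}: expected {kind}, got {dtype}")
    return problems


# Demo frame built from sample values; the cz_gen_ds date is a placeholder.
demo = pd.DataFrame({
    "region": ["North"],
    "fbcz_id": ["North1271"],
    "name": ["springfield"],
    "fbcz_id_num": [100242],
    "cz_gen_ds": pd.to_datetime(["2020-01-01"]),
    "win_population": [1924.466],
    "win_roads_km": [377.3358],
    "area": [1.3241],
    "country": ["United States"],
    "geography": ["POLYGON ((-73.171737 48.196908, ...))"],
    "esa_source": ["HDX"],
    "esa_processed": ["2026-04-16"],
})
print(check_schema(demo))
```
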
---
## Numeric Summary
| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `fbcz_id_num` | 100242.0 | 600967.0 | 317469.9882 | 301482.0 |
| `win_population` | 1924.466 | 4442659.362 | 672256.6153 | 169167.2207 |
| `win_roads_km` | 377.3358 | 60370392.52 | 159459.6279 | 17895.094 |
| `area` | 1.3241 | 254948932.0 | 54115.3705 | 4312.8595 |
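
In every row of this table the mean is well above the median (for `win_population`, roughly four times), which suggests right-skewed distributions. The sketch below illustrates, on made-up values, how a `log1p` transform narrows that gap; the numbers are hypothetical and only bounded by the reported minimum and maximum, and the helper `mean_median_ratio` is not part of the dataset tooling.

```python
import math
import statistics

# Hypothetical right-skewed values (NOT taken from the dataset), bounded by the
# min and max reported for `win_population` in the table above.
values = [1924.466, 25000.0, 90000.0, 169167.2207, 400000.0, 1200000.0, 4442659.362]


def mean_median_ratio(xs):
    """Ratio of mean to median; values well above 1 suggest right skew."""
    return statistics.mean(xs) / statistics.median(xs)


raw_ratio = mean_median_ratio(values)
log_ratio = mean_median_ratio([math.log1p(x) for x in values])

print(f"raw mean/median:   {raw_ratio:.2f}")
print(f"log1p mean/median: {log_ratio:.2f}")
```
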
---
## Curation
Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column was cast from string to a numeric or datetime type based on its parse-success rate (threshold: >85%). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.
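
The curation steps described above can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the actual ESA pipeline: all function names are hypothetical, marker matching is assumed to be case-insensitive, the parse-success rate is assumed to be measured over non-null values, and the split is assumed to use `DataFrame.sample`. The demo frame is made up. Note that an 80/20 split of 6,540 rows gives 5,232 and 1,308 rows.

```python
import re

import pandas as pd
from pandas.api import types as ptypes

MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}
PARSE_THRESHOLD = 0.85  # cast only if more than 85% of non-null values parse


def snake_case(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def is_text(s: pd.Series) -> bool:
    return ptypes.is_object_dtype(s) or ptypes.is_string_dtype(s)


def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace missing-value markers in text columns with NaN (case-insensitive: an assumption)."""
    out = df.copy()
    for col in out.columns:
        if is_text(out[col]):
            marker = out[col].astype(str).str.strip().str.lower().isin(MISSING_MARKERS)
            out[col] = out[col].mask(marker)
    return out


def cast_by_parse_rate(s: pd.Series) -> pd.Series:
    """Cast a text column to numeric or datetime when enough non-null values parse."""
    n = s.notna().sum()
    if n == 0:
        return s
    numeric = pd.to_numeric(s, errors="coerce")
    if numeric.notna().sum() / n > PARSE_THRESHOLD:
        return numeric
    dates = pd.to_datetime(s, errors="coerce")
    if dates.notna().sum() / n > PARSE_THRESHOLD:
        return dates
    return s


def curate(raw: pd.DataFrame) -> pd.DataFrame:
    df = unify_missing(raw.rename(columns=snake_case))
    for col in df.columns:
        if is_text(df[col]):
            df[col] = cast_by_parse_rate(df[col])
    return df


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Random 80/20 split with a fixed seed; assumes a unique index."""
    train = df.sample(frac=0.8, random_state=seed)
    return train, df.drop(train.index)


# Made-up raw frame for illustration only.
raw = pd.DataFrame({
    "Win Population": ["1924.466", "N/A", "4442659.362", "169167.2207", "2500",
                       "3100.5", "42000", "880", "91000", "1200000"],
    "Name": ["springfield", "unknown", "columbus", "greenville", "springfield",
             "columbus", "greenville", "springfield", "columbus", "greenville"],
})
clean = curate(raw)
train, test = split_80_20(clean)
print(clean.dtypes)
print(len(train), len(test))
```
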
---
## Limitations
- Data originates from AI for Good at Meta and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- This dataset spans 216 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/commuting-zones) for the publisher's own methodology notes and caveats.
---
## Citation
```bibtex
@dataset{hdx_africa_commuting_zones,
title = {Facebook Commuting Zones},
author = {AI for Good at Meta},
year = {2026},
url = {https://data.humdata.org/dataset/commuting-zones},
note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```
---
*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica) — Africa's ML dataset infrastructure. Lagos, Nigeria.*