
electricsheepafrica/africa-commuting-zones

Source: Hugging Face · Updated: 2026-04-16 · Indexed: 2026-04-26

Download link: https://hf-mirror.com/datasets/electricsheepafrica/africa-commuting-zones
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- economics
- social-media-data
- afg
- alb
- dza
- asm
- and
pretty_name: "Facebook Commuting Zones"
dataset_info:
  splits:
  - name: train
    num_examples: 5232
  - name: test
    num_examples: 1308
---

# Facebook Commuting Zones

**Publisher:** AI for Good at Meta · **Source:** [HDX](https://data.humdata.org/dataset/commuting-zones) · **License:** `cc-by` · **Updated:** 2026-03-26

---

## Abstract

Commuting zones are geographic areas where people live and work. They are useful for understanding local economies and how those economies differ from traditional boundaries. Learn more here: https://ai.meta.com/ai-for-good/datasets/commuting-zones/

Each row in this dataset represents a first-level administrative unit observation. Temporal coverage is indicated by the `cz_gen_ds` column. Geographic scope: **AFG, ALB, DZA, ASM, AND, AGO, AIA, ATG, and 208 others**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Demographics and population |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 6,541 |
| **Columns** | 12 (4 numeric, 7 categorical, 1 datetime) |
| **Train split** | 5,232 rows |
| **Test split** | 1,308 rows |
| **Geographic scope** | AFG, ALB, DZA, ASM, AND, AGO, AIA, ATG, and 208 others |
| **Publisher** | AI for Good at Meta |
| **HDX last updated** | 2026-03-26 |

---

## Variables

**Geographic**: `region` (Europe, North, Asia), `win_population` (range 1924.466–4442659.362), `country` (United States, Australia, France), `geography` (WKT `POLYGON` and `MULTIPOLYGON` strings, e.g. `POLYGON ((-73.171737 48.196908, -73.836874 48.251112, …, -73.171737 48.196908))`).

**Identifier / Metadata**: `fbcz_id` (North1271, North743, North722), `name` (springfield, greenville, columbus), `fbcz_id_num` (range 100242.0–600967.0), `esa_source` (HDX), `esa_processed` (2026-04-16).

**Other**: `cz_gen_ds`, `win_roads_km` (range 377.3358–60370392.52), `area` (range 1.3241–254948932.0).

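The `geography` values shown above are plain strings, so they need parsing before any spatial use. The following is a minimal sketch, using only the standard library, that pulls the coordinate pairs out of such a string and computes a bounding box. The function name `wkt_bounds`, the shortened sample polygon, and the reading of each pair as longitude then latitude are assumptions made for illustration. For real geometry operations a dedicated library such as Shapely (`shapely.wkt.loads`) is the more usual choice.

```python
import re

# Matches one "x y" coordinate pair inside a WKT string.
# Assumption: plain decimal notation, as in the sample values above.
_COORD = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def wkt_bounds(wkt: str) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a POLYGON or MULTIPOLYGON string."""
    pairs = [(float(x), float(y)) for x, y in _COORD.findall(wkt)]
    if not pairs:
        raise ValueError("no coordinate pairs found in WKT string")
    lons, lats = zip(*pairs)
    return min(lons), min(lats), max(lons), max(lats)


# Shortened, hypothetical polygon in the same format as the `geography` column.
sample = "POLYGON ((-73.17 48.20, -73.84 48.25, -70.76 52.41, -73.17 48.20))"
bounds = wkt_bounds(sample)
print(bounds)  # (-73.84, 48.2, -70.76, 52.41)
```

Because the regular expression ignores ring and polygon nesting, the same function works for both geometry types, at the cost of treating holes and separate parts alike.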
---

## Quick Start

```python
from datasets import load_dataset

# Load both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-commuting-zones")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `region` | object | 0.0% | Europe, North, Asia |
| `fbcz_id` | object | 0.0% | North1271, North743, North722 |
| `name` | object | 0.0% | springfield, greenville, columbus |
| `fbcz_id_num` | int64 | 0.0% | 100242.0 – 600967.0 (mean 317469.9882) |
| `cz_gen_ds` | datetime64[ns] | 0.0% | |
| `win_population` | float64 | 0.0% | 1924.466 – 4442659.362 (mean 672256.6153) |
| `win_roads_km` | float64 | 0.0% | 377.3358 – 60370392.52 (mean 159459.6279) |
| `area` | float64 | 0.0% | 1.3241 – 254948932.0 (mean 54115.3705) |
| `country` | object | 0.0% | United States, Australia, France |
| `geography` | object | 0.0% | WKT `POLYGON` and `MULTIPOLYGON` strings, e.g. `POLYGON ((-73.171737 48.196908, -73.836874 48.251112, …, -73.171737 48.196908))` |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-16 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `fbcz_id_num` | 100242.0 | 600967.0 | 317469.9882 | 301482.0 |
| `win_population` | 1924.466 | 4442659.362 | 672256.6153 | 169167.2207 |
| `win_roads_km` | 377.3358 | 60370392.52 | 159459.6279 | 17895.094 |
| `area` | 1.3241 | 254948932.0 | 54115.3705 | 4312.8595 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. One column was cast from string to numeric or datetime based on its parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from AI for Good at Meta and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- This dataset spans 216 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/commuting-zones) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_commuting_zones,
  title  = {Facebook Commuting Zones},
  author = {AI for Good at Meta},
  year   = {2026},
  url    = {https://data.humdata.org/dataset/commuting-zones},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
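The cleaning steps in the Curation section can be illustrated with a small standard-library sketch. This is not the pipeline Electric Sheep Africa actually ran. The function names, the case-insensitive marker matching, the choice to compute the parse rate over non-missing values only, and the shuffle-then-cut split are all assumptions made to give the described rules a concrete shape.

```python
import math
import random

# Missing-value markers listed in the Curation section.
# Assumption: they are matched case-insensitively after trimming whitespace.
MISSING = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def normalise_missing(value):
    """Map a known missing-value marker to NaN; leave every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in MISSING:
        return math.nan
    return value


def _to_float(value):
    """Parse a value as float, returning NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def cast_if_mostly_numeric(values, threshold=0.85):
    """Cast a column to float when more than `threshold` of its non-missing values parse."""
    present = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    parsed = [v for v in present if not math.isnan(_to_float(v))]
    if not present or len(parsed) / len(present) <= threshold:
        return list(values)  # below the threshold: keep the column as-is
    return [_to_float(v) for v in values]


def split_80_20(rows, seed=42):
    """Shuffle with a fixed seed, then cut into 80% train and 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical raw column containing one missing-value marker.
raw = ["1.5", "2", "N/A", "3.25", "4", "5", "6", "7", "8", "9"]
cleaned = [normalise_missing(v) for v in raw]
column = cast_if_mostly_numeric(cleaned)
train, test = split_80_20(column)
print(len(train), len(test))  # 8 2
```

Fixing the seed makes the partition reproducible: running the split twice on the same rows yields the same train and test members.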

Provider: electricsheepafrica