
electricsheepafrica/africa-ourairports-mar

Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-ourairports-mar
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- other
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- aviation
- facilities-infrastructure
- geodata
- hxl
- morocco-earthquake
- transportation
- mar
pretty_name: "Airports in Morocco"
dataset_info:
  splits:
  - name: train
    num_examples: 36
  - name: test
    num_examples: 9
---

# Airports in Morocco

**Publisher:** OurAirports · **Source:** [HDX](https://data.humdata.org/dataset/ourairports-mar) · **License:** `cc-by-igo` · **Updated:** 2026-03-30

---

## Abstract

List of airports in Morocco, with latitude and longitude. Unverified community data from http://ourairports.com/countries/MA/

Each row in this dataset represents first-level administrative unit observations. Data was last updated on HDX on 2026-03-30. Geographic scope: **MAR**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Humanitarian and development data |
| **Unit of observation** | First-level administrative unit observations |
| **Rows (total)** | 46 |
| **Columns** | 24 (7 numeric, 16 categorical, 1 datetime) |
| **Train split** | 36 rows |
| **Test split** | 9 rows |
| **Geographic scope** | MAR |
| **Publisher** | OurAirports |
| **HDX last updated** | 2026-03-30 |

---

## Variables

**Geographic:** `type` (small_airport, large_airport, medium_airport), `latitude_deg` (range 28.4476–35.7317), `longitude_deg` (range -11.1617 to -1.926), `country_name` (Morocco, #country +name), `iso_country` (MA, #country +code +iso2) and 5 others.

**Temporal:** `last_updated`.

**Outcome / Measurement:** `score` (range 0.0–1200.0).

**Identifier / Metadata:** `id` (range 3100.0–605980.0), `ident` (#meta +code, MA-0013, GMAZ), `name` (#loc +airport +name, El Jadida Airport, Zagora Airport), `gps_code` (#loc +airport +code +gps, GMMD, GMMO), `icao_code` and 3 others.

**Other:** `elevation_ft` (range 10.0–5459.0), `continent` (AF, #region +continent +code), `scheduled_service` (range 0.0–1.0), `wikipedia_link`.

---

## Quick Start

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-ourairports-mar")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `id` | float64 | 2.2% | 3100.0 – 605980.0 (mean 125316.0) |
| `ident` | object | 0.0% | #meta +code, MA-0013, GMAZ |
| `type` | object | 0.0% | small_airport, large_airport, medium_airport |
| `name` | object | 0.0% | #loc +airport +name, El Jadida Airport, Zagora Airport |
| `latitude_deg` | float64 | 2.2% | 28.4476 – 35.7317 (mean 32.5436) |
| `longitude_deg` | float64 | 2.2% | -11.1617 – -1.926 (mean -6.6818) |
| `elevation_ft` | float64 | 17.4% | 10.0 – 5459.0 (mean 1166.9737) |
| `continent` | object | 0.0% | AF, #region +continent +code |
| `country_name` | object | 0.0% | Morocco, #country +name |
| `iso_country` | object | 0.0% | MA, #country +code +iso2 |
| `region_name` | object | 0.0% | Casablanca-Settat Region, Souss-Massa Region, Marrakech-Safi Region |
| `iso_region` | object | 0.0% | MA-06, MA-09, MA-07 |
| `local_region` | float64 | 2.2% | 1.0 – 10.0 (mean 5.6444) |
| `municipality` | object | 0.0% | Casablanca, Zagora, #loc +municipality +name |
| `scheduled_service` | float64 | 2.2% | 0.0 – 1.0 (mean 0.3333) |
| `gps_code` | object | 37.0% | #loc +airport +code +gps, GMMD, GMMO |
| `icao_code` | object | 41.3% | |
| `iata_code` | object | 52.2% | |
| `wikipedia_link` | object | 23.9% | |
| `keywords` | object | 60.9% | |
| `score` | float64 | 2.2% | 0.0 – 1200.0 (mean 375.5556) |
| `last_updated` | datetime64[ns, UTC] | 2.2% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `id` | 3100.0 | 605980.0 | 125316.0 | 29845.0 |
| `latitude_deg` | 28.4476 | 35.7317 | 32.5436 | 32.8582 |
| `longitude_deg` | -11.1617 | -1.926 | -6.6818 | -6.8624 |
| `elevation_ft` | 10.0 | 5459.0 | 1166.9737 | 640.0 |
| `local_region` | 1.0 | 10.0 | 5.6444 | 6.0 |
| `scheduled_service` | 0.0 | 1.0 | 0.3333 | 0.0 |
| `score` | 0.0 | 1200.0 | 375.5556 | 50.0 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns with >80% missing values were removed: `local_code` and `home_link`. Eight columns were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from OurAirports and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `gps_code`, `icao_code`, `iata_code`, `wikipedia_link`, `keywords`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/ourairports-mar) for the publisher's own methodology notes and caveats.
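
The missingness caveat in the Limitations list can be checked directly once the data is loaded. The sketch below is illustrative and not part of the publisher's pipeline: `missing_report` and `drop_hxl_tag_rows` are hypothetical helper names, and the idea that an HXL hashtag row (values such as `#meta +code`) sits in the data as an ordinary row is an inference from the sample values in the Schema table, not something the source states. The demo frame is synthetic.

```python
import pandas as pd


def missing_report(df: pd.DataFrame, threshold: float = 0.20) -> pd.Series:
    """Share of missing values per column, limited to columns above `threshold`."""
    share = df.isna().mean()
    return share[share > threshold].sort_values(ascending=False)


def drop_hxl_tag_rows(df: pd.DataFrame, column: str = "ident") -> pd.DataFrame:
    """Drop rows whose `column` value starts with '#' (assumed to be HXL hashtags)."""
    is_tag = df[column].astype(str).str.startswith("#")
    return df[~is_tag].reset_index(drop=True)


# Synthetic stand-in for the real data; only the `ident` values come from the card.
demo = pd.DataFrame({
    "ident": ["#meta +code", "GMAZ", "MA-0013", "GMMD"],
    "iata_code": ["#placeholder", "XXX", None, None],
    "elevation_ft": [None, 100.0, 200.0, 300.0],
})

clean = drop_hxl_tag_rows(demo)
report = missing_report(clean)
print(report)
```

On the real data, the same two calls could be applied to `train` and `test` from the Quick Start before modelling.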
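
The curation steps described above could be approximated as follows. This is a minimal sketch under stated assumptions, not ESA's actual code: the function names are hypothetical, marker matching is assumed to be case-insensitive, the parse-success rate is assumed to be the share of non-missing values that parse, and the split is assumed to use `DataFrame.sample`. Only numeric casting is shown; datetime casting is omitted. The input frame is synthetic.

```python
import re

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Markers listed in the Curation section, lowercased for matching (assumption).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def is_text(col: pd.Series) -> bool:
    """Treat any non-numeric, non-datetime column as text."""
    return not is_numeric_dtype(col) and not is_datetime64_any_dtype(col)


def curate(raw: pd.DataFrame, drop_above: float = 0.80,
           cast_above: float = 0.85) -> pd.DataFrame:
    df = raw.rename(columns=to_snake_case)
    # 1. Unify missing-value markers to NaN in text columns.
    for col in df.columns:
        if is_text(df[col]):
            marker = df[col].astype(str).str.strip().str.lower().isin(MISSING_MARKERS)
            df[col] = df[col].mask(marker)
    # 2. Drop columns whose missing share exceeds the threshold.
    df = df.drop(columns=df.columns[df.isna().mean() > drop_above])
    # 3. Cast text columns to numeric when enough non-missing values parse.
    for col in df.columns:
        present = df[col].notna().sum()
        if not is_text(df[col]) or present == 0:
            continue
        numeric = pd.to_numeric(df[col], errors="coerce")
        if numeric.notna().sum() / present > cast_above:
            df[col] = numeric
    return df


def split(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 42):
    """Random train/test split with a fixed seed."""
    train = df.sample(frac=train_frac, random_state=seed)
    return train, df.drop(train.index)


# Synthetic input: one hashtag-like row followed by nine made-up records.
raw = pd.DataFrame({
    "Ident": ["#meta +code"] + [f"MA-{i:04d}" for i in range(1, 10)],
    "Latitude Deg": ["#geo +lat"] + [str(30 + i / 10) for i in range(1, 10)],
    "Home Link": ["#meta +url"] + ["N/A"] * 9,
})
curated = curate(raw)
train, test = split(curated)
print(curated.dtypes, len(train), len(test))
```

Under these assumptions a single unparseable value in a numeric column becomes `NaN` after casting, which would be consistent with the 2.2% null rate (1 of 46 rows) the Schema table reports for the numeric columns. With 45 rows, `frac=0.8` would yield 36 training rows and 9 test rows, matching the split sizes on the card.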
---

## Citation

```bibtex
@dataset{hdx_africa_ourairports_mar,
  title = {Airports in Morocco},
  author = {OurAirports},
  year = {2026},
  url = {https://data.humdata.org/dataset/ourairports-mar},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*

Provider:
electricsheepafrica