
electricsheepafrica/africa-ghana-uneca-education

Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-ghana-uneca-education
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- education
- indicators
- literacy
- gha
pretty_name: "GHANA - Education indicators, UNECA"
dataset_info:
  splits:
  - name: train
    num_examples: 43
  - name: test
    num_examples: 10
---

# GHANA - Education indicators, UNECA

**Publisher:** United Nations Economic Commission for Africa · **Source:** [HDX](https://data.humdata.org/dataset/ghana-uneca-education) · **License:** `cc-by-igo` · **Updated:** 2024-09-13

---

## Abstract

This dataset contains many education indicators, such as the net enrolment rate in primary education and the ratio of girls to boys in primary education. The full list of indicators and their descriptions can be found at https://bit.ly/2NWP6Z1

Each row in this dataset represents a tabular record. Data was last updated on HDX on 2024-09-13. Geographic scope: **GHA**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Education |
| **Unit of observation** | Tabular records |
| **Rows (total)** | 54 |
| **Columns** | 10 (7 numeric, 3 categorical, 0 datetime) |
| **Train split** | 43 rows |
| **Test split** | 10 rows |
| **Geographic scope** | GHA |
| **Publisher** | United Nations Economic Commission for Africa |
| **HDX last updated** | 2024-09-13 |

---

## Variables

**Identifier / Metadata**: `esa_source` (HDX), `esa_processed` (2026-04-11).

**Other**: `indicator` (Adult literacy rate - Female (%), Ratio of school attendance rate of orphans to school attendance rate of non orphans (%), Net enrolment rate in secondary education - Male (%)), `2011` (range 0.5–107.1), `2012` (range 0.1–108.2), `2013` (range 0.4–107.4), `2014` (range 0.6–105.4) and 3 others.

---

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("electricsheepafrica/africa-ghana-uneca-education")
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()
print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `indicator` | object | 0.0% | Adult literacy rate - Female (%), Ratio of school attendance rate of orphans to school attendance rate of non orphans (%), Net enrolment rate in secondary education - Male (%) |
| `2011` | float64 | 40.7% | 0.5 – 107.1 (mean 24.7344) |
| `2012` | float64 | 35.2% | 0.1 – 108.2 (mean 33.8971) |
| `2013` | float64 | 14.8% | 0.4 – 107.4 (mean 41.7) |
| `2014` | float64 | 27.8% | 0.6 – 105.4 (mean 38.3641) |
| `2015` | float64 | 29.6% | 0.7 – 108.7 (mean 45.1237) |
| `2016` | float64 | 35.2% | 0.7 – 107.3 (mean 42.0857) |
| `2017` | float64 | 50.0% | 0.7 – 105.5 (mean 43.3852) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-11 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `2011` | 0.5 | 107.1 | 24.7344 | 6.55 |
| `2012` | 0.1 | 108.2 | 33.8971 | 9.3 |
| `2013` | 0.4 | 107.4 | 41.7 | 22.25 |
| `2014` | 0.6 | 105.4 | 38.3641 | 15.8 |
| `2015` | 0.7 | 108.7 | 45.1237 | 43.45 |
| `2016` | 0.7 | 107.3 | 42.0857 | 27.3 |
| `2017` | 0.7 | 105.5 | 43.3852 | 27.3 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Three columns with >80% missing values were removed: `2010`, `2018`, `2019`. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from United Nations Economic Commission for Africa and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `2011`, `2012`, `2014`, `2015`, `2016`, `2017`.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/ghana-uneca-education) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_ghana_uneca_education,
  title  = {GHANA - Education indicators, UNECA},
  author = {United Nations Economic Commission for Africa},
  year   = {2024},
  url    = {https://data.humdata.org/dataset/ghana-uneca-education},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
Provider:
electricsheepafrica
Collected summary
Dataset introduction
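Construction method
Within research on African education development, this dataset was published by the United Nations Economic Commission for Africa (UNECA). The raw data was obtained from the HDX platform and processed by the Electric Sheep Africa team. Construction involved downloading the raw data via the CKAN API and then standardizing it: column names were converted to snake_case, common missing-value markers were converted to NaN, and columns with more than 80% missing values were removed. Finally, the data was split 80/20 into training and test sets and stored as Snappy-compressed Parquet, leaving it ready for machine learning.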
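The cleaning steps described above can be sketched in pandas as follows. This is only an illustration under stated assumptions, not the Electric Sheep Africa pipeline: the `curate` helper, its parameters, and the toy `raw` table are hypothetical, and only the marker list, the 80% threshold, the 80/20 ratio, and the seed 42 come from the dataset card.

```python
import numpy as np
import pandas as pd

# Missing-value markers listed in the dataset card.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def curate(raw: pd.DataFrame, max_missing: float = 0.8, seed: int = 42):
    """Clean a raw indicator table and split it 80/20 (hypothetical helper)."""
    df = raw.copy()
    # Lowercase column names and replace spaces with underscores.
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Unify textual missing-value markers to NaN (case-insensitive match).
    markers = {m.lower() for m in MISSING_MARKERS}
    df = df.apply(
        lambda col: col.map(
            lambda v: np.nan
            if isinstance(v, str) and v.strip().lower() in markers
            else v
        )
    )
    # Drop columns whose share of missing values exceeds the threshold.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # Reproducible 80/20 split with a fixed seed.
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Toy input with made-up values, used only to exercise the helper.
raw = pd.DataFrame(
    {
        "Indicator": ["a", "b", "c", "d", "e"],
        "2010": ["N/A", "-", "no data", "null", "#N/A"],
        "2011": [1.0, "N/A", 3.0, 4.0, 5.0],
    }
)
train, test = curate(raw)
print(train.columns.tolist(), len(train), len(test))
```

In this toy table the `2010` column is entirely missing after marker unification, so it is dropped, while `2011` (20% missing) is kept.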
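Usage
To use the dataset for education-indicator analysis, load it directly with the Hugging Face datasets library: the load_dataset function returns the training and test sets. Each split can be converted to a pandas DataFrame for exploratory analysis and statistical modelling. Because several columns contain missing values, apply a suitable imputation or handling strategy before modelling. The dataset suits regression tasks such as predicting how education indicators change over time. Assessing the effect of socioeconomic factors on education outcomes would require joining external data, since the table itself holds only indicator values by year.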
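One possible way to handle the gaps before modelling is to interpolate linearly across the year columns within each indicator row, filling only values that lie between two observed years. The sketch below is an assumption about a reasonable strategy, not a recommendation from the publisher, and the two rows are placeholder values rather than real data.

```python
import numpy as np
import pandas as pd

# Year columns kept in the published dataset.
YEAR_COLS = ["2011", "2012", "2013", "2014", "2015", "2016", "2017"]

# Placeholder rows with made-up values (not taken from the dataset).
df = pd.DataFrame(
    [
        ["indicator A", 10.0, np.nan, 14.0, np.nan, np.nan, 20.0, np.nan],
        ["indicator B", np.nan, 5.0, np.nan, 7.0, 8.0, np.nan, np.nan],
    ],
    columns=["indicator", *YEAR_COLS],
)

filled = df.copy()
# Interpolate along each row (axis=1). limit_area="inside" fills only gaps
# surrounded by observed values, so leading and trailing NaNs stay missing.
filled[YEAR_COLS] = df[YEAR_COLS].interpolate(axis=1, limit_area="inside")
print(filled)
```

Leaving the leading and trailing gaps untouched avoids extrapolating beyond the observed years, which matters here because up to half of the values in a year column are missing.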
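Background and challenges
Background overview
Within the framework of globalization and the Sustainable Development Goals, education indicators are a core dimension for measuring a country's human capital and social progress, and they receive growing attention from international organizations and policy researchers. The Ghana education indicators dataset is published by the United Nations Economic Commission for Africa, was last updated on HDX in 2024, and was prepared for machine learning by Electric Sheep Africa. It collects key education statistics for Ghana for 2011 to 2017, such as the net enrolment rate in primary education, the ratio of girls to boys, and the adult literacy rate. The data is tabular, with 43 training rows and 10 test rows. It provides a structured basis for education policy evaluation, regional development comparison, and predictive modelling, and is a useful reference for empirical research on African education.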
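Current challenges
The dataset targets multi-indicator regression analysis of education development. The core challenge is capturing how Ghana's education system changes over time from a short time series with many gaps. The raw data has a high share of missing values: six of the seven year columns from 2011 to 2017 are more than 20% missing (only 2013 falls below that level), and the 2010, 2018, and 2019 columns were removed because more than 80% of their values were missing. This limits temporal continuity and model generalization. In addition, differences in indicator definitions and possible reporting bias in the original collection may introduce systematic error, which tests the robustness and interpretability of machine learning models.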
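The two thresholds mentioned above (20% for caution, 80% for removal) can be checked with a small per-column report. This is a minimal sketch: `missing_report` and its column names are hypothetical, and the table uses made-up values rather than the real data.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, caution: float = 0.2, drop: float = 0.8):
    """Per-column share of missing values with two threshold flags."""
    share = df.isna().mean()
    return pd.DataFrame(
        {
            "null_pct": (share * 100).round(1),
            "above_caution": share > caution,  # treat with caution in modelling
            "above_drop": share > drop,        # candidate for removal
        }
    )


# Made-up table used only to exercise the report.
df = pd.DataFrame(
    {
        "indicator": ["a", "b", "c", "d", "e"],
        "2011": [1.0, np.nan, 3.0, np.nan, 5.0],
        "2013": [1.0, 2.0, 3.0, 4.0, 5.0],
        "2019": [np.nan] * 5,
    }
)
report = missing_report(df)
print(report)
```

Running the same function on the loaded train and test splits would show whether the missingness reported in the card is spread evenly across the two partitions.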
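Common scenarios
Classic use cases
In education development research, the dataset is typically used to build time-series regression models that analyse year-to-year trends in Ghana's education indicators. By combining key variables such as the net enrolment rate in primary education and the ratio of girls to boys, researchers can assess the effects of education policy and project possible future trajectories, giving regional education planning a data-driven basis.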
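As a simple illustration of trend analysis on this wide layout, the sketch below fits an ordinary least-squares line to each indicator's observed years and returns the slope in indicator units per year. The `yearly_trend` helper is hypothetical, the rows are made-up values, and a straight line is an assumed model chosen for brevity, not a method named by the dataset authors.

```python
import numpy as np
import pandas as pd

# Year columns kept in the published dataset.
YEAR_COLS = ["2011", "2012", "2013", "2014", "2015", "2016", "2017"]


def yearly_trend(row: pd.Series, year_cols=YEAR_COLS) -> float:
    """Least-squares slope over observed years; NaN with fewer than 2 points."""
    years = np.array([int(c) for c in year_cols], dtype=float)
    values = row[year_cols].to_numpy(dtype=float)
    observed = ~np.isnan(values)
    if observed.sum() < 2:
        return np.nan
    # Centre the years to keep the fit well conditioned; the slope is unchanged.
    centred = years[observed] - years[observed].mean()
    slope, _intercept = np.polyfit(centred, values[observed], deg=1)
    return float(slope)


# Placeholder rows with made-up values (not taken from the dataset).
df = pd.DataFrame(
    [
        ["indicator A", 10.0, np.nan, 14.0, np.nan, np.nan, 20.0, np.nan],
        ["indicator B", np.nan, 5.0, np.nan, np.nan, np.nan, np.nan, np.nan],
    ],
    columns=["indicator", *YEAR_COLS],
)
trends = df.apply(yearly_trend, axis=1)
print(trends)
```

With at most seven points per indicator, slopes fitted this way are rough descriptive summaries and carry wide uncertainty.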
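Academic problems addressed
The dataset supports quantitative research in development economics on education inequality and human capital accumulation. By providing standardized education indicators for Ghana, it lets scholars examine social equity issues such as gender gaps and school attendance among orphans. Combined with economic data from other sources, it can also inform studies of the relationship between education investment and economic development, adding to the empirical literature on education in the Global South.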
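Practical applications
In practice, the dataset can be used by international organizations and local governments to monitor progress on the education-related indicators of the United Nations Sustainable Development Goals (SDGs). Using indicators such as literacy and enrolment rates, decision-makers can identify weak points in education, improve resource allocation, and design targeted interventions to support education reform in Ghana and the wider African region.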
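Recent research on the dataset
Latest research directions
In African education development research, the Ghana education indicators dataset serves as a tool for studying education equity and the Sustainable Development Goals. It brings together multi-year indicators from the United Nations Economic Commission for Africa, such as the net enrolment rate in primary education and the ratio of girls to boys, as structured input for machine learning models. Current research interest centres on time-series regression to forecast education trends, in particular on evaluating interventions aimed at gender gaps and orphans' access to education in resource-limited settings. This work responds to the global agenda on inclusive education and gives policymakers data-driven decision support, deepening the use of empirical research in African education governance.
The content above was collected and summarized by 遇见数据集.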