electricsheepafrica/africa-world-bank-science-and-technology-indicators-for-federal-republic-of-somalia

Hugging Face · Updated 2026-04-07 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-world-bank-science-and-technology-indicators-for-federal-republic-of-somalia
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- tabular-classification
- tabular-regression
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- economics
- hxl
- indicators
- som
pretty_name: "Federal Republic of Somalia - Science and Technology"
dataset_info:
  splits:
  - name: train
    num_examples: 24
  - name: test
    num_examples: 6
---

# Federal Republic of Somalia - Science and Technology

**Publisher:** World Bank Group · **Source:** [HDX](https://data.humdata.org/dataset/world-bank-science-and-technology-indicators-for-federal-republic-of-somalia) · **License:** `cc-by` · **Updated:** 2025-11-04

---

## Abstract

Contains data from the World Bank's [data portal](http://data.worldbank.org/). There is also a [consolidated country dataset](https://data.humdata.org/dataset/world-bank-combined-indicators-for-federal-republic-of-somalia) on HDX.

Technological innovation, often fueled by governments, drives industrial growth and helps raise living standards. The data here aim to shed light on countries' technology base: research and development, scientific and technical journal articles, high-technology exports, royalty and license fees, and patents and trademarks. Sources include the UNESCO Institute for Statistics, the U.S. National Science Board, the UN Statistics Division, the International Monetary Fund, and the World Intellectual Property Organization.

Each row in this dataset represents country-level aggregates. Data was last updated on HDX on 2025-11-04. Geographic scope: **SOM**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Humanitarian and development data |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 31 |
| **Columns** | 8 (2 numeric, 6 categorical, 0 datetime) |
| **Train split** | 24 rows |
| **Test split** | 6 rows |
| **Geographic scope** | SOM |
| **Publisher** | World Bank Group |
| **HDX last updated** | 2025-11-04 |

---

## Variables

**Geographic**: `country_name` (Federal Republic of Somalia, #country+name), `country_iso3` (SOM, #country+code), `year` (range 1984.0–2022.0).

**Outcome / Measurement**: `value` (range 0.0–106.49).

**Identifier / Metadata**: `indicator_name` (Scientific and technical journal articles, Patent applications, nonresidents, #indicator+name), `indicator_code` (IP.JRN.ARTC.SC, IP.PAT.NRES, #indicator+code), `esa_source` (HDX), `esa_processed` (2026-04-07).

---

## Quick Start

```python
from datasets import load_dataset

# Load both splits from the Hugging Face Hub
ds = load_dataset("electricsheepafrica/africa-world-bank-science-and-technology-indicators-for-federal-republic-of-somalia")

# Convert each split to a pandas DataFrame
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
train.head()
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country_name` | object | 0.0% | Federal Republic of Somalia, #country+name |
| `country_iso3` | object | 0.0% | SOM, #country+code |
| `year` | float64 | 3.2% | 1984.0 – 2022.0 (mean 2006.5333) |
| `indicator_name` | object | 0.0% | Scientific and technical journal articles, Patent applications, nonresidents, #indicator+name |
| `indicator_code` | object | 0.0% | IP.JRN.ARTC.SC, IP.PAT.NRES, #indicator+code |
| `value` | float64 | 3.2% | 0.0 – 106.49 (mean 9.7373) |
| `esa_source` | object | 0.0% | HDX |
| `esa_processed` | object | 0.0% | 2026-04-07 |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `year` | 1984.0 | 2022.0 | 2006.5333 | 2007.5 |
| `value` | 0.0 | 106.49 | 9.7373 | 2.32 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. Two columns were cast from string to numeric or datetime based on parse-success rate (>85% threshold). The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from World Bank Group and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/world-bank-science-and-technology-indicators-for-federal-republic-of-somalia) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_world_bank_science_and_technology_indicators_for_federal_republic_of_somalia,
  title = {Federal Republic of Somalia - Science and Technology},
  author = {World Bank Group},
  year = {2025},
  url = {https://data.humdata.org/dataset/world-bank-science-and-technology-indicators-for-federal-republic-of-somalia},
  note = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
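The cleaning steps listed under Curation (unifying missing-value markers, casting columns by parse-success rate, and a seeded 80/20 split) can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the publisher's actual pipeline: the function names, the case-insensitive marker matching, the use of `DataFrame.sample` for the split, and the toy rows (including the HXL-style tag row) are all hypothetical. Snake-casing of column names and the Parquet write are omitted.

```python
import pandas as pd

# Missing-value markers named in the Curation section.
MISSING_MARKERS = ["N/A", "null", "none", "-", "unknown", "no data", "#N/A"]


def unify_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace known missing-value markers in non-numeric columns with NaN.

    Assumption: matching is case-insensitive and ignores surrounding whitespace.
    """
    markers = [m.lower() for m in MISSING_MARKERS]
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        is_marker = out[col].astype(str).str.strip().str.lower().isin(markers)
        out[col] = out[col].mask(is_marker)  # masked cells become NaN
    return out


def cast_numeric(df: pd.DataFrame, threshold: float = 0.85) -> pd.DataFrame:
    """Cast a string column to numeric when more than `threshold` of its
    non-null values parse successfully (datetime casting is not sketched)."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        non_null = out[col].notna()
        if not non_null.any():
            continue
        parsed = pd.to_numeric(out[col], errors="coerce")
        success_rate = parsed[non_null].notna().mean()
        if success_rate > threshold:
            out[col] = parsed  # values that failed to parse become NaN
    return out


def split_80_20(df: pd.DataFrame, seed: int = 42):
    """Draw 20% of rows as the test set with a fixed seed; the rest is train."""
    test = df.sample(frac=0.2, random_state=seed)
    train = df.drop(test.index)
    return train, test


# Hypothetical toy rows; the first row imitates an HXL tag row.
raw = pd.DataFrame({
    "year": ["#date+year", "1984", "1990", "2001", "2005",
             "2010", "2015", "2018", "2020", "2022"],
    "value": ["#indicator+value+num", "0.0", "N/A", "2.32", "5",
              "7.5", "12", "40", "88", "106.49"],
    "esa_source": ["HDX"] * 10,
})

clean = cast_numeric(unify_missing(raw))
train, test = split_80_20(clean)
print(clean.dtypes)
print(len(train), len(test))
```

A cast of this kind turns any non-numeric cell, such as a tag row, into `NaN`. That would be consistent with the 3.2% null rate reported for `year` and `value` (1 of 31 rows), though the card does not state the cause.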

Provider:
electricsheepafrica