
electricsheepafrica/africa-coronavirus-in-sub-saharan-africa

Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/africa-coronavirus-in-sub-saharan-africa
Resource description:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license: cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- tabular-classification
task_ids: []
tags:
- africa
- humanitarian
- hdx
- electric-sheep-africa
- covid-19
- epidemics-outbreaks
- health
- ken
- nga
- zaf
pretty_name: "Community knowledge and perceptions of Coronavirus COVID-19 in Sub-Saharan Africa"
dataset_info:
  splits:
  - name: train
    num_examples: 1073
  - name: test
    num_examples: 268
---

# Community knowledge and perceptions of Coronavirus COVID-19 in Sub-Saharan Africa

**Publisher:** Mobile Accord, Inc. (GeoPoll) · **Source:** [HDX](https://data.humdata.org/dataset/coronavirus-in-sub-saharan-africa) · **License:** `cc-by` · **Updated:** 2025-04-09

---

## Abstract

This data comes from GeoPoll's study of knowledge and perceptions of the recent coronavirus outbreak in South Africa, Kenya, and Nigeria. Each row in this dataset represents country-level aggregates. The data was last updated on HDX on 2025-04-09. Geographic scope: **KEN, NGA, ZAF**.

*Curated into ML-ready Parquet format by [Electric Sheep Africa](https://huggingface.co/electricsheepafrica).*

---

## Dataset Characteristics

| | |
|---|---|
| **Domain** | Health (COVID-19, epidemics and outbreaks) |
| **Unit of observation** | Country-level aggregates |
| **Rows (total)** | 1,342 |
| **Columns** | 51 (8 numeric, 43 categorical, 0 datetime) |
| **Train split** | 1,073 rows |
| **Test split** | 268 rows |
| **Geographic scope** | KEN, NGA, ZAF |
| **Publisher** | Mobile Accord, Inc. (GeoPoll) |
| **HDX last updated** | 2025-04-09 |

---

## Variables

**Geographic**: `country` (Kenya, Nigeria, South Africa), `symptoms` (Yes, No, Not sure), `symptoms2_fever` (Yes, No), `symptoms2_bleeding` (No, Yes), `symptoms2_vomiting` (No, Yes) and 10 others.

**Demographic**: `agegroup` (25-34, 15-24, 35+), `gender` (Male, Female), `modetransmission_being_near_infected_person`, `modetransmission_touching_an_infected_person`, `informationsources_government_messages`.

**Outcome / Measurement**: `cases` (Yes - There are confirmed cases, No - There are no confirmed cases, Not sure).

**Identifier / Metadata**: `preventativemeasures_avoid_public_transport`, `govtconfidence` (range 1.0–5.0), `globalconfidence` (range 1.0–5.0), `informationsources_newspapers`, `informationsources_tv` and 5 others.

**Other**: `adm_1` (Gauteng, KwaZulu-Natal, Nairobi), `awareness` (Yes, No), `levelconcern` (range 1.0–5.0), `broadcast2` (range 1.0–1.0), `modetransmission_air` and 15 others.

---

## Quick Start

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (train and test splits).
ds = load_dataset("electricsheepafrica/africa-coronavirus-in-sub-saharan-africa")

# Convert each split to a pandas DataFrame for tabular work.
train = ds["train"].to_pandas()
test = ds["test"].to_pandas()

print(train.shape)
print(train.head())
```

---

## Schema

| Column | Type | Null % | Range / Sample Values |
|---|---|---|---|
| `country` | object | 0.0% | Kenya, Nigeria, South Africa |
| `agegroup` | object | 0.0% | 25-34, 15-24, 35+ |
| `gender` | object | 0.0% | Male, Female |
| `adm_1` | object | 0.0% | Gauteng, KwaZulu-Natal, Nairobi |
| `awareness` | object | 0.0% | Yes, No |
| `cases` | object | 5.0% | Yes - There are confirmed cases, No - There are no confirmed cases, Not sure |
| `levelconcern` | float64 | 5.0% | 1.0 – 5.0 (mean 4.2957) |
| `symptoms` | object | 5.0% | Yes, No, Not sure |
| `symptoms2_fever` | object | 29.0% | Yes, No |
| `symptoms2_bleeding` | object | 29.0% | No, Yes |
| `symptoms2_vomiting` | object | 29.0% | No, Yes |
| `symptoms2_cough` | object | 29.0% | |
| `symptoms2_shortness_of_breath` | object | 29.0% | |
| `symptoms2_diarrhoea` | object | 29.0% | |
| `symptoms2_none_of_the_above` | object | 29.0% | |
| `broadcast2` | float64 | 5.0% | 1.0 – 1.0 (mean 1.0) |
| `modetransmission_being_near_infected_person` | object | 5.0% | |
| `modetransmission_touching_an_infected_person` | object | 5.0% | |
| `modetransmission_air` | object | 5.0% | |
| `modetransmission_drinking_water` | object | 5.0% | |
| `modetransmission_surfaces` | object | 5.0% | |
| `modetransmission_other` | object | 5.0% | |
| `modetransmission_don_t_know` | object | 5.0% | |
| `riskawareness` | object | 5.0% | |
| `placegreatestrisk` | object | 33.7% | |
| `virusprevention` | object | 5.0% | |
| `preventativemeasures_avoid_public_places` | object | 31.0% | |
| `preventativemeasures_avoid_public_transport` | object | 31.0% | |
| `preventativemeasures_avoid_physical_contact` | object | 31.0% | |
| `preventativemeasures_increase_hygiene` | object | 31.0% | |
| `broadcast3` | float64 | 5.0% | 1.0 – 1.0 (mean 1.0) |
| `govtconfidence` | float64 | 5.0% | 1.0 – 5.0 (mean 2.9898) |
| `globalconfidence` | float64 | 5.0% | 1.0 – 5.0 (mean 3.342) |
| `financial` | object | 5.0% | |
| `foodavailability` | object | 5.0% | |
| `concerns` | object | 5.0% | |
| `informationsources_newspapers` | object | 5.0% | |
| `informationsources_tv` | object | 5.0% | |
| `informationsources_radio` | object | 5.0% | |
| `informationsources_social_media` | object | 5.0% | |
| `informationsources_friends_family` | object | 5.0% | |
| `informationsources_government_messages` | object | 5.0% | |
| `whatsapp` | object | 5.0% | |
| `whatsappinformation` | object | 29.1% | |
| `mediacommunication` | float64 | 5.0% | 1.0 – 5.0 (mean 4.011) |
| `globalcommunication` | float64 | 5.0% | 1.0 – 5.0 (mean 4.2016) |
| `govtcommunication` | float64 | 5.0% | 1.0 – 5.0 (mean 3.8808) |
| `outsidehelp` | object | 5.0% | |
| `chineseworkers` | object | 5.0% | |
| `esa_source` | object | 0.0% | |
| `esa_processed` | object | 0.0% | |

---

## Numeric Summary

| Column | Min | Max | Mean | Median |
|---|---|---|---|---|
| `levelconcern` | 1.0 | 5.0 | 4.2957 | 5.0 |
| `broadcast2` | 1.0 | 1.0 | 1.0 | 1.0 |
| `broadcast3` | 1.0 | 1.0 | 1.0 | 1.0 |
| `govtconfidence` | 1.0 | 5.0 | 2.9898 | 3.0 |
| `globalconfidence` | 1.0 | 5.0 | 3.342 | 3.0 |
| `mediacommunication` | 1.0 | 5.0 | 4.011 | 5.0 |
| `globalcommunication` | 1.0 | 5.0 | 4.2016 | 5.0 |
| `govtcommunication` | 1.0 | 5.0 | 3.8808 | 5.0 |

---

## Curation

Raw data was downloaded from HDX via the CKAN API and converted to Parquet. Column names were lowercased and standardised to snake_case. Common missing-value markers (`N/A`, `null`, `none`, `-`, `unknown`, `no data`, `#N/A`) were unified to `NaN`. 8 exact duplicate rows were removed. The dataset was split 80/20 into train and test partitions using a fixed random seed (42) and saved as Snappy-compressed Parquet.

---

## Limitations

- Data originates from Mobile Accord, Inc. (GeoPoll) and has not been independently validated by ESA.
- Automated cleaning cannot correct for misreported values, definitional inconsistencies, or sampling bias in the original collection.
- The following columns have >20% missing values and should be treated with caution in modelling: `symptoms2_fever`, `symptoms2_bleeding`, `symptoms2_vomiting`, `symptoms2_cough`, `symptoms2_shortness_of_breath`, `symptoms2_diarrhoea`, `symptoms2_none_of_the_above`, `placegreatestrisk`, ...
- This dataset spans 3 countries; geographic and methodological inconsistencies across national boundaries may affect cross-country comparability.
- Refer to the [original HDX dataset page](https://data.humdata.org/dataset/coronavirus-in-sub-saharan-africa) for the publisher's own methodology notes and caveats.

---

## Citation

```bibtex
@dataset{hdx_africa_coronavirus_in_sub_saharan_africa,
  title  = {Community knowledge and perceptions of Coronavirus COVID-19 in Sub-Saharan Africa},
  author = {Mobile Accord, Inc. (GeoPoll)},
  year   = {2025},
  url    = {https://data.humdata.org/dataset/coronavirus-in-sub-saharan-africa},
  note   = {Repackaged for machine learning by Electric Sheep Africa (https://huggingface.co/electricsheepafrica)}
}
```

---

*[Electric Sheep Africa](https://huggingface.co/electricsheepafrica): Africa's ML dataset infrastructure. Lagos, Nigeria.*
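
The curation steps described in the card (snake_case column names, unified missing-value markers, duplicate removal, seeded 80/20 split) and the >20% missing-value check from the limitations can be approximated with pandas. This is a minimal sketch on a made-up toy table, not Electric Sheep Africa's actual pipeline: the function names `to_snake_case`, `clean`, `split`, and `high_missing_columns` are hypothetical, matching markers case-insensitively is an assumption, and using `DataFrame.sample` for the split is an assumption (the card only states that a fixed seed of 42 was used).

```python
import re

import numpy as np
import pandas as pd

# Missing-value markers named in the card's Curation section (lowercased).
MISSING_MARKERS = {"n/a", "null", "none", "-", "unknown", "no data", "#n/a"}


def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, unify missing markers to NaN, drop exact duplicates."""
    df = raw.rename(columns=to_snake_case)

    def unify(value):
        # Assumption: markers are matched case-insensitively after trimming.
        if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
            return np.nan
        return value

    df = df.apply(lambda col: col.map(unify))
    return df.drop_duplicates().reset_index(drop=True)


def split(df: pd.DataFrame, train_frac: float = 0.8, seed: int = 42):
    """Seeded random split; the test set is every row not sampled for train."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test


def high_missing_columns(df: pd.DataFrame, threshold: float = 0.2) -> list:
    """Columns whose share of missing values exceeds the threshold."""
    null_share = df.isna().mean()
    return [col for col, share in null_share.items() if share > threshold]


# Toy input (invented values): 11 rows, the last one duplicating the first.
raw = pd.DataFrame({
    "Country": ["Kenya", "Kenya", "Nigeria", "Nigeria", "South Africa",
                "South Africa", "Kenya", "Nigeria", "South Africa",
                "Kenya", "Kenya"],
    "Age Group": ["15-24", "25-34", "35+", "15-24", "25-34", "35+",
                  "15-24", "25-34", "35+", "N/A", "15-24"],
    "Level Concern": [5, 4, 3, 5, "unknown", 2, 1, "-", 4, 5, 5],
})

cleaned = clean(raw)
train, test = split(cleaned)
print(cleaned.columns.tolist())
print(train.shape, test.shape)
print(high_missing_columns(cleaned, threshold=0.15))
```

To run the same missing-value check on the published data, `high_missing_columns` could be applied to the `train` DataFrame from the Quick Start; the published splits are already cleaned, so `clean` and `split` serve only to illustrate the described procedure.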
Provider:
electricsheepafrica