emperor-mew/ooni-censorship-historical
Source: Hugging Face. Updated 2026-02-05; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/emperor-mew/ooni-censorship-historical
Description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- time-series-forecasting
language:
- en
tags:
- censorship
- internet-freedom
- ooni
- human-rights
- network-measurement
- geopolitics
pretty_name: Voidly Global Censorship Index
size_categories:
- 1M<n<10M
---
# Voidly Global Censorship Index
**The most comprehensive open dataset for internet censorship research and ML.**
## Dataset Description
This dataset contains 10 years of global internet censorship measurements from 120+ countries, including:
- **1.6M+ daily measurements** (2017-2026)
- **37K detected anomaly spikes**
- **4.5K confirmed censorship events** with labels
- **25+ known major incidents** (Mahsa Amini protests, Myanmar coup, etc.)
### Data Sources
- Primary: [OONI (Open Observatory of Network Interference)](https://ooni.org)
- Secondary: Voidly Research analysis and labeling
## Files
| File | Description | Rows |
|------|-------------|------|
| `ooni-historical.parquet` | Daily measurements by country/test | 1.6M |
| `censorship-incidents.parquet` | Labeled anomaly spikes | 37K |
| `known-events.json` | Major censorship events | 25+ |
## Usage
```python
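from datasets import load_dataset

# Repo ID as written in the dataset card. This page lists the dataset under
# "emperor-mew/ooni-censorship-historical"; try that ID if this one does not resolve.
REPO_ID = "emperor-mew/global-censorship-index"
# Load historical measurements (daily rows by country and test)
ds = load_dataset(REPO_ID, data_files="ooni-historical.parquet")
# Load labeled incidents (for ML training)
incidents = load_dataset(REPO_ID, data_files="censorship-incidents.parquet")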
```
## Schema
### ooni-historical
| Column | Type | Description |
|--------|------|-------------|
| country | string | ISO 3166-1 alpha-2 country code |
| test_name | string | OONI test type (web_connectivity, telegram, whatsapp) |
| date | date | Measurement date |
| measurement_count | int | Total measurements |
| anomaly_count | int | Measurements showing anomalies |
| confirmed_count | int | Measurements confirmed as blocked |
| anomaly_rate | float | Fraction showing anomalies (0-1) |
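
The `anomaly_rate` column reads as the share of a day's measurements that showed anomalies. The card does not state the formula, so the sketch below assumes it is `anomaly_count / measurement_count`. The rows are invented for illustration and are not taken from the dataset.

```python
# Hypothetical rows shaped like the ooni-historical schema (values are invented).
rows = [
    {"country": "IR", "test_name": "web_connectivity", "date": "2022-09-21",
     "measurement_count": 400, "anomaly_count": 100, "confirmed_count": 40},
    {"country": "IR", "test_name": "telegram", "date": "2022-09-21",
     "measurement_count": 0, "anomaly_count": 0, "confirmed_count": 0},
]


def anomaly_rate(row):
    """Fraction of measurements showing anomalies, or None if there were none."""
    total = row["measurement_count"]
    if total == 0:
        return None  # assumption: the rate is undefined on days without measurements
    return row["anomaly_count"] / total


rates = [anomaly_rate(r) for r in rows]
print(rates)
```

Recomputing the rate this way may be a useful sanity check when aggregating rows across tests or days, since averaging per-row rates would weight low-volume days the same as high-volume ones.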
### censorship-incidents
| Column | Type | Description |
|--------|------|-------------|
| country | string | ISO 3166-1 alpha-2 country code |
| date | date | Incident date |
| anomaly_rate | float | Measured anomaly rate |
| measurement_count | int | Sample size |
| spike_magnitude | float | Z-score above baseline |
| label | int | 1 = confirmed censorship, 0 = not confirmed censorship |
| event | string | Matched known event (if any) |
| confidence | float | Label confidence (0-1) |
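
The `spike_magnitude` column is described as a Z-score above baseline. The card does not say how the baseline is built, so the sketch below assumes a simple one: the mean and population standard deviation of a window of earlier daily anomaly rates. The function name, window, and values are hypothetical.

```python
from statistics import mean, pstdev


def spike_magnitude(history, today):
    """Z-score of today's anomaly rate against a baseline of earlier daily rates.

    Assumption: the baseline is the plain mean and population standard
    deviation of `history`; the dataset's actual baseline may differ.
    """
    baseline_mean = mean(history)
    baseline_std = pstdev(history)
    if baseline_std == 0:
        return None  # a flat baseline gives no scale to measure a spike against
    return (today - baseline_mean) / baseline_std


# Invented example: a quiet baseline around 3% followed by a 13% day.
history = [0.02, 0.04, 0.02, 0.04]
z = spike_magnitude(history, 0.13)
print(round(z, 2))
```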
## Known Events Covered
- 🇮🇷 Iran Mahsa Amini protests (2022)
- 🇲🇲 Myanmar military coup (2021)
- 🇧🇾 Belarus election shutdown (2020)
- 🇷🇺 Russia blocks following the invasion of Ukraine (2022+)
- 🇰🇿 Kazakhstan January protests (2022)
- 🇸🇩 Sudan military coup (2021)
- 🇨🇺 Cuba July protests (2021)
- 🇺🇬 Uganda election shutdown (2021)
- And 17+ more...
## Model
We provide a trained GradientBoosting classifier:
- **F1 Score**: 99.8%
- **ROC AUC**: 1.000
- Available via API: `https://api.voidly.ai/hydra/v1/detect`
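
For readers who want to evaluate their own predictions against the `label` column, F1 is the harmonic mean of precision and recall. The sketch below is a minimal illustration with invented labels and predictions. It is not the authors' evaluation code and does not reproduce the scores reported above.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) from two equal-length 0/1 sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: 4 true incidents, 4 predicted, 3 of them correct.
labels = [1, 1, 1, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 1]
score = f1_score(labels, preds)
print(score)
```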
## Citation
```bibtex
@dataset{voidly_censorship_index_2026,
author = {Voidly Research},
title = {Global Censorship Index: 10 Years of Internet Measurement Data},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/emperor-mew/global-censorship-index}
}
```
## Links
- 🌐 [Voidly Censorship Index](https://voidly.ai/censorship-index)
- 📡 [Real-time API](https://api.voidly.ai/data/censorship-index.json)
- 🤖 [MCP Server](https://www.npmjs.com/package/@voidly/mcp-server)
- 📊 [OONI (source)](https://ooni.org)
## License
CC BY 4.0 - Attribution required
Provided by:
emperor-mew



