
emperor-mew/ooni-censorship-historical

Source: Hugging Face. Updated 2026-02-05. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/emperor-mew/ooni-censorship-historical
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- time-series-forecasting
language:
- en
tags:
- censorship
- internet-freedom
- ooni
- human-rights
- network-measurement
- geopolitics
pretty_name: Voidly Global Censorship Index
size_categories:
- 1M<n<10M
---

# Voidly Global Censorship Index

**The most comprehensive open dataset for internet censorship research and ML.**

## Dataset Description

This dataset contains 10 years of global internet censorship measurements from 120+ countries, including:

- **1.6M+ daily measurements** (2017-2026)
- **37K detected anomaly spikes**
- **4.5K confirmed censorship events** with labels
- **25+ known major incidents** (Mahsa Amini protests, Myanmar coup, etc.)

### Data Sources

- Primary: [OONI (Open Observatory of Network Interference)](https://ooni.org)
- Secondary: Voidly Research analysis and labeling

## Files

| File | Description | Rows |
|------|-------------|------|
| `ooni-historical.parquet` | Daily measurements by country/test | 1.6M |
| `censorship-incidents.parquet` | Labeled anomaly spikes | 37K |
| `known-events.json` | Major censorship events | 25+ |

## Usage

```python
from datasets import load_dataset

# Load historical measurements
ds = load_dataset("emperor-mew/global-censorship-index", data_files="ooni-historical.parquet")

# Load labeled incidents (for ML training)
incidents = load_dataset("emperor-mew/global-censorship-index", data_files="censorship-incidents.parquet")
```

## Schema

### ooni-historical

| Column | Type | Description |
|--------|------|-------------|
| country | string | ISO 3166-1 alpha-2 country code |
| test_name | string | OONI test type (web_connectivity, telegram, whatsapp) |
| date | date | Measurement date |
| measurement_count | int | Total measurements |
| anomaly_count | int | Measurements showing anomalies |
| confirmed_count | int | Confirmed blocked |
| anomaly_rate | float | Fraction showing anomalies (0-1) |

### censorship-incidents

| Column | Type | Description |
|--------|------|-------------|
| country | string | ISO 3166-1 alpha-2 country code |
| date | date | Incident date |
| anomaly_rate | float | Measured anomaly rate |
| measurement_count | int | Sample size |
| spike_magnitude | float | Z-score above baseline |
| label | int | 1=confirmed censorship, 0=not |
| event | string | Matched known event (if any) |
| confidence | float | Label confidence (0-1) |

## Known Events Covered

- 🇮🇷 Iran Mahsa Amini protests (2022)
- 🇲🇲 Myanmar military coup (2021)
- 🇧🇾 Belarus election shutdown (2020)
- 🇷🇺 Russia Ukraine invasion blocks (2022+)
- 🇰🇿 Kazakhstan January protests (2022)
- 🇸🇩 Sudan military coup (2021)
- 🇨🇺 Cuba July protests (2021)
- 🇺🇬 Uganda election shutdown (2021)
- And 17+ more...

## Model

We provide a trained GradientBoosting classifier:

- **F1 Score**: 99.8%
- **ROC AUC**: 1.000
- Available via API: `https://api.voidly.ai/hydra/v1/detect`

## Citation

```bibtex
@dataset{voidly_censorship_index_2026,
  author = {Voidly Research},
  title = {Global Censorship Index: 10 Years of Internet Measurement Data},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/emperor-mew/global-censorship-index}
}
```

## Links

- 🌐 [Voidly Censorship Index](https://voidly.ai/censorship-index)
- 📡 [Real-time API](https://api.voidly.ai/data/censorship-index.json)
- 🤖 [MCP Server](https://www.npmjs.com/package/@voidly/mcp-server)
- 📊 [OONI (source)](https://ooni.org)

## License

CC BY 4.0 - Attribution required
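
As a rough illustration of how the `anomaly_rate` and `spike_magnitude` columns in the schema relate to the raw counts, the sketch below recomputes both on synthetic numbers. The dataset card does not say how the baseline is defined, so the z-score over a short window of prior days is an assumption. The helper functions are hypothetical and are not part of any Voidly or OONI API.

```python
from statistics import mean, stdev


def anomaly_rate(anomaly_count: int, measurement_count: int) -> float:
    """Fraction of measurements showing anomalies, in the range 0-1."""
    if measurement_count <= 0:
        return 0.0
    return anomaly_count / measurement_count


def spike_magnitude(baseline_rates: list[float], current_rate: float) -> float:
    """Z-score of current_rate against a baseline window.

    Assumption: the baseline is simply a list of earlier daily rates;
    the dataset's actual baseline definition is not documented here.
    """
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)  # sample standard deviation
    if sigma == 0:
        return 0.0
    return (current_rate - mu) / sigma


# Synthetic rows shaped like ooni-historical (made-up values, not real data).
history = [
    {"measurement_count": 100, "anomaly_count": 5},
    {"measurement_count": 120, "anomaly_count": 6},
    {"measurement_count": 80, "anomaly_count": 4},
    {"measurement_count": 100, "anomaly_count": 7},
    {"measurement_count": 100, "anomaly_count": 3},
]
baseline = [anomaly_rate(r["anomaly_count"], r["measurement_count"]) for r in history]

# A hypothetical day on which 40 of 100 measurements show anomalies.
today = anomaly_rate(40, 100)
z = spike_magnitude(baseline, today)
print(f"anomaly_rate={today:.2f}, spike_magnitude={z:.2f}")
```

A large positive z-score marks a day whose anomaly rate is far above its recent baseline. Rows like that would be candidates for the `censorship-incidents` table, though the threshold used to select them is not stated in the card.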
Provider:
emperor-mew