rahulmatthan/india-telecom-data
Hugging Face · updated 2026-03-26 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/rahulmatthan/india-telecom-data
Resource description:
---
license: cc-by-4.0
language:
- en
tags:
- india
- telecom
- trai
- wireless
- broadband
- mnp
- subscribers
- time-series
pretty_name: India Telecom Subscription Data (TRAI) 2016–2026
size_categories:
- n<1K
---
# India Telecom Subscription Data (TRAI) 2016–2026
Monthly telecom subscription statistics for India, parsed from the official **Telecom Regulatory Authority of India (TRAI)** Telecom Subscription Data (TSD) PDF reports.
Covers the full post-Jio era: from Jio's disruptive launch in September 2016 through to January 2026, capturing the collapse of smaller operators, the Vodafone-Idea merger, and the rise of wireless broadband.
## Files
| File | Rows | Description |
|---|---|---|
| `data/telecom_monthly.parquet` | 119 | National-level metrics, one row per month |
| `data/operators_monthly.parquet` | 1,679 | Wireless subscribers by operator × LSA × month (2025 onwards) |
## Loading the Data
```python
import pandas as pd
monthly = pd.read_parquet("data/telecom_monthly.parquet")
operators = pd.read_parquet("data/operators_monthly.parquet")
```
## `telecom_monthly` — Field Reference
| Field | Unit | Coverage | Description |
|---|---|---|---|
| `data_month` | YYYY-MM | 119/119 | Reporting month |
| `wireless_total_mn` | millions | 119/119 | Total wireless (mobile) subscribers |
| `wireline_total_mn` | millions | 119/119 | Total wireline (fixed-line) subscribers |
| `total_subscribers_mn` | millions | 119/119 | Wireless + wireline combined |
| `broadband_total_mn` | millions | 119/119 | Total broadband subscribers |
| `broadband_wireless_mn` | millions | 112/119 | Wireless broadband subscribers |
| `broadband_wireline_mn` | millions | 112/119 | Wireline broadband subscribers |
| `urban_wireless_mn` | millions | 112/119 | Urban wireless subscribers |
| `rural_wireless_mn` | millions | 112/119 | Rural wireless subscribers |
| `wireless_growth_pct` | % | 77/119 | Monthly wireless subscriber growth rate |
| `overall_tele_density_pct` | % | 67/119 | Overall tele-density (subscribers per 100 population) |
| `m2m_total_mn` | millions | 19/119 | Machine-to-Machine (IoT) connections — available from mid-2024 |
| `mnp_monthly_mn` | millions | 92/119 | Mobile Number Portability requests in the month |
| `validation_score` | 0–1 | 119/119 | Automated data quality score (see Methodology) |
| `validation_status` | pass/warn | 119/119 | Quality status |
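
The fields above lend themselves to a quick consistency check. The following is a minimal sketch, assuming `total_subscribers_mn` should equal `wireless_total_mn + wireline_total_mn` up to rounding. The frame holds made-up values rather than real TRAI figures, and `TOLERANCE_MN` is a hypothetical threshold, not one taken from the dataset's pipeline.

```python
import pandas as pd

# Tiny illustrative frame with made-up values (NOT real TRAI figures).
# In practice: monthly = pd.read_parquet("data/telecom_monthly.parquet")
monthly = pd.DataFrame(
    {
        "data_month": ["2024-01", "2024-02", "2024-03"],
        "wireless_total_mn": [1000.0, 1002.5, 1004.0],
        "wireline_total_mn": [30.0, 30.5, 31.0],
        "total_subscribers_mn": [1030.0, 1033.0, 1036.5],
    }
)

# Turn the YYYY-MM strings into a monthly PeriodIndex for time-series work.
monthly["period"] = pd.PeriodIndex(monthly["data_month"], freq="M")
monthly = monthly.set_index("period").sort_index()

# Residual between the reported total and wireless + wireline.
TOLERANCE_MN = 0.05  # hypothetical rounding tolerance, in millions
residual = (
    monthly["total_subscribers_mn"]
    - monthly["wireless_total_mn"]
    - monthly["wireline_total_mn"]
)

# Months whose reported total disagrees with the sum of its components.
mismatched = monthly.index[residual.abs() > TOLERANCE_MN]
print(list(mismatched.astype(str)))
```

Only the third synthetic row is flagged, because its total was deliberately set 1.5 million above the sum of its parts.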
## `operators_monthly` — Field Reference
| Field | Unit | Description |
|---|---|---|
| `data_month` | YYYY-MM | Reporting month |
| `lsa` | string | Licensed Service Area (telecom circle) — 22 values |
| `operator` | string | Operator name (Airtel, Jio, Vi, BSNL, MTNL, Reliance Com.) |
| `subscribers` | millions | Wireless subscribers for this operator in this LSA |
| `prev_month` | millions | Previous month's subscriber count (for MoM comparison) |
| `net_add` | millions | Net subscriber addition / loss |
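
As a rough illustration of how these columns relate, the sketch below recomputes `net_add` as `subscribers - prev_month` and derives each operator's share within an LSA. The identity is an assumption implied by the field descriptions, not something the card states. The rows are made up, and `share_pct` is a hypothetical derived column that does not exist in the file.

```python
import pandas as pd

# Made-up rows (NOT real TRAI figures).
# In practice: operators = pd.read_parquet("data/operators_monthly.parquet")
operators = pd.DataFrame(
    {
        "data_month": ["2025-01"] * 4,
        "lsa": ["Delhi", "Delhi", "Kerala", "Kerala"],
        "operator": ["Airtel", "Jio", "Airtel", "Jio"],
        "subscribers": [20.0, 30.0, 10.0, 10.0],
        "prev_month": [19.5, 29.0, 10.25, 10.0],
        "net_add": [0.5, 1.0, -0.25, 0.0],
    }
)

# Assumed identity: net_add == subscribers - prev_month (up to rounding).
recomputed = operators["subscribers"] - operators["prev_month"]
consistent = (recomputed - operators["net_add"]).abs() < 1e-6

# Operator share of wireless subscribers within each LSA and month.
lsa_total = operators.groupby(["data_month", "lsa"])["subscribers"].transform("sum")
operators["share_pct"] = 100 * operators["subscribers"] / lsa_total

print(operators[["lsa", "operator", "share_pct"]])
```

Rows where `consistent` is false would be worth comparing against the source PDF before use.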
### LSAs (Licensed Service Areas)
Andhra Pradesh, Assam, Bihar, Delhi, Gujarat, Haryana, Himachal Pradesh,
Jammu & Kashmir, Karnataka, Kerala, Kolkata, Madhya Pradesh, Maharashtra,
Mumbai, North East, Odisha, Punjab, Rajasthan, Tamil Nadu,
Uttar Pradesh (E), Uttar Pradesh (W), West Bengal
## Coverage Notes
**What is complete (2016–2026):**
- `wireless_total_mn`, `wireline_total_mn`, `broadband_total_mn`, `total_subscribers_mn` — fully populated for all 119 months
**Structural gaps (TRAI reporting limitations, not parsing errors):**
- `wireless_growth_pct`, `overall_tele_density_pct` — absent from 2016–2017 PDFs (different format era)
- `m2m_total_mn` — TRAI only began publishing M2M data prominently from mid-2024
- `mnp_monthly_mn` — absent for 2019–2020 (section format changed; data not extractable)
- `mnp_zone1_mn`, `mnp_zone2_mn` — not yet implemented
- Operator-level LSA data — TRAI only introduced the detailed wireless subscriber annexure (Annexure-II) from January 2025
**Missing months:**
- 2020-08 and 2021-12: not published by TRAI
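
Because two months are absent and `wireless_growth_pct` is only partly populated, it can help to locate gaps explicitly before deriving growth rates. This is a minimal sketch on made-up values, with 2020-08 left out on purpose. Reindexing to a full monthly range turns gaps into `NaN`, so no growth figure is computed across a missing month.

```python
import pandas as pd

# Made-up values (NOT real TRAI figures); 2020-08 is deliberately omitted.
monthly = pd.DataFrame(
    {
        "data_month": ["2020-06", "2020-07", "2020-09", "2020-10"],
        "wireless_total_mn": [1000.0, 1010.0, 1020.0, 1030.2],
    }
)

idx = pd.PeriodIndex(monthly["data_month"], freq="M")
series = pd.Series(monthly["wireless_total_mn"].to_numpy(), index=idx).sort_index()

# Every month between the first and last observation, and the ones not present.
full = pd.period_range(series.index.min(), series.index.max(), freq="M")
missing = full.difference(series.index)

# Month-over-month growth in percent. fill_method=None keeps gaps as NaN,
# so a two-month change is never reported as a one-month change.
growth_pct = series.reindex(full).pct_change(fill_method=None) * 100

print(list(missing.astype(str)))
print(growth_pct)
```

Values derived this way are recomputed from subscriber totals and may differ slightly from the figures TRAI prints in its reports.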
## Methodology
Data is extracted from TRAI's official PDF reports using a custom Python pipeline:
1. **`fetch_trai_index.py`** — scrapes TRAI's website for all report URLs
2. **`download_trai_pdfs.py`** — downloads and caches PDFs
3. **`parse_trai_pdf.py`** — extracts tables using `pdfplumber`; falls back to OCR (`pytesseract`) for scanned PDFs
4. **`validate_trai_month.py`** — runs 5 automated checks per month:
   - **A (30%)** Internal arithmetic (operator sums, broadband components)
   - **B (30%)** Cross-text plausibility (table vs. narrative text in same PDF)
   - **C (20%)** Range bounds (known historical ranges by era)
   - **D (10%)** Month-over-month continuity
   - **E (10%)** Structural completeness (22 LSAs, required annexures present)
5. **`build_trai_dataset.py`** — assembles parquets; applies manual patches for months with unextractable tables
All 119 months have a validation score ≥ 0.79 (mean: 0.97).
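
One plausible reading of the percentages attached to checks A to E is a weighted sum of per-check scores. The sketch below illustrates that reading only. The function name `validation_score` and the idea of fractional per-check scores are assumptions, and the actual logic in `validate_trai_month.py` may differ.

```python
# Weights for checks A to E, as listed in the card (they sum to 100%).
WEIGHTS = {"A": 0.30, "B": 0.30, "C": 0.20, "D": 0.10, "E": 0.10}


def validation_score(check_scores: dict[str, float]) -> float:
    """Weighted sum of per-check scores, each expected in [0, 1].

    Hypothetical helper: an illustration of the weighting scheme,
    not the dataset author's implementation.
    """
    if set(check_scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for checks {sorted(WEIGHTS)}")
    for name, value in check_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"check {name} score {value} is outside [0, 1]")
    total = sum(WEIGHTS[name] * check_scores[name] for name in WEIGHTS)
    return round(total, 4)


# Example with invented per-check scores: arithmetic check half failed,
# completeness check mostly passed, everything else clean.
example = validation_score({"A": 0.5, "B": 1.0, "C": 1.0, "D": 1.0, "E": 0.8})
print(example)
```

Under this reading, a month that fails part of the internal arithmetic check loses up to 0.30 of its score.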
## Source
**TRAI Telecom Subscription Data reports:**
https://www.trai.gov.in/release-publication/reports/telecom-subscriptions-reports
Data is derived from publicly available government documents. Original reports are the copyright of TRAI / Government of India.
## License
This dataset is released under [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
You are free to share and adapt the data for any purpose, including commercial use, provided you give appropriate credit.
**Suggested citation:**
```
TRAI India Telecom Subscription Dataset (2016–2026).
Compiled from TRAI Telecom Subscription Data PDF reports.
https://huggingface.co/datasets/rahulmatthan/india-telecom-data
CC BY 4.0
```
## Known Issues
- **2025-09**: Airtel and BSNL subscriber counts are null for several LSAs due to a font-encoding issue in the source PDF
- **2025-10**: 96 of 132 expected operator×LSA rows present (text-fallback parsing limitation)
- **2017-04**: `validation_score = 0.79` (lowest in dataset) — broadband component sum mismatch in source PDF
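
Issues like the ones listed above can be surfaced programmatically before analysis. The following sketch counts rows and null `subscribers` values per month and compares the row count with the 132 operator×LSA rows the card quotes as expected. `completeness_report` is a hypothetical helper, and the demo uses made-up rows with a smaller expected count so that it stays short.

```python
import pandas as pd

EXPECTED_ROWS = 132  # operator x LSA rows per month, as quoted in the card


def completeness_report(
    operators: pd.DataFrame, expected_rows: int = EXPECTED_ROWS
) -> pd.DataFrame:
    """Per-month row counts, null counts, and a simple completeness flag."""
    report = operators.groupby("data_month").agg(
        rows=("subscribers", "size"),
        null_subscribers=("subscribers", lambda s: int(s.isna().sum())),
    )
    report["missing_rows"] = expected_rows - report["rows"]
    report["complete"] = (report["missing_rows"] == 0) & (
        report["null_subscribers"] == 0
    )
    return report


# Made-up demo (NOT real TRAI figures): one clean month, one month with a
# null value, and one month with a missing row. Two rows per month expected.
demo = pd.DataFrame(
    {
        "data_month": ["2025-08", "2025-08", "2025-09", "2025-09", "2025-10"],
        "lsa": ["Delhi", "Delhi", "Delhi", "Delhi", "Delhi"],
        "operator": ["Airtel", "Jio", "Airtel", "Jio", "Jio"],
        "subscribers": [20.0, 30.0, float("nan"), 30.5, 31.0],
    }
)
report = completeness_report(demo, expected_rows=2)
print(report)
```

On the real file, calling `completeness_report(operators)` with the default of 132 would be the natural starting point.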
Provider:
rahulmatthan



