ahnafch01/Bangladesh_Locations
Hugging Face | Updated 2025-12-09 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/ahnafch01/Bangladesh_Locations
Resource description:
---
license: mit
language:
- en
- bn
tags:
- location
- Postal
- Bangladesh
- Division
- District
- Upazilla
size_categories:
- 1K<n<10K
---
# Bangladesh Postcodes Dataset (Bilingual & Structured)
A comprehensive, cleaned, and bilingual (English & Bangla) database of postal codes in Bangladesh. This dataset covers the full administrative hierarchy: **Division > District > Thana (Upazila) > Post Office**.
## 📂 Files Included
| Filename | Format | Description |
| :--- | :--- | :--- |
| `bangladesh_postcodes_final.csv` | **CSV** | The master dataset with all columns. Best for data analysis or database imports. |
| `bangladesh_postcodes_flat.json` | **JSON** | A flat array of objects. Best for frontend applications (React/Vue). |
| `address_lookup_db.json` | **JSON** | **Optimized Hash Map.** Keys are keywords (PostCode, English Name, Bangla Name) pointing to location data. Best for backend search logic. |
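
As a rough illustration of the keyword hash map idea, the sketch below reads CSV rows and indexes them by post code, English name, and Bangla name. It uses only the Python standard library and a one-row in-memory sample. The helper names `load_rows` and `build_lookup` are hypothetical, and the resulting structure is an assumption for illustration, not the actual layout of `address_lookup_db.json`.

```python
import csv
import io


def load_rows(stream):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(stream))


def build_lookup(rows):
    """Map each keyword (post code, English name, Bangla name) to its rows."""
    lookup = {}
    for row in rows:
        keys = (
            row["PostCode"],
            row["PostOffice_Name"].lower(),
            row["PostOffice_Name_BN"],
        )
        # set() avoids adding the same row twice if two keywords coincide.
        for key in set(keys):
            lookup.setdefault(key, []).append(row)
    return lookup


# One-row sample using a subset of the documented columns.
SAMPLE = (
    "District_Name,Thana_Name,PostOffice_Name,PostOffice_Name_BN,Office_Type,PostCode\n"
    "Dhaka,Banani,Banani,বনানী,TSO,1213\n"
)

rows = load_rows(io.StringIO(SAMPLE))
lookup = build_lookup(rows)
print(lookup["1213"][0]["PostOffice_Name"])  # Banani
print(lookup["বনানী"][0]["PostCode"])        # 1213
```

To run this against the real file, pass an open file handle instead of the sample, for example `open("bangladesh_postcodes_final.csv", encoding="utf-8", newline="")`. Note that `csv.DictReader` returns every value as a string, so `PostCode` is `"1213"` here rather than an integer.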
## 📊 Data Schema
The dataset contains the following columns:
| Column Name | Data Type | Description | Example |
| :--- | :--- | :--- | :--- |
| `Division_ID` | Integer | Administrative ID for the Division. | `1` |
| `District_ID` | Integer | Administrative ID for the District. | `1` |
| `District_Name` | String | Name of the District (English). | `Dhaka` |
| `District_Name_BN` | String | Name of the District (Bangla). | `ঢাকা` |
| `Thana_ID` | Integer | Administrative ID for the Police Station/Upazila. | `16` |
| `Thana_Name` | String | Name of the Upazila/Thana (English). | `Banani` |
| `Thana_Name_BN` | String | Name of the Upazila/Thana (Bangla). | `বনানী` |
| `PostOffice_Name` | String | Name of the specific Post Office (English). | `Banani` |
| `PostOffice_Name_BN` | String | Name of the specific Post Office (Bangla). | `বনানী` |
| `Office_Type` | String | Administrative classification (see below). | `TSO` |
| `PostCode` | Integer | The 4-digit postal code. | `1213` |
| `Slug` | String | URL-friendly identifier. | `banani-tso-1213` |
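
The example slug `banani-tso-1213` suggests a pattern of post office name, office type, and post code joined by hyphens. The sketch below reproduces that example under this assumption; the function `make_slug` is hypothetical, and the dataset's actual slug rules (for instance, for names containing punctuation) are not documented here.

```python
import re


def make_slug(post_office_name, office_type, post_code):
    """Join the three fields, lowercase them, and hyphenate non-alphanumeric runs."""
    raw = f"{post_office_name} {office_type} {post_code}".lower()
    return re.sub(r"[^a-z0-9]+", "-", raw).strip("-")


print(make_slug("Banani", "TSO", 1213))  # banani-tso-1213
```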
### 🏢 Office Type Meanings
The `Office_Type` values were extracted from the original post office names, where they appeared as suffixes, so that the name columns stay clean.
* **GPO:** General Post Office (Head of Region)
* **HO:** Head Office (Head of District)
* **SO:** Sub Office (Upazila Level)
* **TSO:** Town Sub Office (City Area)
* **EDBO/EDSO:** Extra Departmental Branch Office / Extra Departmental Sub Office (Rural/Village Agent-run)
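
A minimal sketch of how such a suffix could be split from a raw name, assuming the office type always appears as the last whitespace-separated token. The helper `split_office_type` is hypothetical and is not the author's actual cleaning code.

```python
import re

# Office-type abbreviations listed above; the suffix must follow whitespace
# and end the string, so "SO" cannot match inside "TSO".
_SUFFIX = re.compile(r"^(?P<name>.+?)\s+(?P<type>GPO|HO|SO|TSO|EDBO|EDSO)$")


def split_office_type(raw_name):
    """Return (clean_name, office_type); office_type is None if no suffix is found."""
    text = raw_name.strip()
    match = _SUFFIX.match(text)
    if match is None:
        return text, None
    return match.group("name"), match.group("type")


print(split_office_type("Banani TSO"))  # ('Banani', 'TSO')
print(split_office_type("Banani"))      # ('Banani', None)
```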
## 🛠 Methodology
1. **Scraping:** Data was gathered hierarchically from publicly available postal data sources.
2. **Cleaning:**
   * Removed office-type suffixes from post office names (e.g., "Banani TSO" > "Banani").
   * Extracted the office types into a separate `Office_Type` column.
3. **Translation (AI):**
   * English-to-Bangla translation was performed using **OpenAI GPT-4o-mini**.
   * Context-aware translation was applied (e.g., "Hat" > "হাট", not "হ্যাট").
   * Acronyms such as "EPZ" and "BAF" were transliterated phonetically.
4. **Validation:**
   * Regex checks were run to ensure no English characters remained in the Bangla columns.
   * Logic checks were run to ensure each Thana maps to the correct District.
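
The two validation checks can be sketched roughly as follows. This is an illustration rather than the author's script: the function names are hypothetical, "English characters" is interpreted as ASCII letters, and the logic check assumes each `Thana_ID` should belong to exactly one `District_ID`.

```python
import re

# Any ASCII letter counts as a leftover English character.
LATIN = re.compile(r"[A-Za-z]")


def rows_with_latin(rows, columns):
    """Return rows where any of the given Bangla columns contains a Latin letter."""
    return [row for row in rows if any(LATIN.search(row[col]) for col in columns)]


def thanas_in_multiple_districts(rows):
    """Return {Thana_ID: set of District_IDs} for Thanas seen under more than one District."""
    seen = {}
    for row in rows:
        seen.setdefault(row["Thana_ID"], set()).add(row["District_ID"])
    return {thana: districts for thana, districts in seen.items() if len(districts) > 1}


# Two made-up rows: the second one deliberately fails both checks.
sample_rows = [
    {"Thana_ID": 16, "District_ID": 1, "Thana_Name_BN": "বনানী"},
    {"Thana_ID": 16, "District_ID": 2, "Thana_Name_BN": "Banani"},
]

print(len(rows_with_latin(sample_rows, ["Thana_Name_BN"])))  # 1
print(thanas_in_multiple_districts(sample_rows))
```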
## ⚠️ Known Limitations
* **AI Translations:** Although the translations are about 99% accurate, minor phonetic errors may exist in the Bangla spellings.
* **Dynamic Data:** Postal codes change occasionally. This dataset represents a snapshot as of **December 2025**.
## 📄 License
This dataset is provided "as is" for educational and development purposes. The underlying factual data (Post Codes) belongs to the public domain/Bangladesh Post Office.
**Generated by:** Asif Ahnaf Chowdhury
**Tools Used:** Python, Pandas, OpenAI API.
Provider:
ahnafch01



