five

ValerianFourel/seoul-medical-facilities

收藏
Hugging Face2026-01-18 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/ValerianFourel/seoul-medical-facilities
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-4.0 task_categories: - other language: - ko tags: - medical - healthcare - seoul - south-korea - facilities - geospatial pretty_name: Seoul Medical Facilities Dataset size_categories: - 10K<n<100K --- # Seoul Medical Facilities Dataset ## Dataset Description This dataset contains comprehensive information about **unique** medical facilities (hospitals, clinics) across all administrative districts (구) and neighborhoods (동) in Seoul, South Korea. **Note**: This dataset contains only unique facilities. Duplicates have been removed based on `place_id`, with the most complete record retained for each facility. ### Dataset Summary - **Unique Facilities**: 8,484 - **Districts Covered**: 25 - **Neighborhoods (Dong) Covered**: 320 - **Collection Period**: 2025-12-28 to 2026-01-03 - **Source**: Naver Maps - **Language**: Korean - **Deduplication**: Yes (by place_id) ### Data Collection Data was collected by systematically scraping Naver Maps for medical facilities across Seoul's administrative divisions: - **Keywords**: 병원 (hospital), 의원 (clinic), 클리닉 (clinic) - **Coverage**: All 25 districts (구) and 424+ neighborhoods (동) - **Method**: Automated web scraping with Selenium - **Deduplication**: Facilities appearing in multiple keyword searches are deduplicated by `place_id` ### Facility Types The dataset includes three types of medical facilities: 1. **병원 (Byeongwon)** - Hospitals 2. **의원 (Uiwon)** - Clinics/Medical offices 3. **클리닉 (Keullinik)** - Specialized clinics **Note**: Each facility appears only once, even if it matched multiple search keywords. ## Dataset Structure ### Data Fields | Field | Type | Description | |-------|------|-------------| | `name` | string | Facility name | | `category` | string | Facility category/type | | `address` | string | Street address | | `phone` | string | Contact phone number | | `place_id` | string | **Unique** Naver Maps place identifier | | `url` | string | Naver Maps URL | | `reviews` | string | Review ratings and counts | | `hours_status` | string | Current operating status | | `business_hours` | string | Detailed business hours | | `amenities` | string | Available amenities/facilities | | `website` | string | Official website (if available) | | `file_district` | string | Seoul district (구) | | `file_dong` | string | Neighborhood (동) | | `file_keyword` | string | Search keyword used (from original search) | | `scraped_at` | string | Timestamp of data collection | **Important**: `place_id` is the unique identifier. Each `place_id` appears exactly once in the dataset. ### Geographic Coverage Seoul's 25 districts (구): - Gangnam-gu, Gangdong-gu, Gangbuk-gu, Gangseo-gu, Gwanak-gu, Gwangjin-gu, Guro-gu, Geumcheon-gu, Nowon-gu, Dobong-gu, Dongdaemun-gu, Dongjak-gu, Mapo-gu, Seodaemun-gu, Seocho-gu, Seongdong-gu, Seongbuk-gu, Songpa-gu, Yangcheon-gu, Yeongdeungpo-gu, Yongsan-gu, Eunpyeong-gu, Jongno-gu, Jung-gu, Jungnang-gu ## Usage ### Load with Pandas ```python import pandas as pd # Load parquet file df = pd.read_parquet("hf://datasets/ValerianFourel/seoul-medical-facilities/seoul_medical_facilities.parquet") # Basic exploration print(f"Total unique facilities: {len(df):,}") print(f"Districts: {df['file_district'].nunique()}") # Each place_id is unique assert df['place_id'].is_unique # Filter by district gangnam = df[df['file_district'] == 'Gangnam-gu'] print(f"Gangnam facilities: {len(gangnam):,}") # Filter by facility type hospitals = df[df['file_keyword'] == '병원'] print(f"Hospitals: {len(hospitals):,}") ``` ### Load with Datasets ```python from datasets import load_dataset dataset = load_dataset("ValerianFourel/seoul-medical-facilities") df = dataset['train'].to_pandas() # Verify uniqueness print(f"Unique facilities: {len(df):,}") print(f"Unique place_ids: {df['place_id'].nunique():,}") assert len(df) == df['place_id'].nunique() ``` ## Use Cases - **Healthcare Access Analysis**: Study distribution of medical facilities across Seoul - **Geographic Analysis**: Map healthcare infrastructure by district/neighborhood - **Urban Planning**: Identify underserved areas - **Public Health Research**: Analyze healthcare availability patterns - **Business Intelligence**: Market analysis for medical services - **Navigation/Directory Apps**: Build medical facility finders ## Data Quality Notes - **Deduplication**: Each facility appears exactly once based on `place_id` - **Completeness**: For duplicate entries, the record with most complete information was retained - **Unique Identifier**: Use `place_id` to reference specific facilities - Phone numbers and websites may not be available for all facilities - Business hours may change; check official sources for current information - Review data is a snapshot at collection time ## Limitations - Data represents a snapshot at collection time - Some fields may be incomplete (N/A values) - Limited to facilities discoverable via Naver Maps - Does not include detailed medical specialties or services - Operating hours and contact information may change ### DATASET CONTEXT: Seoul Medical Facilities Knowledge Base Regarding, facilities_metareviews_rag_ready.parquet: You have access to a structured database of medical facilities in Seoul, South Korea. Each record contains three types of data: 1. **Fact-Based Metadata:** Name, address, hours, and parsed medical info. 2. **AI-Derived Accessibility Scores:** Confidence scores (1-7) indicating if English services are available. 3. **Patient Sentiment Summaries:** AI-generated summaries of thousands of patient reviews, grouped by topic. ### COLUMN DEFINITIONS - **Identity & Location** - `name`: The official name of the facility. - `category`: Medical specialty (e.g., Dermatology, Dentistry). - `address`: Physical address in Seoul. - `medical_info_parsed`: (JSON/Dict) Specific procedures, equipment, or departments listed on their official profile. - **Language Accessibility (AI-Scored)** - `english_confidence_score` (1-7): The average confidence that staff speaks English, based on analyzing English/Mixed reviews. - *1 = Definitely No, 4 = Ambiguous, 7 = Fluent/Verified.* - `english_max_score` (1-7): The highest single piece of evidence found. High max + low average means "some staff might speak English, but it's not guaranteed." - `has_english` / `has_mixed`: (Boolean) True if actual reviews exist in these languages. - **Patient Experience (RAG Data)** - `Summaries` (List[str]): 3-10 sentence narrative summaries of patient feedback in English. Covers pros/cons, wait times, and kindness. - `Key_Highlights` (List[Dict]): Top semantic topics extracted from reviews (e.g., `{'topic_en': 'Hidden Fees', 'relevance': 0.85}`). - `Summaries_Korean` (List[str]): The same narrative summaries in Korean. ## Citation If you use this dataset, please cite: ```bibtex @dataset{seoul_medical_facilities_2024, author = {Fourel, Valerian}, title = {Seoul Medical Facilities Dataset}, year = {2024}, publisher = {Hugging Face}, url = {https://huggingface.co/datasets/ValerianFourel/seoul-medical-facilities} } ``` ## License This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. ## Maintenance - **Maintainer**: ValerianFourel - **Last Updated**: 2026-01-03 ## Acknowledgments Data sourced from Naver Maps. This dataset is intended for research and educational purposes.
提供机构:
ValerianFourel
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作