
Korean_Real_Estate_Ads_Dataset

Source: ModelScope community (updated 2025-11-27, indexed 2025-11-03)
Download link: https://modelscope.cn/datasets/Kratos-AI/Korean_Real_Estate_Ads_Dataset
Description:
# Korean Real Estate Ads Dataset

*This dataset contains high-resolution images of Korean real estate advertisements, including online listings, printed flyers, and billboard ads for properties such as apartments, houses, and commercial spaces. It is designed to support AI research in OCR, visual understanding, and property analysis.*

## Contact

For queries or collaborations related to this dataset, contact:

- anoushka@kgen.io
- abhishek.vadapalli@kgen.io

## Supported Tasks

- **Task Categories**:
  - Image Classification
  - Text Recognition (OCR)
  - Scene Understanding
- **Supported Tasks**:
  - Extraction of property information from ads (price, size, location, contact)
  - Classification of property types (apartment, house, commercial)
  - Multilingual OCR for bilingual ads (Korean-English)
  - Analysis of advertisement layout and design
  - AI research in property recommendation and market analytics

## Languages

- **Primary Language**: Korean
- **Secondary Presence**: English (on bilingual or international property ads)

## Dataset Creation

### Curation Rationale

This dataset was created to support AI systems that can automatically read and analyze real estate advertisements in Korea. It enables research in OCR, document understanding, and property-focused computer vision models.

### Source Data

- **Contributors**: Crowdsourced online listings, field-collected flyers, and photographed billboard ads
- **Collection Process**: Images were captured from public sources. Personal contact details were anonymized or removed, and no private individuals were included in the images.

### Other Known Limitations

- **Bias**: Urban properties (Seoul, Busan, Incheon) are more represented than rural or suburban listings
- **Visual Variability**: Differences in ad quality, lighting, and photo angles may affect text extraction
- **Content Scope**: Focused on property description and pricing; neighborhood or interior details may be limited

## Intended Uses

### ✅ Direct Use

- Training OCR and text-extraction models for real estate ads
- Property classification and market analytics research
- Visual-language research in advertisement layouts
- AI applications in property recommendation and real estate search

### ❌ Out-of-Scope Use

- Tracking or contacting individual property owners
- Commercial reuse of ad content without consent or licensing
- Using dataset for surveillance or privacy-invading purposes

## License

CC BY 4.0
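
## Example: Field Extraction Sketch

The card lists extraction of property information (price, size, location, contact) as a supported task but does not describe a pipeline or annotation schema. The snippet below is a minimal sketch of what a post-OCR step might look like, assuming the ad text has already been recognized by some OCR model. The sample string, the regular expressions, the keyword table, and the function names `parse_price_won` and `extract_fields` are all hypothetical illustrations and are not part of the dataset.

```python
import re

# Hypothetical OCR output for a single ad; NOT taken from the dataset.
SAMPLE_OCR_TEXT = "아파트 매매 5억 2,000만원 전용 84㎡ 서울 마포구 문의 02-123-4567"

# Assumed patterns for common Korean ad conventions (illustrative only).
# Price: "5억 2,000만원" (eok = 10^8 won, man = 10^4 won) or "8,500만원".
PRICE_RE = re.compile(r"(\d+)억(?:\s*([\d,]+)만)?|([\d,]+)만\s*원")
# Area in square metres, e.g. "84㎡" or "59.9 m2".
AREA_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:㎡|m2|m²)")
# Phone numbers such as "02-123-4567" or "010-1234-5678".
PHONE_RE = re.compile(r"0\d{1,2}-\d{3,4}-\d{4}")
# Assumed keyword-to-class mapping for the three property types in the card.
TYPE_KEYWORDS = {"아파트": "apartment", "주택": "house", "상가": "commercial"}


def parse_price_won(text):
    """Return the first price found in `text` as an integer in won, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    if match.group(1) is not None:
        eok = int(match.group(1))
        man = int(match.group(2).replace(",", "")) if match.group(2) else 0
    else:
        eok = 0
        man = int(match.group(3).replace(",", ""))
    return eok * 100_000_000 + man * 10_000


def extract_fields(text):
    """Extract price, area, property type, and contact from OCR text."""
    area = AREA_RE.search(text)
    phone = PHONE_RE.search(text)
    property_type = next(
        (label for keyword, label in TYPE_KEYWORDS.items() if keyword in text),
        None,
    )
    return {
        "price_won": parse_price_won(text),
        "area_m2": float(area.group(1)) if area else None,
        "property_type": property_type,
        # Contact details are anonymized or removed in the dataset,
        # so this field will often be None on real samples.
        "contact": phone.group(0) if phone else None,
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE_OCR_TEXT))
```

Rule-based parsing like this is only a baseline; given the visual variability noted in the limitations, OCR errors would likely require fuzzier matching or a learned extraction model.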

Provider: maas
Created: 2025-10-14