real_estate_ads

ModelScope community. Updated 2026-01-02; indexed 2025-04-19.
Download link: https://modelscope.cn/datasets/AI-ModelScope/real_estate_ads
# 🏠 Divar Real Estate Ads Dataset

[![Dataset Size](https://img.shields.io/badge/Size-750%20MB-blue)](https://huggingface.co/datasets/divar/real-estate-ads) [![Rows](https://img.shields.io/badge/Rows-1M-green)](https://huggingface.co/datasets/divar/real-estate-ads)

## 📋 Overview

The `real_estate_ads` dataset contains one million anonymized real estate advertisements collected from the [Divar](https://divar.ir) platform, one of the largest classified ads platforms in the Middle East. This comprehensive dataset provides researchers, data scientists, and entrepreneurs with authentic real estate market data to build innovative solutions such as price evaluation models, market analysis tools, and forecasting systems.

## 🔍 Dataset Details

| Property        | Value                                      |
| --------------- | ------------------------------------------ |
| **Size**        | 1,000,000 rows, approximately 750 MB       |
| **Time Period** | Six-month period (2024)                    |
| **Source**      | Anonymized real estate listings from Divar |
| **Format**      | Tabular data (CSV/Parquet) with 57 columns |
| **Languages**   | Mixed (primarily Persian)                  |
| **Domains**     | Real Estate, Property Market               |

## 🚀 Quick Start

```python
# Load the dataset using the Hugging Face datasets library
from datasets import load_dataset

# Load the full dataset
dataset = load_dataset("divarofficial/real-estate-ads")

# Print the first few examples
print(dataset['train'][:5])

# Get dataset statistics
print(f"Dataset size: {len(dataset['train'])} rows")
print(f"Features: {dataset['train'].features}")
```

## 📊 Schema

The dataset includes comprehensive property information organized in the following categories:

### 🏷️ Categorization

- `cat2_slug`, `cat3_slug`: Property categorization slugs
- `property_type`: Type of property (apartment, villa, land, etc.)

### 📍 Location

- `city_slug`, `neighborhood_slug`: Location identifiers
- `location_latitude`, `location_longitude`: Geographic coordinates
- `location_radius`: Location accuracy radius

### 📝 Listing Details

- `created_at_month`: Timestamp of when the ad was created
- `user_type`: Type of user who posted the listing (individual, agency, etc.)
- `description`, `title`: Textual information about the property

### 💰 Financial Information

- **Rent-related**: `rent_mode`, `rent_value`, `rent_to_single`, `rent_type`
- **Price-related**: `price_mode`, `price_value`
- **Credit-related**: `credit_mode`, `credit_value`
- **Transformed values**: Various transformed financial metrics for analysis

### 🏢 Property Specifications

- `land_size`, `building_size`: Property dimensions (in square meters)
- `deed_type`, `has_business_deed`: Legal property information
- `floor`, `rooms_count`, `total_floors_count`, `unit_per_floor`: Building structure details
- `construction_year`, `is_rebuilt`: Age and renovation status

### 🛋️ Amenities and Features

- **Utilities**: `has_water`, `has_electricity`, `has_gas`
- **Climate control**: `has_heating_system`, `has_cooling_system`
- **Facilities**: `has_balcony`, `has_elevator`, `has_warehouse`, `has_parking`
- **Luxury features**: `has_pool`, `has_jacuzzi`, `has_sauna`
- **Other features**: `has_security_guard`, `has_barbecue`, `building_direction`, `floor_material`

### 🏨 Short-term Rental Information

- `regular_person_capacity`, `extra_person_capacity`
- `cost_per_extra_person`
- **Pricing variations**: `rent_price_on_regular_days`, `rent_price_on_special_days`, `rent_price_at_weekends`

## 📈 Example Analysis

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Convert to pandas DataFrame for analysis
# (`dataset` is the object loaded in the Quick Start section)
df = dataset['train'].to_pandas()

# Price distribution by property type
plt.figure(figsize=(12, 6))
sns.boxplot(x='property_type', y='price_value', data=df)
plt.title('Price Distribution by Property Type')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Relationship between building size and price
plt.figure(figsize=(10, 6))
sns.scatterplot(x='building_size', y='price_value', data=df)
plt.title('Correlation between Building Size and Price')
plt.xlabel('Building Size (sq.m)')
plt.ylabel('Price')
plt.tight_layout()
plt.show()
```

## 💡 Use Cases

This dataset is particularly valuable for:

1. **Price Prediction Models**: Train algorithms to estimate property values based on features

   ```python
   # Example: Simple price prediction model
   # (`df` is the DataFrame built in the Example Analysis section;
   # assumes the feature columns are numeric or boolean)
   from sklearn.ensemble import RandomForestRegressor
   from sklearn.model_selection import train_test_split

   features = ['building_size', 'rooms_count', 'construction_year', 'has_parking']

   # Drop listings without a price rather than filling the target with 0,
   # which would train the model on artificial zero prices
   priced = df.dropna(subset=['price_value'])
   X = priced[features].fillna(0)
   y = priced['price_value']

   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
   model = RandomForestRegressor(n_estimators=100)
   model.fit(X_train, y_train)
   ```

2. **Market Analysis**: Understand trends and patterns in the real estate market
3. **Recommendation Systems**: Build tools to suggest properties based on user preferences
4. **Natural Language Processing**: Analyze property descriptions and titles
5. **Geospatial Analysis**: Study location-based pricing and property distribution

## 🔧 Data Processing Information

The data has been:

- Anonymized to protect privacy
- Randomly sampled from the complete Divar platform dataset
- Cleaned, with selected columns removed for privacy and usability
- Standardized to ensure consistency across entries

## 📚 Citation and Usage

When using this dataset in your research or applications, please consider acknowledging the source:

```bibtex
@dataset{divar2025realestate,
  author = {Divar Corporation},
  title = {Real Estate Ads Dataset from Divar Platform},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/divar/real-estate-ads}
}
```

## 🤝 Contributing

We welcome contributions to improve this dataset! If you find issues or have suggestions, please open an issue on the [GitHub repository](https://github.com/divar-ir/kenar-docs) or contact us at [kenar.support@divar.ir](mailto:kenar.support@divar.ir).
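
As a rough illustration of the market analysis and geospatial use cases listed above, the sketch below computes a median price per square meter for each city. It is not part of the original dataset card: the helper name `median_price_per_sqm`, the city slug values, and the prices in the sample frame are made up for illustration, and it assumes `price_value` and `building_size` are numeric, which has not been checked against the real data.

```python
import pandas as pd


def median_price_per_sqm(df: pd.DataFrame, group_col: str = "city_slug") -> pd.Series:
    """Median price per square meter for each group (hypothetical helper)."""
    # Keep only rows where both price and size are present and positive.
    # Comparisons against NaN evaluate to False, so missing values drop out too.
    valid = df[(df["price_value"] > 0) & (df["building_size"] > 0)].copy()
    valid["price_per_sqm"] = valid["price_value"] / valid["building_size"]
    # Median is less sensitive than the mean to extreme listings.
    return (
        valid.groupby(group_col)["price_per_sqm"]
        .median()
        .sort_values(ascending=False)
    )


# Tiny synthetic frame standing in for dataset['train'].to_pandas();
# the values below are invented, not taken from the dataset.
sample = pd.DataFrame(
    {
        "city_slug": ["tehran", "tehran", "tehran", "karaj", "karaj", "shiraz"],
        "price_value": [9_000_000_000, 6_000_000_000, None,
                        2_000_000_000, 3_000_000_000, 0],
        "building_size": [100, 80, 70, 50, 100, 90],
    }
)

result = median_price_per_sqm(sample)
print(result)
```

On the real data the same function could be called with `group_col="neighborhood_slug"` for a finer-grained view, provided each group has enough priced listings for the median to be meaningful.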

Provider: maas
Created: 2025-04-14