real_estate_ads
Source: ModelScope community (updated 2026-01-02, indexed 2025-04-19)
Download link: https://modelscope.cn/datasets/AI-ModelScope/real_estate_ads
# 🏠 Divar Real Estate Ads Dataset
## 📋 Overview
The `real_estate_ads` dataset contains one million anonymized real estate advertisements collected from the [Divar](https://divar.ir) platform, one of the largest classified ads platforms in the Middle East. This comprehensive dataset provides researchers, data scientists, and entrepreneurs with authentic real estate market data to build innovative solutions such as price evaluation models, market analysis tools, and forecasting systems.
## 🔍 Dataset Details
| Property | Value |
| --------------- | ------------------------------------------ |
| **Size** | 1,000,000 rows, approximately 750 MB |
| **Time Period** | Six-month period (2024) |
| **Source** | Anonymized real estate listings from Divar |
| **Format** | Tabular data (CSV/Parquet) with 57 columns |
| **Languages** | Mixed (primarily Persian) |
| **Domains** | Real Estate, Property Market |
## 🚀 Quick Start
```python
# Load the dataset using the Hugging Face datasets library
from datasets import load_dataset
# Load the full dataset
dataset = load_dataset("divarofficial/real-estate-ads")
# Print the first few examples
print(dataset['train'][:5])
# Get dataset statistics
print(f"Dataset size: {len(dataset['train'])} rows")
print(f"Features: {dataset['train'].features}")
```
## 📊 Schema
The dataset includes comprehensive property information organized in the following categories:
### 🏷️ Categorization
- `cat2_slug`, `cat3_slug`: Property categorization slugs
- `property_type`: Type of property (apartment, villa, land, etc.)
### 📍 Location
- `city_slug`, `neighborhood_slug`: Location identifiers
- `location_latitude`, `location_longitude`: Geographic coordinates
- `location_radius`: Location accuracy radius
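A common first step with `location_latitude` and `location_longitude` is measuring the distance between two listings. The sketch below uses the haversine formula and assumes the coordinates are decimal degrees, which this card does not state explicitly. The helper name `haversine_km` and the sample coordinates (approximate city centres of Tehran and Isfahan) are illustrative and not taken from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates only (assumed, not read from the dataset)
tehran = (35.6892, 51.3890)
isfahan = (32.6546, 51.6680)
distance = haversine_km(*tehran, *isfahan)
print(f"{distance:.0f} km")
```

Because `location_radius` describes how precise a listing's position is, distances shorter than that radius are probably not meaningful.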
### 📝 Listing Details
- `created_at_month`: When the ad was created (the column name suggests month-level granularity rather than a full timestamp)
- `user_type`: Type of user who posted the listing (individual, agency, etc.)
- `description`, `title`: Textual information about the property
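Since the text is primarily Persian, numbers inside `title` and `description` may be written with Persian or Arabic-Indic digits rather than ASCII ones. This is an assumption about the data, not something the card confirms. The following sketch maps both digit sets to ASCII so that sizes or prices mentioned in free text can be parsed; `normalize_digits` is a hypothetical helper name.

```python
# Persian digits occupy U+06F0..U+06F9, Arabic-Indic digits U+0660..U+0669
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

# Translation table from both digit sets to ASCII 0-9
_DIGIT_TABLE = str.maketrans(PERSIAN_DIGITS + ARABIC_DIGITS, "0123456789" * 2)


def normalize_digits(text):
    """Return text with Persian and Arabic-Indic digits replaced by ASCII digits."""
    return text.translate(_DIGIT_TABLE)


# Build a sample string "120" written in Persian digits
sample = PERSIAN_DIGITS[1] + PERSIAN_DIGITS[2] + PERSIAN_DIGITS[0]
print(normalize_digits(sample))  # prints 120
```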
### 💰 Financial Information
- **Rent-related**: `rent_mode`, `rent_value`, `rent_to_single`, `rent_type`
- **Price-related**: `price_mode`, `price_value`
- **Credit-related**: `credit_mode`, `credit_value`
- **Transformed values**: Various transformed financial metrics for analysis
### 🏢 Property Specifications
- `land_size`, `building_size`: Property dimensions (in square meters)
- `deed_type`, `has_business_deed`: Legal property information
- `floor`, `rooms_count`, `total_floors_count`, `unit_per_floor`: Building structure details
- `construction_year`, `is_rebuilt`: Age and renovation status
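Raw prices are hard to compare across properties of different sizes, so a price per square meter is a natural derived feature from `price_value` and `building_size`. The sketch below is one way to compute it, assuming both columns can be coerced to numbers and that a zero or missing size should yield a missing result. The function name, the `price_per_sqm` column, and the toy rows are made up for illustration.

```python
import pandas as pd


def add_price_per_sqm(df):
    """Return a copy of df with a price_per_sqm column (NaN where it cannot be computed)."""
    out = df.copy()
    price = pd.to_numeric(out["price_value"], errors="coerce")
    size = pd.to_numeric(out["building_size"], errors="coerce")
    size = size.where(size > 0)  # treat zero or negative sizes as missing
    out["price_per_sqm"] = price / size
    return out


# Toy rows with invented values, only to exercise the function
toy = pd.DataFrame(
    {
        "price_value": [5_000_000_000, 2_400_000_000, None, 900_000_000],
        "building_size": [100, 80, 60, 0],
    }
)
result = add_price_per_sqm(toy)
print(result)
```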
### 🛋️ Amenities and Features
- **Utilities**: `has_water`, `has_electricity`, `has_gas`
- **Climate control**: `has_heating_system`, `has_cooling_system`
- **Facilities**: `has_balcony`, `has_elevator`, `has_warehouse`, `has_parking`
- **Luxury features**: `has_pool`, `has_jacuzzi`, `has_sauna`
- **Other features**: `has_security_guard`, `has_barbecue`, `building_direction`, `floor_material`
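The `has_*` flags can be collapsed into a single amenity count per listing, which is sometimes a convenient feature. This sketch works on one row as a plain dict (the shape returned by indexing a Hugging Face dataset) and assumes the flags are Python booleans with `None` for unknown values; the card does not document the actual encoding. `amenity_count` and the toy row are hypothetical.

```python
AMENITY_COLUMNS = [
    "has_balcony",
    "has_elevator",
    "has_warehouse",
    "has_parking",
    "has_pool",
    "has_jacuzzi",
    "has_sauna",
    "has_security_guard",
    "has_barbecue",
]


def amenity_count(row):
    """Count the amenity flags that are explicitly True; missing or None counts as absent."""
    return sum(1 for column in AMENITY_COLUMNS if row.get(column) is True)


# Toy row with invented values
toy_row = {"has_balcony": True, "has_elevator": True, "has_parking": None, "has_pool": False}
print(amenity_count(toy_row))  # prints 2
```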
### 🏨 Short-term Rental Information
- `regular_person_capacity`, `extra_person_capacity`
- `cost_per_extra_person`
- **Pricing variations**: `rent_price_on_regular_days`, `rent_price_on_special_days`, `rent_price_at_weekends`
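These fields can be combined to estimate the cost of a stay. The sketch below rests on several assumptions the card does not confirm: that the three `rent_price_*` columns are per-night prices covering up to `regular_person_capacity` guests, that `cost_per_extra_person` is charged per extra guest per night, and that `extra_person_capacity` is the maximum number of extra guests. The function `stay_cost` and the toy listing are illustrative only.

```python
def stay_cost(listing, guests, regular_nights=0, weekend_nights=0, special_nights=0):
    """Estimate the total cost of a stay under the assumptions stated above."""
    regular_capacity = listing["regular_person_capacity"]
    max_guests = regular_capacity + listing["extra_person_capacity"]
    if guests > max_guests:
        raise ValueError(f"listing accepts at most {max_guests} guests")

    nights = regular_nights + weekend_nights + special_nights
    base = (
        regular_nights * listing["rent_price_on_regular_days"]
        + weekend_nights * listing["rent_price_at_weekends"]
        + special_nights * listing["rent_price_on_special_days"]
    )
    extra_guests = max(0, guests - regular_capacity)
    return base + extra_guests * listing["cost_per_extra_person"] * nights


# Toy listing with invented values
toy_listing = {
    "regular_person_capacity": 4,
    "extra_person_capacity": 2,
    "cost_per_extra_person": 200_000,
    "rent_price_on_regular_days": 1_000_000,
    "rent_price_at_weekends": 1_500_000,
    "rent_price_on_special_days": 2_000_000,
}
total = stay_cost(toy_listing, guests=5, regular_nights=2, weekend_nights=1)
print(total)  # prints 4100000
```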
## 📈 Example Analysis
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Convert to pandas DataFrame for analysis
df = dataset['train'].to_pandas()
# Price distribution by property type
plt.figure(figsize=(12, 6))
sns.boxplot(x='property_type', y='price_value', data=df)
plt.title('Price Distribution by Property Type')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# Correlation between building size and price
plt.figure(figsize=(10, 6))
sns.scatterplot(x='building_size', y='price_value', data=df)
plt.title('Correlation between Building Size and Price')
plt.xlabel('Building Size (sq.m)')
plt.ylabel('Price')
plt.tight_layout()
plt.show()
```
## 💡 Use Cases
This dataset is particularly valuable for:
1. **Price Prediction Models**: Train algorithms to estimate property values based on features
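```python
# Example: simple price prediction model
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

features = ['building_size', 'rooms_count', 'construction_year', 'has_parking']

# Train only on listings that have a price: filling a missing target with 0
# would teach the model that unpriced ads are free
priced = df.dropna(subset=['price_value'])
X = priced[features].fillna(0)  # assumes these columns are already numeric
y = priced['price_value']

# Fix the random seed so the split and the model are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")
```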
2. **Market Analysis**: Understand trends and patterns in the real estate market
3. **Recommendation Systems**: Build tools to suggest properties based on user preferences
4. **Natural Language Processing**: Analyze property descriptions and titles
5. **Geospatial Analysis**: Study location-based pricing and property distribution
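As a small illustration of the market analysis use case, the sketch below computes the median price and number of listings per city and month with a pandas `groupby`. It treats `created_at_month` as an opaque grouping key, since its exact format is not documented here. The toy rows, including the city slugs and month strings, are invented for the example.

```python
import pandas as pd

# Toy rows with invented values; real data would come from dataset['train'].to_pandas()
toy = pd.DataFrame(
    {
        "city_slug": ["tehran", "tehran", "tehran", "mashhad"],
        "created_at_month": ["2024-05", "2024-05", "2024-06", "2024-05"],
        "price_value": [100, 300, 500, 200],
    }
)

# Median is more robust than the mean for heavy-tailed price data
summary = (
    toy.groupby(["city_slug", "created_at_month"])["price_value"]
    .agg(median_price="median", listings="count")
    .reset_index()
)
print(summary)
```

The same pattern extends to `neighborhood_slug` or `property_type` by changing the grouping keys.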
## 🔧 Data Processing Information
The data has been:
- Anonymized to protect privacy
- Randomly sampled from the complete Divar platform dataset
- Cleaned with select columns removed to ensure privacy and usability
- Standardized to ensure consistency across entries
## 📚 Citation and Usage
When using this dataset in your research or applications, please consider acknowledging the source:
```bibtex
@dataset{divar2025realestate,
author = {Divar Corporation},
title = {Real Estate Ads Dataset from Divar Platform},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/divar/real-estate-ads}
}
```
## 🤝 Contributing
We welcome contributions to improve this dataset! If you find issues or have suggestions, please open an issue on the [GitHub repository](https://github.com/divar-ir/kenar-docs) or contact us at [kenar.support@divar.ir](mailto:kenar.support@divar.ir).
Provider: maas
Created: 2025-04-14



