
Real Estate Price Prediction Data

Source: DataCite Commons. Updated 2024-08-08; indexed 2024-08-19.
Download link:
https://figshare.com/articles/dataset/Real_Estate_Price_Prediction_Data/26517325
Description:
Overview: This dataset was collected and curated to support research on predicting real estate prices with machine learning algorithms, specifically Support Vector Regression (SVR) and Gradient Boosting Machine (GBM). It contains detailed information on residential properties, for developing and evaluating predictive models that give accurate and transparent real estate appraisals.

Data Source: The data was sourced from Department of Lands and Survey real estate listings.

Features: The dataset contains the following key attributes for each property:
- Area (in square meters): The total living area of the property.
- Floor Number: The floor on which the property is located.
- Location: Geographic coordinates or the city/region where the property is situated.
- Type of Apartment: The classification of the property, such as studio, one-bedroom, or two-bedroom.
- Number of Bathrooms: The total number of bathrooms in the property.
- Number of Bedrooms: The total number of bedrooms in the property.
- Property Age (in years): The number of years since the property was constructed.
- Property Condition: A categorical variable indicating the condition of the property (e.g., new, good, fair, needs renovation).
- Proximity to Amenities: The distance to nearby amenities such as schools, hospitals, shopping centers, and public transportation.
- Market Price (target variable): The actual sale price or listed price of the property.

Data Preprocessing:
- Normalization: Numeric features such as area and proximity to amenities were normalized to ensure consistency and improve model performance.
- Categorical Encoding: Categorical features such as property condition and type of apartment were encoded with one-hot encoding or label encoding, depending on the requirements of the specific model.
- Missing Values: Missing data points were handled with imputation, or by excluding records with significant missing information.

Usage: This dataset was used to train and test machine learning models that predict the market price of residential properties from the attributes listed above. According to the dataset description, the models developed with it demonstrated improved accuracy and transparency over traditional appraisal methods.

Dataset Availability: The dataset is available for public use under the CC BY 4.0 license. Users are encouraged to cite the related publication when using the data in their research or applications.

Citation: If you use this dataset in your research, please cite the following publication:
Real Estate Decision-Making: Precision in Price Prediction through Advanced Machine Learning Algorithms.
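
The three preprocessing steps named in the description (missing-value handling, normalization, categorical encoding) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline: the description does not say which normalization or imputation method was used, so min-max scaling and mean imputation are assumptions here, and the field names (`area_sqm`, `condition`, `amenity_distance_km`) and sample values are hypothetical rather than taken from the dataset files.

```python
from statistics import mean

# Hypothetical records; field names and values are illustrative only,
# not taken from the actual dataset files.
records = [
    {"area_sqm": 50.0, "condition": "new", "amenity_distance_km": 0.5},
    {"area_sqm": 100.0, "condition": "good", "amenity_distance_km": None},
    {"area_sqm": 150.0, "condition": "fair", "amenity_distance_km": 1.5},
]


def impute_mean(rows, key):
    """Replace missing (None) values of `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = mean(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def min_max_normalize(rows, key):
    """Rescale `key` to the range [0, 1]; a constant column maps to 0.0."""
    values = [r[key] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**r, key: (r[key] - lo) / span if span else 0.0} for r in rows]


def one_hot_encode(rows, key):
    """Replace categorical `key` with one 0/1 indicator column per category."""
    categories = sorted({r[key] for r in rows})
    encoded = []
    for r in rows:
        row = {k: v for k, v in r.items() if k != key}
        for c in categories:
            row[f"{key}_{c}"] = 1 if r[key] == c else 0
        encoded.append(row)
    return encoded


def preprocess(rows):
    """Impute first, then normalize numeric fields, then encode the categorical field."""
    rows = impute_mean(rows, "amenity_distance_km")
    for key in ("area_sqm", "amenity_distance_km"):
        rows = min_max_normalize(rows, key)
    return one_hot_encode(rows, "condition")


processed = preprocess(records)
```

When a train/test split is involved, the imputation mean and the min/max bounds would normally be computed on the training portion only and then reused on the test portion, so that no information from the test set leaks into the model inputs.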

Provider: figshare
Created: 2024-08-08