Integration of Internet search data to predict tourism trends using spatial-temporal XGBoost composite model
Source: DataCite Commons · Updated: 2021-04-29 · Indexed: 2024-07-28
Download link:
https://figshare.com/articles/dataset/Integration_of_Internet_search_data_to_predict_tourism_trends_using_spatial-temporal_XGBoost_composite_model/11876226/1
Resource description:
Tourism trend prediction is useful for tourism investment and tourism income estimation. Previous tourism prediction studies have mostly relied on linear models and historical visitor data; however, the relationships between tourism trends and their influencing factors may be nonlinear. This study used Internet search data as influencing factors and predicted tourism trends with a spatial-temporal framework based on the extreme gradient boosting (XGBoost) method. To incorporate spatial characteristics, Baidu index data were divided according to locational attributes, and the influencing factors were reconstructed via spatial cluster analysis and principal component analysis. The variables derived from this dimension reduction were then processed with a weighted moving average to reduce the lag between tourism-related Internet searches and actual tourism behavior. With this spatial-temporal processing, Baidu index data can more accurately reflect changes in tourist source composition and tourist volume. The spatial-temporal XGBoost composite model was then applied to the empirical prediction of Beijing's tourism trends. A comparison of prediction results from different models indicates that the spatial-temporal XGBoost composite model has excellent prediction ability. The findings also suggest that machine learning methods may perform poorly if essential characteristics of the data, such as spatial autocorrelation and spatial heterogeneity, are ignored.
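The description does not specify the window length or weights used in the lag-reduction step. As a minimal sketch, assuming a standard trailing weighted moving average with linearly decaying weights (a hypothetical choice, not confirmed by the source), a monthly series such as one principal component of the Baidu index could be smoothed as follows. The names `weighted_moving_average` and `linear_weights` are illustrative only.

```python
from typing import List, Optional, Sequence


def linear_weights(k: int) -> List[float]:
    """Hypothetical weights k, k-1, ..., 1: the most recent period counts most."""
    if k <= 0:
        raise ValueError("window length must be positive")
    return [float(k - i) for i in range(k)]


def weighted_moving_average(
    series: Sequence[float], weights: Sequence[float]
) -> List[Optional[float]]:
    """Trailing weighted moving average of a search-index series.

    weights[0] applies to period t, weights[1] to t-1, and so on.
    Periods without a full window of history are returned as None.
    """
    k = len(weights)
    total = sum(weights)
    if k == 0 or total == 0:
        raise ValueError("weights must be non-empty and sum to a non-zero value")

    smoothed: List[Optional[float]] = []
    for t in range(len(series)):
        if t < k - 1:
            smoothed.append(None)  # not enough past periods yet
            continue
        # Combine the current and previous k-1 observations.
        acc = sum(w * series[t - i] for i, w in enumerate(weights))
        smoothed.append(acc / total)
    return smoothed


if __name__ == "__main__":
    # Toy monthly index values (made up for illustration, not from the dataset).
    index_values = [10.0, 20.0, 30.0, 40.0]
    print(weighted_moving_average(index_values, linear_weights(3)))
```

Spreading weight over several past periods lets the feature for month t carry searches made in earlier months, which is one way to absorb the delay between searching and travelling. In practice the window length and weights would need to be tuned against the observed visitor series before the smoothed features are passed to a model such as XGBoost.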
Provider:
figshare
Created:
2021-04-29



