Adaptive Reinforcement Learning-Driven Dynamic Pricing and Consumer Behavior Game Model in E-Commerce

Source: China Scientific Data (中国科学数据). Updated 2026-04-23; indexed 2026-04-25.

Download link:

https://www.sciengine.com/AA/doi/10.3724/j.issn.1004-3918.2026.02.021

Resource description:

To address the challenges in e-commerce dynamic pricing arising from highly non-stationary demand and competitive environments, the difficulty of explicitly modeling heterogeneous consumer preferences, and the coexistence of profitability, stability, and compliance constraints, this paper proposes an adaptive reinforcement learning game-theoretic framework, ARL-GT. The problem setting emphasizes not only revenue maximization but also the practical need to maintain stable pricing behavior and rule-compliant operation under continuously changing market conditions. The framework employs a Stackelberg bi-level structure to characterize the optimal-response relationship between platform pricing and consumer response, so that the sequential interaction of platform decisions and consumer behavioral feedback can be represented in a structured, explicit, and interpretable way. Within this structure, the pricing side and the response side are linked through an optimal-reaction formulation, which describes the "platform pricing, then consumer response" decision process in a unified game-theoretic form. The method further adopts maximum-entropy inverse reinforcement learning to learn a consumer utility function from historical interaction logs, and then uses the learned utility to generate interpretable expected demand. This demand representation is introduced to stabilize reward signals during training and to alleviate the sparse-feedback problem that commonly affects policy optimization in dynamic pricing tasks. At the policy layer, MAML is combined with PPO to learn a transferable initialization that can be adapted with only a small number of gradient updates. As a result, the pricing policy can recover and adjust rapidly across different time periods, product categories, and competitive states, which directly supports cross-scenario adaptation in non-stationary environments.
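The step from a learned utility to expected demand can be illustrated with a small sketch. It is not the paper's model: it assumes a hypothetical linear utility (`quality - price_sensitivity * price`), a one-step softmax choice rule with an outside "no purchase" option of utility 0, and a fixed `market_size`. All function and parameter names are illustrative.

```python
import math


def choice_probabilities(utilities):
    """Softmax purchase probabilities with an outside option of utility 0.

    Assumption: consumers pick one product or nothing, with probability
    proportional to exp(utility). The probabilities returned here cover
    the products only, so they sum to less than 1.
    """
    shift = max(max(utilities), 0.0)  # subtract the max for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    outside = math.exp(0.0 - shift)
    total = sum(weights) + outside
    return [w / total for w in weights]


def expected_demand(prices, quality, price_sensitivity, market_size):
    """Expected units sold per product under the hypothetical linear utility."""
    utilities = [q - price_sensitivity * p for q, p in zip(quality, prices)]
    return [market_size * prob for prob in choice_probabilities(utilities)]


if __name__ == "__main__":
    demand = expected_demand(
        prices=[10.0, 12.0], quality=[3.0, 3.0],
        price_sensitivity=0.2, market_size=1000,
    )
    print([round(d, 1) for d in demand])
```

Because demand is a smooth function of price in this form, it gives a dense signal even when logged purchases are sparse, which is the role the abstract assigns to the learned demand representation.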
At the same time, the objective function incorporates a CVaR tail-risk measure, price-adjustment friction, and stockout penalties to suppress excessive repricing and aggressive exploration, thereby reducing unstable policy behavior while retaining profitability-oriented optimization. In addition, business rules including price range, adjustment frequency, and adjustment magnitude are incorporated as auditable constraints for unified evaluation, so that performance, stability, and compliance can be assessed within the same framework. The simulation tests include promotional shocks, supply disturbances, and competitor strategy switching to mimic real markets, and these settings are used to evaluate the robustness of the proposed method under representative market disturbances. Results based on RetailRocket logs and high-fidelity multi-agent simulation evaluation show that, compared with strong baseline methods, ARL-GT improves cumulative revenue by approximately 9.3%, achieves a market share of 41.3%, and reduces the standard deviation of price fluctuations to 0.48. In scenarios involving the entry of learning-based competitors, the median recovery time is about 21 episodes, indicating faster recovery of pricing performance after competitive regime changes. A risk-sensitive configuration further improves CVaR at a 95% confidence level to 1.51 and reduces revenue variance, showing the effectiveness of the introduced risk-control design under the same framework. Ablation results show that inverse-reinforcement-learning-based demand calibration and meta-learning initialization make the most critical contributions to revenue improvement and recovery acceleration. Overall, the proposed method jointly considers profitability, stability, and auditability, and provides an interpretable game-theoretic decision-making scheme for the online deployment of dynamic pricing.
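The risk and compliance terms described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's objective: CVaR is taken as the mean of the worst `1 - alpha` fraction of episode returns (so larger is better), friction is the absolute price change, and the weights, rule names, and thresholds are all hypothetical.

```python
import math


def cvar_lower_tail(returns, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of returns (lower tail)."""
    if not returns:
        raise ValueError("returns must be non-empty")
    # round() guards against float noise such as 0.05 * 20 == 1.0000000000000009
    k = max(1, math.ceil(round((1.0 - alpha) * len(returns), 9)))
    worst = sorted(returns)[:k]
    return sum(worst) / k


def step_reward(profit, price, prev_price, unmet_demand,
                friction_weight, stockout_weight):
    """Profit minus price-adjustment friction and a stockout penalty."""
    friction = abs(price - prev_price)
    return profit - friction_weight * friction - stockout_weight * unmet_demand


def risk_sensitive_score(episode_returns, risk_weight, alpha=0.95):
    """Mean return plus a weighted lower-tail CVaR term."""
    mean = sum(episode_returns) / len(episode_returns)
    return mean + risk_weight * cvar_lower_tail(episode_returns, alpha)


def check_rules(price, prev_price, changes_so_far,
                low, high, max_step, max_changes):
    """Return the names of violated business rules (empty list if compliant)."""
    violations = []
    if not low <= price <= high:
        violations.append("price_range")
    if abs(price - prev_price) > max_step:
        violations.append("adjustment_magnitude")
    if price != prev_price and changes_so_far >= max_changes:
        violations.append("adjustment_frequency")
    return violations


if __name__ == "__main__":
    returns = [float(r) for r in range(1, 21)]
    print(cvar_lower_tail(returns, alpha=0.95))
    print(check_rules(15.0, 10.0, 3, low=5.0, high=14.0, max_step=2.0, max_changes=3))
```

Returning the violated rule names, rather than a single boolean, keeps each decision auditable: a log of these lists shows which constraint a candidate price would have broken.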

Created:

2026-04-23
