Generalizing human-natural systems modeling and decision-making: A multi-agent deep reinforcement learning framework and its application to the tragedy of the commons

China Scientific Data (中国科学数据); Updated: 2025-12-29; Indexed: 2026-04-25

Download link:

https://www.sciengine.com/AA/doi/10.1007/s11430-025-1738-8

Resource description:

Human-natural systems (HNS) are complex adaptive systems in which human activities and natural processes are deeply intertwined. Stochastic processes, nonlinear couplings, feedback loops, and emergent phenomena collectively shape the interactions between human behavior and natural dynamics. While environmental models in Earth system science (equation- or process-based) are relatively well established, the modeling of human systems remains underdeveloped. Moreover, there is no unified framework for effectively characterizing these complex interactions, which poses significant challenges for HNS modeling and decision-making. To fill these gaps, this study proposes an integrated multi-agent deep reinforcement learning (MADRL) framework that combines the Markov decision process (MDP), agent-based modeling (ABM), and deep reinforcement learning (DRL) to address modeling and decision-making challenges in HNS. The framework is structured as an MDP defined by four core components: states of the environment (natural system), actions of agents (human system), state transitions (evolution of the HNS), and rewards. We introduce ABM to simulate the behaviors, decision-making, and interactions of multi-hierarchical stakeholders, including individuals, groups, communities, governments, and non-governmental organizations. DRL is then employed to overcome the difficulty of solving the high-dimensional MDP. Finally, a classic case study based on the “Tragedy of the Commons” is designed, in which multiple fishermen follow specific decision rules while harvesting a shared fishpond. The results show that under purely economic incentives, fishermen tend to adopt high-intensity fishing strategies, causing the fish population to decline rapidly from an initial 1,600 units to near zero within 20 time steps and reproducing the classic “Tragedy of the Commons” phenomenon.
In contrast, introducing sustainability penalty mechanisms or cooperative mechanisms effectively guides fishermen to adjust their fishing strategies. These mechanisms promote more stable and moderate fishing behavior, keeping the fish population at approximately 500 and 1,500 units, respectively, throughout the simulation period. Furthermore, by incorporating behavioral parameters (greediness factors), the model captures heterogeneity in fishing propensity: high-greediness fishermen behave aggressively, while low-greediness fishermen adopt more conservative strategies, revealing how individual behavioral differences affect system dynamics. These findings validate the ability of the proposed MADRL framework to capture the dynamic feedback loops between heterogeneous agents and their environment, as well as emergent nonlinear phenomena. By providing an integrated framework for analyzing and understanding the core mechanisms that link the multiple processes, agents, and activities of HNS, this study lays the foundation for future large-scale numerical experiments addressing governance and decision-making challenges across multiple scales.
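
The four MDP components named in the abstract (states, actions, transitions, rewards) and the three incentive settings (purely economic, sustainability penalty, cooperation) can be made concrete with a toy shared-fishpond environment. This is a minimal sketch under stated assumptions, not the authors' implementation: logistic regrowth, a catch proportional to effort and stock, a penalty on the shortfall below a threshold, cooperation modeled as pooling the catch, and every numeric parameter other than the initial stock of 1,600 units are hypothetical. The learned DRL policies are replaced here by fixed effort levels.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class FishpondEnv:
    """Toy shared-fishpond MDP (hypothetical sketch, not the authors' code).

    State: fish stock. Action: one effort in [0, 1] per fisherman.
    Transition: catch, then logistic regrowth. Reward: depends on reward_mode.
    Every parameter except initial_stock is an assumed illustrative value.
    """

    n_agents: int = 4
    initial_stock: float = 1600.0
    capacity: float = 2000.0       # assumed carrying capacity K
    growth_rate: float = 0.3       # assumed logistic growth rate r
    reward_mode: str = "economic"  # "economic", "penalty" or "cooperative"
    threshold: float = 800.0       # assumed stock level that triggers the penalty
    penalty_weight: float = 1.0    # assumed penalty per unit of shortfall
    stock: float = field(init=False, default=0.0)

    def reset(self) -> float:
        """Return the initial state s_0."""
        self.stock = self.initial_stock
        return self.stock

    def step(self, efforts):
        """Apply the joint action and return (next_stock, per-agent rewards)."""
        efforts = np.clip(np.asarray(efforts, dtype=float), 0.0, 1.0)
        # Each fisherman can take at most 1/n of the stock, so total catch <= stock.
        catches = efforts * self.stock / self.n_agents
        remaining = self.stock - catches.sum()
        # Transition: logistic regrowth of the fish left in the pond.
        growth = self.growth_rate * remaining * (1.0 - remaining / self.capacity)
        self.stock = max(remaining + growth, 0.0)
        return self.stock, self._rewards(catches)

    def _rewards(self, catches):
        if self.reward_mode == "economic":
            # Purely economic incentive: each agent earns its own catch.
            return catches
        if self.reward_mode == "penalty":
            # Sustainability penalty applied to every agent when stock < threshold.
            shortfall = max(self.threshold - self.stock, 0.0)
            return catches - self.penalty_weight * shortfall
        if self.reward_mode == "cooperative":
            # Cooperation modeled (by assumption) as pooling and sharing the catch.
            return np.full_like(catches, catches.mean())
        raise ValueError(f"unknown reward_mode: {self.reward_mode!r}")


def simulate(env, efforts, steps=20):
    """Roll out a fixed joint effort for `steps` steps; return the stock trajectory."""
    history = [env.reset()]
    for _ in range(steps):
        stock, _ = env.step(efforts)
        history.append(stock)
    return history
```

For example, `simulate(FishpondEnv(), [1.0] * 4)` empties the pond in the first step, whereas `simulate(FishpondEnv(), [0.1] * 4)` settles near 1,400 units; both outcomes follow from the assumed parameters and are not the paper's results. A DRL or rule-based agent would replace the fixed `efforts` argument with a policy that maps the observed stock to an effort.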

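
The greediness factors mentioned in the abstract can be illustrated at the level of a single agent's decision rule, the part of the framework that ABM handles. The sketch below is hypothetical: in the paper the fishermen's policies are learned with DRL, whereas here each fisherman myopically picks the effort that maximizes `greed * e - (1 - greed) * e**2`, an assumed trade-off between its own catch and a conservation cost. `choose_effort`, `run_heterogeneous`, the 11-level effort grid, and the growth parameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical discrete action set: effort levels 0.0, 0.1, ..., 1.0.
EFFORT_LEVELS = np.linspace(0.0, 1.0, 11)


def choose_effort(greed: float) -> float:
    """Hypothetical myopic ABM rule for a fisherman with greediness `greed` in [0, 1].

    Picks the effort in EFFORT_LEVELS maximizing greed * e - (1 - greed) * e**2:
    the linear term values the fisherman's own catch, the quadratic term is an
    assumed conservation cost that weighs less as greediness grows.
    """
    utility = greed * EFFORT_LEVELS - (1.0 - greed) * EFFORT_LEVELS**2
    return float(EFFORT_LEVELS[np.argmax(utility)])


def run_heterogeneous(greeds, stock=1600.0, capacity=2000.0, growth_rate=0.3, steps=20):
    """Simulate fishermen with different greediness factors sharing one pond.

    capacity and growth_rate are assumed values.
    Returns (efforts, cumulative catch per fisherman, final stock).
    """
    greeds = np.asarray(greeds, dtype=float)
    efforts = np.array([choose_effort(g) for g in greeds])
    cumulative = np.zeros(len(greeds))
    for _ in range(steps):
        catches = efforts * stock / len(greeds)  # total catch never exceeds the stock
        cumulative += catches
        remaining = stock - catches.sum()
        # Logistic regrowth of the remaining fish, floored at zero.
        stock = max(remaining + growth_rate * remaining * (1.0 - remaining / capacity), 0.0)
    return efforts, cumulative, stock
```

With greediness factors 0.2 and 0.8, the rule selects efforts of 0.1 and 1.0, so the greedier fisherman's cumulative catch is ten times larger while the shared stock declines; this mirrors the qualitative pattern described in the abstract, not its numbers.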

Created: 2025-10-29