Off-policy Evaluation in Doubly Inhomogeneous Environments
Source: DataCite Commons. Updated: 2024-10-11. Indexed: 2025-01-06.
Download link:
https://tandf.figshare.com/articles/dataset/Off-policy_Evaluation_in_Doubly_Inhomogeneous_Environments/26970329/1
Description:
This work studies off-policy evaluation (OPE) in settings where two key reinforcement learning (RL) assumptions, temporal stationarity and individual homogeneity, are both violated. To handle these “double inhomogeneities”, we propose a class of latent factor models for the reward and transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper to develop statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments where standard RL assumptions are not met, and provides several practical approaches for these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms state-of-the-art methods. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care. An R implementation of the proposed procedure is available at https://github.com/ZeyuBian/2FEOPE.
Provider:
Taylor & Francis
Created:
2024-09-09



