From Fair Solutions to Compromise Solutions in Multi-Objective Deep Reinforcement Learning

Published in Neural Computing and Applications (NCAA), 2024

In this paper, we focus on multi-objective reinforcement learning (RL) in which the expected vector returns are aggregated with a concave function. For this generic framework, which notably includes fair optimization in the multi-user setting and compromise optimization in the multi-criteria setting, we present several contributions. After discussing its theoretical properties (e.g., the need to resort to stochastic policies), we prove a general performance bound that justifies learning a policy with discounted rewards even when a policy optimal for the average reward is desired. We extend several deep RL algorithms to our problem; notably, our adaptation of DQN can learn stochastic policies. In addition, to illustrate the generality of our framework, we consider, in the multi-user setting, a novel extension of fair optimization in deep RL in which users have different entitlements. Our experimental results validate our propositions and also demonstrate the superiority of our approach over reward engineering in single-objective RL.
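As a rough sketch of the setting (the notation below is illustrative, not necessarily the paper's), the objective takes the form

$$\max_{\pi} \; f\!\Big(\mathbb{E}_{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^{t}\,\mathbf{r}_t\Big]\Big),$$

where $\mathbf{r}_t \in \mathbb{R}^d$ is the vector reward and $f$ is a concave aggregation function, e.g., a generalized Gini social welfare function in the fair multi-user setting or another concave scalarization in the compromise multi-criteria setting.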

Recommended citation: Qian, Junqi, Umer Siddique, Guanbao Yu, and Paul Weng. "From Fair Solutions to Compromise Solutions in Multi-Objective Deep Reinforcement Learning." Neural Computing and Applications (NCAA). 2024.
Download Paper