Publications

You can also find my articles on my Google Scholar profile.

Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning

Published in AAMAS 2025 (Extended Abstract), 2025

In this paper, we consider the problem of learning independent fair policies in cooperative multi-agent reinforcement learning (MARL). Our objective is to design multiple policies simultaneously that optimize a welfare function for fairness. To achieve this objective, we propose a novel Fairness-Aware multi-agent Proximal Policy Optimization (FAPPO) algorithm, which enables each agent to independently learn its policy while optimizing a welfare function. Unlike standard approaches that focus on maximizing a performance metric such as rewards, FAPPO focuses on fairness in an independent learning setting, where each agent estimates its local value function. When inter-agent communication is allowed, we further introduce an attention-based variant of FAPPO (AT-FAPPO), which incorporates a self-attention mechanism to facilitate communication and coordination among agents. This variant allows agents to share relevant information during training, leading to fairer outcomes. To evaluate the effectiveness of the proposed methods, we conduct experiments in various environments and show that our approach outperforms existing methods in terms of both efficiency and equity.

Recommended citation: Siddique, Umer, Peilang Li, and Yongcan Cao. "Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning." AAMAS 2025 (Extended Abstract).
Download Paper

From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation

Published in Deployable AI Workshop @ AAAI 2025, 2025

Deep reinforcement learning (RL) has shown remarkable success in complex domains; however, the inherent black-box nature of deep neural network policies raises significant challenges in understanding and trusting the decision-making processes. While existing explainable RL methods provide local insights, they fail to deliver a global understanding of the model, particularly in high-stakes applications. To overcome this limitation, we propose a novel model-agnostic approach that bridges the gap between explainability and interpretability by leveraging Shapley values to transform complex deep RL policies into transparent representations. The proposed approach offers two key contributions: a novel method that employs Shapley values for policy interpretation beyond local explanations, and a general framework applicable to both off-policy and on-policy algorithms. We evaluate our approach with three existing deep RL algorithms and validate its performance in two classic control environments. The results demonstrate that our approach not only preserves the original models’ effectiveness but also generates more stable interpretable policies.
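
For context, the Shapley value that underlies this line of work attributes to each input feature its average marginal contribution over all feature coalitions; the surrogate models and approximations used in the paper are not reproduced here:

$$\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr),$$

where $N$ is the set of input features and $v(S)$ denotes the policy output when only the features in $S$ are present.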

Recommended citation: Li, Peilang, Umer Siddique, and Yongcan Cao. "From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation." Deployable AI Workshop @ AAAI. 2025.
Download Paper

Fairness in Traffic Control: Decentralized Multi-agent Reinforcement Learning with Generalized Gini Welfare Functions

Published in MALTA Workshop @ AAAI 2025, 2025

In this paper, we address the issue of learning fair policies in decentralized cooperative multi-agent reinforcement learning (MARL), with a focus on traffic light control systems. We show that standard MARL methods that optimize the expected rewards often lead to unfair treatment across different intersections. To overcome this limitation, we aim to design control policies that optimize a generalized Gini welfare function that explicitly encodes two aspects of fairness: efficiency and equity. Specifically, we propose three novel adaptations of MARL baselines that enable agents to learn decentralized fair policies, where each agent estimates its local value function while contributing to welfare optimization. We validate our approaches through extensive experiments across six traffic control environments with varying complexities and traffic layouts. The results demonstrate that our proposed methods consistently outperform existing MARL approaches in terms of both efficiency and equity.
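
For reference, a generalized Gini welfare function over the per-intersection (per-agent) values $v_1,\dots,v_N$ takes the standard form below; the exact weights and notation may differ from the paper's:

$$\mathrm{GGF}_{\mathbf{w}}(\mathbf{v}) \;=\; \sum_{i=1}^{N} w_i\, v_{\sigma(i)}, \qquad v_{\sigma(1)} \le \dots \le v_{\sigma(N)},\quad w_1 > w_2 > \dots > w_N \ge 0,$$

so that larger weights on the worst-off components encode equity, while summing over all components preserves efficiency.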

Recommended citation: Siddique, Umer, Peilang Li, and Yongcan Cao. "Fairness in Traffic Control: Decentralized Multi-agent Reinforcement Learning with Generalized Gini Welfare Functions." MALTA Workshop @ AAAI. 2025.
Download Paper

From Fair Solutions to Compromise Solutions in Multi-Objective Deep Reinforcement Learning

Published in Neural Computing and Applications (NCAA), 2024

In this paper, we focus on multi-objective reinforcement learning (RL) where the expected vector returns are aggregated with a concave function. For this generic framework, which notably includes fair optimization in the multi-user setting and compromise optimization in the multi-criteria setting, we present several contributions. After a discussion of its theoretical properties (e.g., the need to resort to stochastic policies), we prove a general performance bound that justifies learning a policy using discounted rewards, even if a policy optimal for the average reward is desired. We extend several deep RL algorithms to our problem; notably, our adaptation of DQN can learn stochastic policies. In addition, to illustrate the generality of our framework, we consider in the multi-user setting a novel extension of fair optimization in deep RL where users have different entitlements. Our experimental results validate our propositions and also demonstrate the superiority of our approach over reward engineering in single-objective RL.
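
Schematically, the framework optimizes a concave aggregation $\psi$ of the expected vector return (the notation here is illustrative, not the paper's):

$$\max_{\pi}\; \psi\!\left(\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \mathbf{r}_t\right]\right),$$

where $\mathbf{r}_t$ is the vector reward; choosing $\psi$ as a fair welfare function recovers the multi-user setting, while other concave choices yield compromise solutions in the multi-criteria setting.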

Recommended citation: Qian, Junqi, Umer Siddique, Guanbao Yu, and Paul Weng. "From Fair Solutions to Compromise Solutions in Multi-Objective Deep Reinforcement Learning." Neural Computing and Applications (NCAA). 2024.
Download Paper

Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems

Published in arXiv, 2024

In this paper, we propose an adaptive event-triggered reinforcement learning control approach for continuous-time nonlinear systems subject to bounded uncertainties and characterized by complex interactions. Specifically, the proposed method jointly learns both the control policy and the communication policy, thereby reducing the number of parameters and the computational overhead relative to learning them separately or learning only one of them. By augmenting the state space with accrued rewards that represent performance over the entire trajectory, we show that triggering conditions can be determined accurately and efficiently without the need to learn them explicitly, leading to an adaptive non-stationary policy. Finally, we provide several numerical examples to demonstrate the effectiveness of the proposed approach.
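
As a rough illustration of the state-augmentation idea above, the sketch below appends the accrued (discounted) reward to the observation of a Gymnasium environment; the wrapper name and the discrete-time, Gym-style setting are assumptions made for illustration and differ from the paper's continuous-time formulation.

```python
import numpy as np
import gymnasium as gym


class AccruedRewardWrapper(gym.Wrapper):
    """Hypothetical wrapper: append the discounted reward accrued so far to
    the observation, so a stationary policy over the augmented state can act
    non-stationarily over the original state. Assumes a Box observation space."""

    def __init__(self, env, gamma=0.99):
        super().__init__(env)
        self.gamma = gamma
        low = np.append(env.observation_space.low, -np.inf)
        high = np.append(env.observation_space.high, np.inf)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float64)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.accrued, self.discount = 0.0, 1.0
        return np.append(obs, self.accrued), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.accrued += self.discount * float(reward)
        self.discount *= self.gamma
        return np.append(obs, self.accrued), reward, terminated, truncated, info
```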

Recommended citation: Siddique, Umer, Abhinav Sinha, and Yongcan Cao. "Adaptive Event-triggered Reinforcement Learning Control for Complex Nonlinear Systems." arXiv preprint arXiv:2409.19769 (2024).
Download Paper

Opponent Transformer: Modeling Opponent Policies as a Sequence Problem

Published in Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods Workshop @ RLC, 2024

The ability of an agent to understand the intentions of others in a multi-agent system, also called opponent modeling, is critical for the design of effective local control policies. One main challenge is the unavailability of other agents’ episodic trajectories at execution. To address this challenge, we propose a new approach that explicitly models the episodic trajectories of others. In particular, we cast the opponent modeling problem as a sequence modeling problem by conditioning a transformer model on the sequence of the agent’s local trajectory and predicting each opponent agent’s trajectory. To evaluate the effectiveness of the proposed approach, we conduct experiments using a set of multi-agent environments that capture both cooperative and competitive payoff structures. The results show that the proposed method provides better opponent modeling capabilities while achieving competitive or superior episodic returns.
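
As a rough illustration of the sequence-modeling formulation above, the PyTorch sketch below conditions a small Transformer encoder on the agent's local trajectory and predicts per-opponent action logits at each step; the module names, shapes, and discrete-action assumption are illustrative rather than the paper's architecture.

```python
import torch
import torch.nn as nn


class OpponentSequenceModel(nn.Module):
    """Illustrative sequence model: encode the agent's own trajectory and
    predict, at every timestep, logits over each opponent's next action."""

    def __init__(self, obs_dim, act_dim, n_opponents, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_opponents * act_dim)
        self.n_opponents, self.act_dim = n_opponents, act_dim

    def forward(self, local_traj):
        # local_traj: (batch, T, obs_dim + act_dim), the agent's local history;
        # a causal attention mask is omitted here for brevity.
        h = self.encoder(self.embed(local_traj))   # (batch, T, d_model)
        logits = self.head(h)                      # (batch, T, n_opponents * act_dim)
        return logits.view(*logits.shape[:2], self.n_opponents, self.act_dim)
```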

Recommended citation: Wallace, Conor, Umer Siddique, and Yongcan Cao. "Opponent Transformer: Modeling Opponent Policies as a Sequence Problem." Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods Workshop @ RLC. 2024.
Download Paper

Towards Fair and Equitable Policy Learning in Cooperative Multi-Agent Reinforcement Learning

Published in Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods Workshop @ RLC, 2024

In this paper, we consider the problem of learning independent fair policies in cooperative multi-agent reinforcement learning (MARL). The objective is to design multiple policies simultaneously that optimize a welfare function for fairness. To achieve this objective, we propose a novel Fairness-Aware multi-agent Proximal Policy Optimization (FAPPO) algorithm, which learns individual policies for all agents separately and optimizes a welfare function to ensure fairness among them, in contrast to optimizing the discounted rewards. The proposed approach is shown to learn fair policies in the independent learning setting, where each agent estimates its local value function. When inter-agent communication is allowed, we further introduce an attention-based variant of FAPPO (AT-FAPPO), which incorporates a self-attention mechanism for inter-agent communication. This variant enables agents to communicate and coordinate their actions, potentially leading to fairer solutions by sharing relevant information during training. To evaluate the effectiveness of the proposed methods, we conduct experiments in two environments and show that our approach outperforms previous methods in terms of both efficiency and equity.

Recommended citation: Siddique, Umer, Peilang Li, and Yongcan Cao. "Towards Fair and Equitable Policy Learning in Cooperative Multi-Agent Reinforcement Learning." Coordination and Cooperation for Multi-Agent Reinforcement Learning Methods Workshop @ RLC. 2024.
Download Paper

Offline Reinforcement Learning with Failure Under Sparse Reward Environments

Published in 3rd International Conference on Computing and Machine Intelligence (ICMI), 2024

This paper presents a new reinforcement learning approach that leverages failed experiences in sparse reward environments. Unlike traditional reinforcement learning methods that rely on successful experiences or expert demonstrations, the proposed approach utilizes failed experiences to guide the policy update during learning. The primary objective of this work is to develop a method that can efficiently use failed experiences to guide the search direction, since directional cues from successful experiences may be limited in sparse reward environments. To achieve this objective, we introduce a new objective function that maximizes the dissimilarity between the RL agent’s actions and the actions from failed experiences. This discrepancy serves as a valuable signal to guide the agent’s exploration: by steering the policy away from failed actions, the agent gains a significant opportunity to refine its policy. We further employ hindsight experience replay (HER) to enhance the directional search by creating and achieving potential subgoals that align with the primary objectives. To assess the effectiveness of our method, we conduct experiments on three sparse reward environments. Our findings demonstrate that the proposed approach significantly enhances the agent’s learning efficiency and improves robustness to variations in demonstration quality compared to conventional reinforcement learning techniques.
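
Schematically, the idea of repelling the policy from failed actions can be written as an auxiliary term added to a standard RL objective; the exact dissimilarity measure and weighting are the paper's design choices and are not reproduced here:

$$\max_{\theta}\; J_{\mathrm{RL}}(\theta) \;+\; \lambda\, \mathbb{E}_{(s,\,a_f)\sim \mathcal{D}_{\mathrm{fail}}}\Bigl[ d\bigl(\pi_{\theta}(\cdot \mid s),\, a_f\bigr)\Bigr],$$

where $\mathcal{D}_{\mathrm{fail}}$ is the set of failed experiences and $d$ measures how dissimilar the policy's action distribution at $s$ is from the failed action $a_f$.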

Recommended citation: Wu, Mingkang, Umer Siddique, Abhinav Sinha, and Yongcan Cao. "Offline Reinforcement Learning with Failure Under Sparse Reward Environments." 3rd International Conference on Computing and Machine Intelligence (ICMI). 2024.
Download Paper

On Deep Reinforcement Learning for Target Capture Autonomous Guidance

Published in AIAA Guidance, Navigation, and Control Conference, 2024

This paper explores the prospects of motion planning for autonomous vehicles using deep reinforcement learning (DRL). We are particularly interested in a goal-directed setting where the need is to design an optimal guidance strategy for a pursuing autonomous vehicle (the pursuer, which is also the DRL agent) to capture an adversary (the target). To this end, we first formulate the target capture guidance problem as a Markov Decision Process (MDP), wherein the kinematics of relative motion between the vehicles constitute the MDP and the pursuer’s lateral acceleration (chosen as its steering control to account for turn constraints) is the action of the DRL agent. We show that a multifaceted reward function motivated by the collision conditions is sufficient and effective in designing the reinforcement learning action that enables the pursuer to capture the target regardless of the latter’s motion. We then empirically evaluate the performance of the trained agent in various target capture scenarios.

Recommended citation: Siddique, Umer, Abhinav Sinha, and Yongcan Cao. "On Deep Reinforcement Learning for Target Capture Autonomous Guidance." AIAA SCITECH 2024 Forum. 2024.
Download Paper

Fair Deep Reinforcement Learning with Generalized Gini Welfare Functions

Published in Adaptive Learning Agents Workshop @ AAMAS, 2023

Learning fair policies in reinforcement learning (RL) is important when the RL agent’s actions may impact many users. In this paper, we investigate a generalization of this problem where equity is still desired, but some users may be entitled to preferential treatment. We formalize this more sophisticated fair optimization problem in deep RL, provide some theoretical discussion of its difficulties, and explain how existing deep RL algorithms can be adapted to tackle it. Our algorithmic innovations notably include a state-augmented DQN-based method for learning stochastic policies, which also applies to the usual fair optimization setting without any preferential treatment. We empirically validate our propositions and analyze the experimental results on several application domains. This paper was selected as the best paper at the Adaptive Learning Agents Workshop @ AAMAS 2023.

Recommended citation: Yu, Guanbao, Umer Siddique, and Paul Weng. "Fair Deep Reinforcement Learning with Generalized Gini Welfare Functions." International Conference on Autonomous Agents and Multiagent Systems. Cham: Springer Nature Switzerland, 2023.
Download Paper

Fair deep reinforcement learning with preferential treatment

Published in European Conference on Artificial Intelligence (ECAI), 2023

Learning fair policies in reinforcement learning (RL) is important when the RL agent may impact many users. We investigate a variant of this problem where equity is still desired, but some users may be entitled to preferential treatment. In this paper, we formalize this more sophisticated fair optimization problem in deep RL using generalized fair social welfare functions (SWF), provide a theoretical discussion to justify our approach, explain how deep RL algorithms can be adapted to tackle it, and empirically validate our propositions on several domains. Our contributions are both theoretical and algorithmic, notably: (1) we obtain a general bound on the suboptimality gap, in terms of SWF-optimality under the average reward, of a policy that is SWF-optimal for the discounted reward, which justifies using standard deep RL algorithms even for the average reward; (2) our algorithmic innovations include a state-augmented DQN-based method for learning either deterministic or stochastic policies, which also applies to the usual fair optimization setting without any preferential treatment.

Recommended citation: Yu, Guanbao, Umer Siddique, and Paul Weng. "Fair Deep Reinforcement Learning with Preferential Treatment." ECAI. 2023.
Download Paper

Fairness in Preference-based Reinforcement Learning

Published in MFPL @ International Conference on Machine Learning, 2023

In this paper, we address the issue of fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The main objective is to design control policies that can optimize multiple objectives while treating each objective fairly. Toward this goal, we design a new fairness-induced preference-based reinforcement learning method, FPbRL. The main idea of FPbRL is to learn vector reward functions associated with the multiple objectives via new welfare-based preferences, rather than the reward-based preferences of standard PbRL, coupled with policy learning via maximizing a generalized Gini welfare function. Finally, we provide experimental studies on three different environments to show that the proposed FPbRL approach can achieve both efficiency and equity for learning effective and fair policies.
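
Schematically, a welfare-based preference compares two trajectory segments through the welfare of their vector returns rather than through scalar reward sums (the notation here is illustrative):

$$\tau^{1} \succ \tau^{2} \;\Longleftrightarrow\; \mathrm{GGF}_{\mathbf{w}}\bigl(\mathbf{z}(\tau^{1})\bigr) \;>\; \mathrm{GGF}_{\mathbf{w}}\bigl(\mathbf{z}(\tau^{2})\bigr),$$

where $\mathbf{z}(\tau)$ is the vector return of segment $\tau$ and $\mathrm{GGF}_{\mathbf{w}}$ is a generalized Gini welfare function.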

Recommended citation: Siddique, Umer, Abhinav Sinha, and Yongcan Cao. "Fairness in Preference-based Reinforcement Learning." ICML 2023 Workshop The Many Facets of Preference-Based Learning. 2023.
Download Paper

Learning fair policies in decentralized cooperative multi-agent reinforcement learning

Published in International Conference on Machine Learning (ICML), 2021

In this paper, we consider the problem of learning fair policies in (deep) cooperative multi-agent reinforcement learning (MARL). We formalize it in a principled way as the problem of optimizing a welfare function that explicitly encodes two important aspects of fairness: efficiency and equity. We provide a theoretical analysis of the convergence of policy gradient for this problem. As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account these two aspects of fairness. In experiments, we demonstrate the importance of the two sub-networks for fair optimization. Our overall approach is general as it can accommodate any (sub)differentiable welfare function. Therefore, it is compatible with various notions of fairness that have been proposed in the literature (e.g., lexicographic maximin, generalized Gini social welfare function, proportional fairness). Our method is generic and can be implemented in various MARL settings: centralized training and decentralized execution, or fully decentralized. We evaluate our method on a set of fair cooperative MARL benchmarks, where we show that it outperforms the state-of-the-art methods in terms of fairness and performance.

Recommended citation: Zimmer, Matthieu, et al. "Learning fair policies in decentralized cooperative multi-agent reinforcement learning." International Conference on Machine Learning. PMLR, 2021.
Download Paper

Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards

Published in International Conference on Machine Learning (ICML), 2020

As the operations of autonomous systems generally affect several users simultaneously, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the problem of learning a policy that treats its users equitably. In this paper, we formulate this novel RL problem, in which an objective function encoding a notion of fairness that we formally define is optimized. For this problem, we provide a theoretical discussion where we examine the case of discounted rewards and that of average rewards. During this analysis, we notably derive a new result in the standard RL setting, which is of independent interest: it states a novel bound on the approximation error, with respect to the optimal average reward, of a policy that is optimal for the discounted reward. Since learning with discounted rewards is generally easier, this discussion further justifies finding a fair policy for the average reward by learning a fair policy for the discounted reward. We propose three novel deep RL adaptations to learn fair policies. We evaluate these methods on a set of fair multi-objective deep RL benchmarks, where we show that they outperform the state-of-the-art methods in terms of fairness and performance.

Recommended citation: Siddique, Umer, Paul Weng, and Matthieu Zimmer. "Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards." International Conference on Machine Learning. PMLR, 2020.
Download Paper | Download Slides