About me
I’m a final-year PhD candidate in the Unmanned Systems Lab, advised by Prof. Yongcan Cao. Before joining UTSA, I completed my MS at the UM–SJTU Joint Institute in Shanghai under the supervision of Dr. Paul Weng, where I began working on fairness in reinforcement learning.

My research focuses on fairness, safety, and social welfare optimization in sequential decision-making systems: how can we design agents that not only maximize reward but also distribute outcomes equitably across individuals, agents, and objectives? Most of my work lies at the intersection of multi-objective and multi-agent reinforcement learning, social welfare functions such as the Generalized Gini Function (GGF), and, more recently, alignment methods for large language models, including RLHF, multi-objective DPO, and inference-time alignment. My dissertation, AI Alignment through Reinforcement Learning: Fairness, Safety, and Social Welfare Optimization, brings these themes together from both theoretical and applied perspectives.

In parallel, I develop deep reinforcement learning methods for autonomous systems, including drone guidance, multi-agent traffic control, and target enclosing under partial observability, with an emphasis on building methods that remain deployable in real-world decision-making systems.
- May 2026: Paper accepted at RLC 2026.
- May 2026: Serving as a reviewer for NeurIPS 2026.
- April 2026: Paper accepted at ACL 2026.
- January 2026: Paper accepted at ACC 2026.
- January 2026: Serving as a reviewer for ICML 2026.
- January 2026: Serving as a Program Committee member for IJCAI 2026 and ECAI 2026.
- September 2025: Paper accepted at the Mechanistic Interpretability Workshop @ NeurIPS 2025.
- September 2025: Paper accepted at LAW@NeurIPS 2025.
- August 2025: Paper accepted at AIAA SCITECH 2026.
- July 2025: Serving as a Program Committee member for AAAI 2026.
- May 2025: Fair-PbRL accepted at the Machine Learning Journal (MLJ).
- May 2025: Three papers accepted at the 2nd Reinforcement Learning Conference (RLC) 2025.
- March 2025: Paper accepted at the International Workshop on Multi-Agent-Based Simulation (MABS) @ AAMAS 2025.
- March 2025: Paper accepted at the Adaptive and Learning Agents (ALA) Workshop @ AAMAS 2025.
- February 2025: Served as a student volunteer for AAAI 2025.
Recent Publications
Inference-Time Policy Alignment for Fair Reinforcement Learning
Siddique, Umer, Peilang Li, Conor Wallace, and Yongcan Cao. "Inference-Time Policy Alignment for Fair Reinforcement Learning." Reinforcement Learning Conference (RLC) 2026.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis
Manzoor, Muhammad Arslan, Dilshod Azizov, Daniil Orel, Umer Siddique, Zain Muhammad Mujahid, Yufang Hou, and Preslav Nakov. "A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis." ACL 2026.
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Siddique, Umer, et al. "Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning." ACC 2026.
Symbolic Policy Distillation for Interpretable Reinforcement Learning
Li, Peilang, Umer Siddique, and Yongcan Cao. "Symbolic Policy Distillation for Interpretable Reinforcement Learning." Mechanistic Interpretability Workshop @ NeurIPS 2025.
ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling
Wallace, Conor, Umer Siddique, and Yongcan Cao. "ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling." Language, Agent, and World Models for Reasoning and Planning Workshop at NeurIPS 2025.
Autonomous Target-Enclosing Guidance via Deep Reinforcement Learning
Siddique, Umer, Praveen Kumar Ranjan, Abhinav Sinha, and Yongcan Cao. "Autonomous Target-Enclosing Guidance via Deep Reinforcement Learning." AIAA SCITECH 2026.
MODIFLY: A Scalable End-to-end Multi-Agent Simulation for Unmanned Aerial Vehicles
Cofield, Jeremy, Umer Siddique, and Yongcan Cao. "MODIFLY: A Scalable End-to-end Multi-Agent Simulation for Unmanned Aerial Vehicles." The 26th International Workshop on Multi-Agent-Based Simulation (MABS) @ AAMAS 2025.
Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
Siddique, Umer, Peilang Li, and Yongcan Cao. "Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning." Adaptive and Learning Aegnts (ALA) @ AAMAS 2025
Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning
Siddique, Umer, Peilang Li, and Yongcan Cao. "Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning." AAMAS 2025 (Extended Abstract).
From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation
Li, Peilang, Umer Siddique, and Yongcan Cao. "From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation." Deployable AI Workshop @ AAAI. 2025.
