About me
I’m a final-year PhD candidate in the Unmanned Systems Lab, advised by Prof. Yongcan Cao. Before joining UTSA, I completed my MS at the UM–SJTU Joint Institute in Shanghai under the supervision of Dr. Paul Weng, where I began working on fairness in reinforcement learning.

My research focuses on fairness, safety, and social welfare optimization in sequential decision-making systems: how can we design agents that not only maximize reward but also distribute outcomes equitably across individuals, agents, and objectives? Most of my work lies at the intersection of multi-objective and multi-agent reinforcement learning, social welfare functions such as the Generalized Gini Function (GGF), and, more recently, alignment methods for large language models, including RLHF, multi-objective DPO, and inference-time alignment. My dissertation, AI Alignment through Reinforcement Learning: Fairness, Safety, and Social Welfare Optimization, brings these themes together from both theoretical and applied perspectives.

In parallel, I develop deep reinforcement learning methods for autonomous systems, including drone guidance, multi-agent traffic control, and target enclosing under partial observability, with an emphasis on building methods that remain deployable in real-world decision-making systems.
- May 2026: Paper accepted at RLC 2026.
- May 2026: Serving as a reviewer for NeurIPS 2026.
- April 2026: Paper accepted at ACL 2026.
- January 2026: Paper accepted at ACC 2026.
- January 2026: Serving as a reviewer for ICML 2026.
- January 2026: Serving as a Program Committee member for IJCAI 2026 and ECAI 2026.
- September 2025: Paper accepted at the Mechanistic Interpretability Workshop @ NeurIPS 2025.
- September 2025: Paper accepted at LAW@NeurIPS 2025.
- August 2025: Paper accepted at AIAA SCITECH 2026.
- July 2025: Serving as a Program Committee member for AAAI 2026.
- May 2025: Fair-PbRL accepted at the Machine Learning Journal (MLJ).
- May 2025: Three papers accepted at the 2nd Reinforcement Learning Conference (RLC) 2025.
- March 2025: Paper accepted at the International Workshop on Multi-Agent-Based Simulation (MABS) @ AAMAS 2025.
- March 2025: Paper accepted at the Adaptive and Learning Agents (ALA) Workshop @ AAMAS 2025.
- February 2025: Served as a student volunteer for AAAI 2025.
Recent Publications
Inference-Time Policy Alignment for Fair Reinforcement Learning
Siddique, Umer, Peilang Li, Conor Wallace, and Yongcan Cao. "Inference-Time Policy Alignment for Fair Reinforcement Learning." Reinforcement Learning Conference (RLC) 2026.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis
Manzoor, Muhammad Arslan, Dilshod Azizov, Daniil Orel, Umer Siddique, Zain Muhammad Mujahid, Yufang Hou, and Preslav Nakov. "A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis." ACL 2026.
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Siddique, Umer, et al. "Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning." ACC 2026.
Symbolic Policy Distillation for Interpretable Reinforcement Learning
Li, Peilang, Umer Siddique, and Yongcan Cao. "Symbolic Policy Distillation for Interpretable Reinforcement Learning." Mechanistic Interpretability Workshop @ NeurIPS 2025.
ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling
Wallace, Conor, Umer Siddique, and Yongcan Cao. "ReCollab: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling." Language, Agent, and World Models for Reasoning and Planning Workshop at NeurIPS 2025.
Autonomous Target-Enclosing Guidance via Deep Reinforcement Learning
Siddique, Umer, Praveen Kumar Ranjan, Abhinav Sinha, and Yongcan Cao. "Autonomous Target-Enclosing Guidance via Deep Reinforcement Learning." AIAA SCITECH 2026.
MODIFLY: A Scalable End-to-end Multi-Agent Simulation for Unmanned Aerial Vehicles
Cofield, Jeremy, Umer Siddique, and Yongcan Cao. "MODIFLY: A Scalable End-to-end Multi-Agent Simulation for Unmanned Aerial Vehicles." The 26th International Workshop on Multi-Agent-Based Simulation (MABS) @ AAMAS 2025.
Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
Siddique, Umer, Peilang Li, and Yongcan Cao. "Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning." Adaptive and Learning Aegnts (ALA) @ AAMAS 2025
Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning
Siddique, Umer, Peilang Li, and Yongcan Cao. "Towards Fair and Efficient Policy Learning in Cooperative Multi-Agent Reinforcement Learning." AAMAS 2025 (Extended Abstract).
From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation
Li, Peilang, Umer Siddique, and Yongcan Cao. "From Explainability to Interpretability: Interpretable Policies in Reinforcement Learning Via Model Explanation." Deployable AI Workshop @ AAAI. 2025.
