Abstract:Sequential Social Dilemmas (SSDs) provide a key framework for studying how cooperation emerges when individual incentives conflict with collective welfare. In Multi-Agent Reinforcement Learning, these problems are often addressed by incorporating intrinsic drives that encourage prosocial or fair behavior. However, most existing methods assume that agents face identical incentives in the dilemma and require continuous access to global information about other agents to assess fairness. In this work, we introduce asymmetric variants of well-known SSD environments and examine how natural differences between agents influence cooperation dynamics. Our findings reveal that existing fairness-based methods struggle to adapt under asymmetric conditions by enforcing raw equality that wrongfully incentivize defection. To address this, we propose three modifications: (i) redefining fairness by accounting for agents' reward ranges, (ii) introducing an agent-based weighting mechanism to better handle inherent asymmetries, and (iii) localizing social feedback to make the methods effective under partial observability without requiring global information sharing. Experimental results show that in asymmetric scenarios, our method fosters faster emergence of cooperative policies compared to existing approaches, without sacrificing scalability or practicality.




Abstract:Balancing individual specialisation and shared behaviours is a critical challenge in multi-agent reinforcement learning (MARL). Existing methods typically focus on encouraging diversity or leveraging shared representations. Full parameter sharing (FuPS) improves sample efficiency but struggles to learn diverse behaviours when required, while no parameter sharing (NoPS) enables diversity but is computationally expensive and sample inefficient. To address these challenges, we introduce HyperMARL, a novel approach using hypernetworks to balance efficiency and specialisation. HyperMARL generates agent-specific actor and critic parameters, enabling agents to adaptively exhibit diverse or homogeneous behaviours as needed, without modifying the learning objective or requiring prior knowledge of the optimal diversity. Furthermore, HyperMARL decouples agent-specific and state-based gradients, which empirically correlates with reduced policy gradient variance, potentially offering insights into its ability to capture diverse behaviours. Across MARL benchmarks requiring homogeneous, heterogeneous, or mixed behaviours, HyperMARL consistently matches or outperforms FuPS, NoPS, and diversity-focused methods, achieving NoPS-level diversity with a shared architecture. These results highlight the potential of hypernetworks as a versatile approach to the trade-off between specialisation and shared behaviours in MARL.