Abstract: The rapid advancement of AI has changed how HPC systems are dimensioned, provisioned, and operated. Energy demand has grown, and the rudimentary continual learning capabilities of existing models limit the ability of AI to manage HPC systems effectively. This paper reviews emerging directions beyond monolithic transformers, emphasizing agentic AI and brain-inspired architectures as complementary paths toward sustainable, adaptive systems. We propose LIFE, a reasoning and Learning framework that is Incremental, Flexible, and Energy efficient, implemented as an agent-centric system rather than a single monolithic model. LIFE uniquely combines four components to realize self-evolving network management and operations in HPC systems: an orchestrator, Agentic Context Engineering, a novel memory system, and information lattice learning. LIFE can also generalize to a variety of orthogonal use cases. We ground LIFE in a specific closed-loop HPC operations example: detecting and mitigating latency spikes experienced by critical microservices running on a Kubernetes-like cluster.
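
To make the closed-loop example concrete, the following is a minimal sketch of an observe, detect, mitigate loop for latency spikes. It illustrates only the outer control loop and does not represent LIFE's orchestrator, memory system, or learning components. The rolling-median rule, the `window` and `factor` parameters, and the `scale_out` action are all assumptions introduced for illustration; none of them come from the paper.

```python
from collections import deque
from statistics import median


class LatencySpikeDetector:
    """Flags a sample as a spike when it exceeds `factor` times the rolling median.

    Hypothetical detection rule chosen for illustration only.
    """

    def __init__(self, window: int = 5, factor: float = 3.0):
        self.history = deque(maxlen=window)
        self.factor = factor

    def observe(self, latency_ms: float) -> bool:
        # Only judge once the window is full, so early samples set the baseline.
        is_spike = (
            len(self.history) == self.history.maxlen
            and latency_ms > self.factor * median(self.history)
        )
        if not is_spike:
            # Keep spikes out of the baseline so one outlier does not mask the next.
            self.history.append(latency_ms)
        return is_spike


def control_loop(samples, detector, mitigate):
    """Observe -> detect -> act: call `mitigate` for every detected spike."""
    actions = []
    for step, latency_ms in enumerate(samples):
        if detector.observe(latency_ms):
            actions.append(mitigate(step, latency_ms))
    return actions


def scale_out(step, latency_ms):
    # Hypothetical mitigation: a real system would call the cluster API here.
    return f"step {step}: scale out (latency {latency_ms} ms)"


if __name__ == "__main__":
    samples = [10, 11, 9, 10, 12, 50, 10, 11]
    for action in control_loop(samples, LatencySpikeDetector(), scale_out):
        print(action)
```

Excluding detected spikes from the baseline is a design choice in this sketch: it keeps a single outlier from inflating the median and hiding a second spike that follows shortly after.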




Abstract: Reconfigurable robots potentially offer more utility and flexibility for many real-world tasks. Designing a learning agent to operate such robots requires adapting to different configurations. While deep reinforcement learning (RL) has had immense success in robotic manipulation, domain adaptation remains a significant problem that limits its applicability to real-world robotics. We focus on robotic arms with multiple rigid links connected by joints. Recent work has applied domain adaptation and Sim2Real transfer to provide robustness to variations in robotic arm dynamics and in sensors and cameras. However, there have been no previous attempts to adapt to robotic arms with a varying number of links. We propose an RL agent that embeds sequence neural networks in its deep neural network to adapt to robotic arms with a varying number of links. Further, with the additional tool of domain randomization, this agent adapts to configurations with varying numbers and lengths of links and to dynamics noise. We perform simulations on a 2D N-link arm to show that our network transfers and generalizes efficiently.
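
As an illustration of the 2D N-link arm setting, the sketch below computes standard planar forward kinematics for an arm with any number of links. It is not the paper's simulator or agent; the function name `forward_kinematics` and the convention that each joint angle is measured relative to the previous link are assumptions made for this example.

```python
import math


def forward_kinematics(joint_angles, link_lengths):
    """Return the (x, y) position of every joint of a planar arm, base first.

    Works for any number of links; angles are in radians and each one is
    measured relative to the previous link (an assumed convention).
    """
    if len(joint_angles) != len(link_lengths):
        raise ValueError("need one joint angle per link")
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle  # accumulate relative angles into an absolute heading
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


if __name__ == "__main__":
    # The same function handles a 2-link and a 4-link arm.
    print(forward_kinematics([0.0, math.pi / 2], [1.0, 1.0])[-1])
    print(forward_kinematics([0.1, 0.2, -0.3, 0.4], [1.0, 0.8, 0.6, 0.4])[-1])
```

The length of the returned list grows with the number of links, which is the property that makes a fixed-size network input awkward and motivates processing the links as a sequence.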