Jake Grigsby

PGrad: Learning Principal Gradients For Domain Generalization

May 02, 2023
Zhe Wang, Jake Grigsby, Yanjun Qi

Machine learning models often fail when facing out-of-distribution (OOD) domains; learning models that generalize to unseen domains is the challenge known as domain generalization (DG). In this work, we develop a novel DG training strategy, which we call PGrad, to learn a robust gradient direction that improves models' generalization ability on unseen domains. The proposed gradient aggregates the principal directions of a sampled roll-out optimization trajectory that captures the training dynamics across all training domains. PGrad's gradient design forces DG training to ignore domain-dependent noise signals and to update all training domains with a robust direction covering the main components of parameter dynamics. We further improve PGrad via a bijection-based computational refinement and directional plus length-based calibrations. Our theoretical analysis connects PGrad to the spectral analysis of the Hessian in neural network training. Experiments on the DomainBed and WILDS benchmarks demonstrate that our approach enables robust DG optimization and leads to smoothly decreasing loss curves. Empirically, PGrad achieves competitive results across seven datasets, demonstrating its efficacy across both synthetic and real-world distributional shifts. Code is available at https://github.com/QData/PGrad.
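
To make the core idea concrete, below is a minimal numpy sketch that aggregates per-domain parameter updates into a single principal direction via SVD; the function name, calibration choices, and toy shapes are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    def principal_gradient(updates: np.ndarray) -> np.ndarray:
        """updates: (num_domains, num_params) matrix of flattened parameter updates
        collected along a roll-out trajectory over the training domains."""
        centered = updates - updates.mean(axis=0, keepdims=True)
        # Top right-singular vector = principal direction of the trajectory.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        direction = vt[0]
        mean_update = updates.mean(axis=0)
        # Directional calibration: keep the sign that agrees with the mean update.
        if np.dot(direction, mean_update) < 0:
            direction = -direction
        # Length calibration: rescale to the mean update's norm.
        return direction * np.linalg.norm(mean_update)

    # Toy usage: 4 training domains, 10-parameter model.
    rng = np.random.default_rng(0)
    g = rng.normal(size=(4, 10))
    step = principal_gradient(g)   # robust update direction shared across domains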

Launchpad: Learning to Schedule Using Offline and Online RL Methods

Dec 02, 2022
Vanamala Venkataswamy, Jake Grigsby, Andrew Grimshaw, Yanjun Qi

Deep reinforcement learning algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies but often take thousands of timesteps to explore the environment and adapt from a randomly initialized DNN policy. Existing RL schedulers overlook the importance of learning from historical data and of improving upon custom heuristic policies. Offline reinforcement learning offers the prospect of policy optimization from pre-recorded datasets without online environment interaction. Following the recent success of data-driven learning, we explore two RL methods: 1) Behaviour Cloning and 2) Offline RL, both of which aim to learn policies from logged data without interacting with the environment. These methods address the cost of data collection and the safety concerns that are particularly pertinent to real-world applications of RL. Although these data-driven RL methods produce good results, we show that their performance is highly dependent on the quality of the historical datasets. Finally, we demonstrate that by effectively incorporating prior expert demonstrations to pre-train the agent, we short-circuit the random exploration phase and learn a reasonable policy with online training. We use Offline RL as a launchpad to learn effective scheduling policies from prior experience collected using oracle or heuristic policies. Such a framework is effective for pre-training from historical datasets and well suited to continuous improvement with online data collection.
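
As a rough illustration of the pre-training idea, the PyTorch sketch below warm-starts a policy by behaviour cloning on logged scheduler decisions before any online interaction; the state/action sizes, network, and random stand-in data are assumptions rather than the paper's actual scheduler.

    import torch
    import torch.nn as nn

    state_dim, num_actions = 32, 8   # assumed sizes of the scheduling state/action space
    policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # Pre-recorded trajectories from a heuristic/oracle scheduler (random stand-ins here).
    states = torch.randn(1024, state_dim)
    actions = torch.randint(0, num_actions, (1024,))

    for epoch in range(10):          # offline pre-training phase
        logits = policy(states)
        loss = nn.functional.cross_entropy(logits, actions)
        opt.zero_grad(); loss.backward(); opt.step()
    # The pre-trained policy is then refined with online RL instead of starting
    # exploration from a random initialization.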

RARE: Renewable Energy Aware Resource Management in Datacenters

Nov 10, 2022
Vanamala Venkataswamy, Jake Grigsby, Andrew Grimshaw, Yanjun Qi

The exponential growth in demand for digital services drives massive datacenter energy consumption and negative environmental impacts. Promoting sustainable solutions to pressing energy and digital infrastructure challenges is crucial. Several hyperscale cloud providers have announced plans to power their datacenters using renewable energy. However, integrating renewables to power datacenters is challenging because the power generation is intermittent, necessitating approaches that tackle power supply variability. Hand-engineering domain-specific heuristic schedulers to meet specific objective functions in such complex, dynamic green datacenter environments is time-consuming, expensive, and requires extensive tuning by domain experts. Green datacenters need smart systems and system software that employ multiple renewable energy sources (wind and solar) by intelligently adapting computing to renewable energy generation. We present RARE (Renewable energy Aware REsource management), a Deep Reinforcement Learning (DRL) job scheduler that automatically learns effective job scheduling policies while continually adapting to the datacenter's complex, dynamic environment. The resulting DRL scheduler outperforms heuristic scheduling policies across different workloads and adapts to the intermittent power supply from renewables. We identify DRL scheduler system design parameters that, when tuned correctly, produce better performance. Finally, we demonstrate that the DRL scheduler can learn from and improve upon existing heuristic policies using offline learning.
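
The sketch below is a highly simplified, purely illustrative interaction loop for a renewable-aware scheduler: the agent observes the job queue together with the current renewable supply and greedily starts the largest job that fits; RARE's actual environment, reward, and learned policy are considerably more involved.

    import random

    def renewable_supply(t):          # toy intermittent solar/wind signal
        return max(0.0, 5.0 + 3.0 * ((t % 24) / 24 - 0.5)) + random.uniform(-1, 1)

    queue = [{"load": random.uniform(0.5, 2.0)} for _ in range(10)]
    for t in range(24):
        supply = renewable_supply(t)
        runnable = [j for j in queue if j["load"] <= supply]
        if runnable:                  # greedy stand-in for the learned scheduling policy
            job = max(runnable, key=lambda j: j["load"])
            queue.remove(job)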

* Accepted at JSSPP-2022 

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

Oct 10, 2021
Jake Grigsby, Yanjun Qi

Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
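
A minimal numpy sketch of the filtering idea follows: transitions are sampled for behaviour cloning with probability proportional to their (clipped) estimated advantage, so rare expert-level actions are not drowned out by noise; the advantage values and constants here are synthetic placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    advantages = rng.normal(loc=-1.0, scale=1.0, size=1_000_000)  # mostly sub-optimal data
    advantages[::65] += 3.0                                       # sparse expert-like transitions (~1 in 65)

    # Prioritized sampling: clip non-positive advantages to near-zero priority.
    priorities = np.clip(advantages, 1e-6, None)
    probs = priorities / priorities.sum()
    batch_idx = rng.choice(len(advantages), size=256, p=probs)    # indices to clone from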

* Honors Undergraduate Thesis, UVA 2021. 15 pages 

ST-MAML: A Stochastic-Task based Method for Task-Heterogeneous Meta-Learning

Sep 27, 2021
Zhe Wang, Jake Grigsby, Arshdeep Sekhon, Yanjun Qi

Optimization-based meta-learning typically assumes tasks are sampled from a single distribution, an assumption that oversimplifies and limits the diversity of tasks that meta-learning can model. Handling tasks from multiple different distributions is challenging for meta-learning because of the so-called task ambiguity issue. This paper proposes a novel method, ST-MAML, that empowers model-agnostic meta-learning (MAML) to learn from multiple task distributions. ST-MAML encodes tasks using a stochastic neural network module that summarizes every task with a stochastic representation. The proposed Stochastic Task (ST) strategy allows the meta-model to be tailored to the current task and enables us to learn a distribution of solutions for an ambiguous task. ST-MAML also propagates the task representation to revise the encoding of the input variables. Empirically, we demonstrate that ST-MAML matches or outperforms the state of the art on two few-shot image classification tasks, one curve regression benchmark, one image completion problem, and a real-world temperature prediction application. To the best of the authors' knowledge, this is the first time an optimization-based meta-learning method has been applied to a large-scale real-world task.
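
The PyTorch snippet below sketches the stochastic task-encoding idea: a task's support set is summarized by a Gaussian latent variable whose samples condition the learner; the shapes, pooling, and architecture are illustrative assumptions, not the paper's exact model.

    import torch
    import torch.nn as nn

    feat_dim, latent_dim = 16, 8
    encoder = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 2 * latent_dim))

    support_x = torch.randn(5, feat_dim)          # a few labelled examples from one task
    stats = encoder(support_x).mean(dim=0)        # pool per-example encodings into task statistics
    mu, log_std = stats[:latent_dim], stats[latent_dim:]
    z = mu + log_std.exp() * torch.randn(latent_dim)   # reparameterized sample: one "solution" for an ambiguous task

    # z can then augment the inputs (and tailor the parameters) of the inner-loop learner.
    task_input = torch.cat([support_x, z.expand(support_x.size(0), -1)], dim=-1)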

Long-Range Transformers for Dynamic Spatiotemporal Forecasting

Sep 24, 2021
Jake Grigsby, Zhe Wang, Yanjun Qi

Multivariate Time Series Forecasting (TSF) focuses on predicting future values from historical context. In these problems, dependent variables provide additional information or early warning signs of changes in future behavior. State-of-the-art forecasting models rely on neural attention between timesteps, which allows for temporal learning but fails to consider distinct spatial relationships between variables. This paper addresses the problem by translating multivariate TSF into a novel spatiotemporal sequence formulation in which each input token represents the value of a single variable at a given timestep. Long-Range Transformers can then learn interactions between space, time, and value information jointly along this extended sequence. Our method, which we call Spacetimeformer, scales to high-dimensional forecasting problems that have been dominated by Graph Neural Networks relying on predefined variable graphs. We achieve competitive results on benchmarks ranging from traffic forecasting to electricity demand and weather prediction while learning spatial and temporal relationships purely from data.
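
To illustrate the sequence formulation, the sketch below flattens a (timesteps x variables) history into one long token sequence where each token combines a value embedding with learned time and variable embeddings; the dimensions and embedding choices are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    T, N, d_model = 12, 4, 32                     # timesteps, variables, embedding width
    series = torch.randn(T, N)                    # multivariate history

    value_proj = nn.Linear(1, d_model)
    time_emb = nn.Embedding(T, d_model)
    var_emb = nn.Embedding(N, d_model)

    t_idx = torch.arange(T).repeat_interleave(N)  # [0,0,0,0,1,1,1,1,...]
    v_idx = torch.arange(N).repeat(T)             # [0,1,2,3,0,1,2,3,...]
    tokens = value_proj(series.reshape(-1, 1)) + time_emb(t_idx) + var_emb(v_idx)
    # tokens: (T*N, d_model) sequence fed to a long-range Transformer encoder,
    # so attention can mix information across both time and variables.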

Towards Automatic Actor-Critic Solutions to Continuous Control

Jun 16, 2021
Jake Grigsby, Jin Yong Yoo, Yanjun Qi

Model-free off-policy actor-critic methods are an efficient solution to complex continuous control tasks. However, these algorithms rely on a number of design tricks and many hyperparameters, making their application to new domains difficult and computationally expensive. This paper presents an evolutionary approach that automatically tunes these design decisions and eliminates the RL-specific hyperparameters from the Soft Actor-Critic algorithm. Our design is sample efficient and provides practical advantages over baseline approaches, including improved exploration, generalization over multiple control frequencies, and a robust ensemble of high-performance policies. Empirically, we show that our agent outperforms well-tuned hyperparameter settings on popular benchmarks from the DeepMind Control Suite. We then apply it to new control tasks to find high-performance solutions with minimal compute and research effort.
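
A toy sketch of the evolutionary tuning loop is given below: a small population of hyperparameter settings is scored, the best half is kept, and mutated copies refill the population; the search space, mutation rule, and evaluation stub are placeholders rather than the paper's procedure.

    import random

    def evaluate(hp):                # stand-in for training an agent and returning its score
        return -abs(hp["lr"] - 3e-4) - abs(hp["tau"] - 0.005)

    population = [{"lr": 10 ** random.uniform(-5, -2), "tau": random.uniform(1e-3, 1e-1)}
                  for _ in range(8)]

    for generation in range(5):
        scored = sorted(population, key=evaluate, reverse=True)
        elites = scored[:4]                                  # keep the best half
        population = elites + [{k: v * random.uniform(0.8, 1.25)
                                for k, v in random.choice(elites).items()}
                               for _ in range(4)]            # mutate elites to refill

    best = max(population, key=evaluate)                     # ensemble members come from the final elites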

* 10 pages, 4 figures 

Measuring Visual Generalization in Continuous Control from Pixels

Oct 13, 2020
Jake Grigsby, Yanjun Qi

Self-supervised learning and data augmentation have significantly reduced the performance gap between state-based and image-based reinforcement learning agents in continuous control tasks. However, it is still unclear whether current techniques can handle the variety of visual conditions required by real-world environments. We propose a challenging benchmark that tests agents' visual generalization by adding graphical variety to existing continuous control domains. Our empirical analysis shows that current methods struggle to generalize across a diverse set of visual changes, and we examine the specific factors of variation that make these tasks difficult. We find that data augmentation techniques outperform self-supervised learning approaches and that stronger image transformations provide better visual generalization. The benchmark and our augmented actor-critic implementation are open-sourced at https://github.com/jakegrigsby/dmc_remastered.
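
As a concrete example of the data augmentation family discussed above, the sketch below applies a random pad-and-shift crop to batched pixel observations; the pad size and observation shapes are illustrative defaults, not the benchmark's exact settings.

    import torch
    import torch.nn.functional as F

    def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
        """obs: (batch, channels, height, width) pixel observations."""
        b, c, h, w = obs.shape
        padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
        out = torch.empty_like(obs)
        for i in range(b):                                   # independent crop per image
            top = torch.randint(0, 2 * pad + 1, (1,)).item()
            left = torch.randint(0, 2 * pad + 1, (1,)).item()
            out[i] = padded[i, :, top:top + h, left:left + w]
        return out

    augmented = random_shift(torch.rand(8, 3, 84, 84))       # augmented batch for the actor-critic update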

* A total of 17 pages, 8 pages as the main text 