Stefanie Jegelka

Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

Limits, approximation and size transferability for GNNs on sparse graphs via graphops

Jun 07, 2023
Thien Le, Stefanie Jegelka

Can graph neural networks generalize to graphs that are different from the graphs they were trained on, e.g., in size? In this work, we study this question from a theoretical perspective. While recent work established such transferability and approximation results via graph limits, e.g., via graphons, these only apply non-trivially to dense graphs. To include frequently encountered sparse graphs such as bounded-degree or power law graphs, we instead take limits of operators derived from graphs, such as the aggregation operation that makes up GNNs. This leads to the recently introduced limit notion of graphops (Backhausz and Szegedy, 2022). We demonstrate how the operator perspective allows us to develop quantitative bounds on the distance between a finite GNN and its limit on an infinite graph, as well as the distance between the GNN on graphs of different sizes that share structural properties, under a regularity assumption verified for various graph sequences. Our results hold for dense and sparse graphs, and various notions of graph limits.

* NeurIPS 2023 submission, 34 pages
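
The abstract treats a graph through the operators it induces on node signals, with GNN aggregation as the main example. As a rough illustration only, the sketch below applies mean aggregation to a signal on cycle graphs, a simple bounded-degree family. The function names and the choice of cycle graphs are assumptions made for this example; nothing here reproduces the graphop limit theory or the paper's bounds.

```python
def cycle_graph(n):
    """Adjacency lists of a cycle on n nodes (every node has degree 2)."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]


def mean_aggregate(adjacency, signal):
    """Apply the mean-aggregation operator: each node averages its neighbors' values."""
    output = []
    for neighbors in adjacency:
        if neighbors:
            output.append(sum(signal[u] for u in neighbors) / len(neighbors))
        else:
            # Isolated nodes have nothing to average; 0.0 is an arbitrary convention here.
            output.append(0.0)
    return output


# The same operator definition applies to graphs of any size.
small = mean_aggregate(cycle_graph(4), [1.0, 2.0, 3.0, 4.0])
large = mean_aggregate(cycle_graph(50), [1.0] * 50)
```

Because the operator is defined per node from its neighborhood, the same code runs unchanged on the 4-node and 50-node cycles, which is the kind of size-independent object the operator viewpoint works with.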

The Exact Sample Complexity Gain from Invariances for Kernel Regression on Manifolds

Mar 24, 2023
Behrooz Tahmasebi, Stefanie Jegelka

In practice, encoding invariances into models helps sample complexity. In this work, we tighten and generalize theoretical results on how invariances improve sample complexity. In particular, we provide minimax optimal rates for kernel ridge regression on any manifold, with a target function that is invariant to an arbitrary group action on the manifold. Our results hold for (almost) any group action, even groups of positive dimension. For a finite group, the gain increases the "effective" number of samples by the group size. For groups of positive dimension, the gain is observed by a reduction in the manifold's dimension, in addition to a factor proportional to the volume of the quotient space. Our proof takes the viewpoint of differential geometry, in contrast to the more common strategy of using invariant polynomials. Hence, this new geometric viewpoint on learning with invariances may be of independent interest.

Tetris-inspired detector with neural network for radiation mapping

Feb 07, 2023
Ryotaro Okabe, Shangjie Xue, Jiankai Yu, Tongtong Liu, Benoit Forget, Stefanie Jegelka, Gordon Kohse, Lin-wen Hu, Mingda Li

In recent years, radiation mapping has attracted widespread research attention and heightened public concern about environmental monitoring. Radiation detectors have been developed, in terms of both materials and configurations, to locate the directions and positions of radiation sources. In this process, the algorithm that converts detector signals into radiation source information is essential. However, due to the complex mechanisms of radiation-matter interaction and current limitations in data collection, high-performance, low-cost radiation mapping remains challenging. Here we present a computational framework that uses Tetris-inspired detector pixels and machine learning for radiation mapping. Using inter-pixel padding to increase the contrast between pixels and a neural network to analyze the detector readings, a detector with as few as four pixels can achieve high-resolution directional mapping. By further applying maximum a posteriori (MAP) estimation with a moving detector, the position of the radiation source can also be localized. Non-square, Tetris-shaped detectors can improve performance beyond that of conventional grid-shaped detectors. Our framework offers a new avenue for high-quality radiation mapping with the fewest detector pixels possible, and we anticipate that it can be deployed for real-world radiation detection after moderate validation.

* 29 pages, 20 figures. Ryotaro Okabe and Shangjie Xue contributed equally to this work

Debiasing Vision-Language Models via Biased Prompts

Jan 31, 2023
Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, Stefanie Jegelka

Machine learning models have been shown to inherit biases from their training datasets, which can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be amplified and propagated to downstream applications like zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
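
The core operation named in the abstract, projecting a direction out of a text embedding, can be sketched with a plain orthogonal projection. This is a minimal illustration under stated assumptions: it removes a single direction and does not implement the calibrated projection matrix the paper describes. The names `project_out` and `bias_direction` and the toy vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(embedding, bias_direction):
    """Return the embedding with its component along bias_direction removed."""
    norm_sq = dot(bias_direction, bias_direction)
    if norm_sq == 0.0:
        raise ValueError("bias_direction must be non-zero")
    scale = dot(embedding, bias_direction) / norm_sq
    return [e - scale * b for e, b in zip(embedding, bias_direction)]


# Toy vectors (assumed values, not taken from any real model).
text_embedding = [3.0, 4.0, 1.0]
bias_direction = [0.0, 2.0, 0.0]
debiased = project_out(text_embedding, bias_direction)
```

After the projection, the result is orthogonal to `bias_direction`, so any score computed as an inner product with that direction becomes zero.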

Efficiently predicting high resolution mass spectra with graph neural networks

Jan 26, 2023
Michael Murphy, Stefanie Jegelka, Ernest Fraenkel, Tobias Kind, David Healey, Thomas Butler

Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas. We discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GrAFF-MS - achieving significantly lower prediction error and orders-of-magnitude faster runtime than state-of-the-art methods.

Optimal algorithms for group distributionally robust optimization and beyond

Dec 28, 2022
Tasuku Soma, Khashayar Gatmiry, Stefanie Jegelka

Distributionally robust optimization (DRO) can improve the robustness and fairness of learning methods. In this paper, we devise stochastic algorithms for a class of DRO problems including group DRO, subpopulation fairness, and empirical conditional value at risk (CVaR) optimization. Our new algorithms achieve faster convergence rates than existing algorithms for multiple DRO settings. We also provide a new information-theoretic lower bound that implies our bounds are tight for group DRO. Empirically, too, our algorithms outperform known methods.
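
Two of the objectives named in the abstract can be written down directly from their standard definitions: group DRO evaluates the worst average loss over groups, and empirical CVaR averages the largest losses. The sketch below only evaluates these objectives on fixed losses; it is not the paper's stochastic algorithms. The function names are hypothetical, and the CVaR version assumes the simplification of rounding the tail size up to a whole number of samples.

```python
import math


def worst_group_loss(losses, groups):
    """Group DRO objective: the largest per-group mean loss."""
    totals, counts = {}, {}
    for loss, group in zip(losses, groups):
        totals[group] = totals.get(group, 0.0) + loss
        counts[group] = counts.get(group, 0) + 1
    return max(totals[g] / counts[g] for g in totals)


def empirical_cvar(losses, alpha):
    """Mean of the ceil(alpha * n) largest losses (simplified empirical CVaR)."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    k = math.ceil(alpha * len(losses))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Toy per-sample losses with two groups (assumed values).
losses = [1.0, 2.0, 3.0, 6.0]
groups = ["a", "a", "b", "b"]
group_objective = worst_group_loss(losses, groups)
cvar_half = empirical_cvar(losses, 0.5)
```

With `alpha = 1.0` the CVaR reduces to the ordinary mean loss, and as `alpha` shrinks it approaches the single worst loss.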

InfoOT: Information Maximizing Optimal Transport

Oct 06, 2022
Ching-Yao Chuang, Stefanie Jegelka, David Alvarez-Melis

Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.
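
The starting point described in the first sentence, aligning samples by minimizing geometric transport cost, can be illustrated on a toy problem. For two equally sized sample sets with uniform weights, an optimal plan can be taken to be a one-to-one matching, so brute force over permutations suffices for a handful of points. This sketch covers plain optimal transport only; it does not include InfoOT's mutual-information term or its projected gradient solver. All names and data are assumptions for the example.

```python
from itertools import permutations


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def brute_force_alignment(source, target):
    """Return (matching, total_cost) minimizing the summed squared distances.

    matching[i] is the index of the target point assigned to source[i].
    Runs in factorial time, so it is only suitable for tiny examples.
    """
    n = len(source)
    if n != len(target):
        raise ValueError("source and target must have the same size")
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        cost = sum(squared_distance(source[i], target[perm[i]]) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost


source = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
target = [(5.0, 4.0), (0.0, 1.0), (1.0, 1.0)]
matching, total_cost = brute_force_alignment(source, target)
```

The matching depends only on pairwise distances, which is the limitation the abstract points to: cluster structure in the data plays no role in this objective.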

Tree Mover's Distance: Bridging Graph Metrics and Stability of Graph Neural Networks

Oct 04, 2022
Ching-Yao Chuang, Stefanie Jegelka

Understanding generalization and robustness of machine learning models fundamentally relies on assuming an appropriate metric on the data space. Identifying such a metric is particularly challenging for non-Euclidean data such as graphs. Here, we propose a pseudometric for attributed graphs, the Tree Mover's Distance (TMD), and study its relation to generalization. Via a hierarchical optimal transport problem, TMD reflects the local distribution of node attributes as well as the distribution of local computation trees, which are known to be decisive for the learning behavior of graph neural networks (GNNs). First, we show that TMD captures properties relevant to graph classification: a simple TMD-SVM performs competitively with standard GNNs. Second, we relate TMD to generalization of GNNs under distribution shifts, and show that it correlates well with performance drop under such shifts.

* NeurIPS 2022

On the generalization of learning algorithms that do not converge

Aug 19, 2022
Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka

Generalization analyses of deep learning typically assume that the training converges to a fixed point. However, recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is to propose a notion of statistical algorithmic stability (SAS) that extends classical algorithmic stability to non-convergent algorithms and to study its connection to generalization. This ergodic-theoretic approach leads to new insights when compared to the traditional optimization and learning theory perspectives. We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization and empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better" even when the training continues indefinitely and the weights do not converge.

* 27 pages, under review

Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions

Aug 08, 2022
Nikolaos Karalias, Joshua Robinson, Andreas Loukas, Stefanie Jegelka

Integrating functions on discrete domains into neural networks is key to developing their capability to reason about discrete objects. However, discrete domains are (1) not naturally amenable to gradient-based optimization, and (2) incompatible with deep learning architectures that rely on representations in high-dimensional vector spaces. In this work, we address both difficulties for set functions, which capture many important discrete problems. First, we develop a framework for extending set functions onto low-dimensional continuous domains, where many extensions are naturally defined. Our framework subsumes many well-known extensions as special cases. Second, to avoid undesirable low-dimensional neural network bottlenecks, we convert low-dimensional extensions into representations in high-dimensional spaces, taking inspiration from the success of semidefinite programs for combinatorial optimization. Empirically, we observe benefits of our extensions for unsupervised neural combinatorial optimization, in particular with high-dimensional representations.
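
To make "extending a set function onto a continuous domain" concrete, the sketch below implements the Lovász extension, one classic extension from subsets of {0, ..., n-1} to real vectors. The abstract does not name which extensions the framework covers, so treating the Lovász extension as representative is an assumption of this example, as are the function names and the toy cut function. The high-dimensional extensions from the paper are not shown.

```python
def lovasz_extension(set_fn, x):
    """Evaluate the Lovász extension of set_fn at the point x.

    Coordinates are visited in decreasing order of x; each one contributes
    x[i] times the marginal gain of adding element i to the growing prefix set.
    Assumes set_fn(frozenset()) == 0.
    """
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    value = 0.0
    prefix = set()
    previous = set_fn(frozenset())
    for i in order:
        prefix.add(i)
        current = set_fn(frozenset(prefix))
        value += x[i] * (current - previous)
        previous = current
    return value


def cut_size(subset):
    """Number of edges of the path graph 0-1-2 crossing between subset and its complement."""
    edges = [(0, 1), (1, 2)]
    return sum(1 for u, v in edges if (u in subset) != (v in subset))


on_vertex = lovasz_extension(cut_size, [1.0, 0.0, 0.0])
interior = lovasz_extension(cut_size, [0.5, 0.2, 0.9])
```

On 0/1 indicator vectors the extension agrees with the original set function, and for a cut function it equals the sum of |x[u] - x[v]| over edges, which gives a continuous quantity that gradient-based methods can work with.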
