Cross-lingual or cross-domain correspondences play key roles in tasks ranging from machine translation to transfer learning. Recently, purely unsupervised methods operating on monolingual embeddings have become effective alignment tools. Current state-of-the-art methods, however, involve multiple steps, including heuristic post-hoc refinement strategies. In this paper, we cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms. Specifically, we exploit the Gromov-Wasserstein distance, which measures how similarities between pairs of words relate across languages. We show that our OT objective can be estimated efficiently, requires little or no tuning, and achieves performance comparable to the state of the art on various unsupervised word translation tasks.
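As a rough illustration of the objective, the sketch below aligns two monolingual embedding matrices with the Gromov-Wasserstein coupling provided by the POT library. The variable names (xs, xt), the uniform word weights, and the cosine-distance similarity matrices are assumptions made for the sketch; the paper's normalization, regularization, and scaling details differ.

```python
# A minimal sketch of unsupervised word translation via Gromov-Wasserstein,
# using the POT library (pip install pot). Cosine-distance similarity matrices
# and uniform marginals are simplifying assumptions.
import numpy as np
import ot

def gw_translate(xs, xt):
    """xs: (n, d) source embeddings, xt: (m, d) target embeddings."""
    xs = xs / np.linalg.norm(xs, axis=1, keepdims=True)
    xt = xt / np.linalg.norm(xt, axis=1, keepdims=True)
    Cs = 1.0 - xs @ xs.T                 # intra-language pairwise dissimilarities
    Ct = 1.0 - xt @ xt.T
    p = np.full(len(xs), 1.0 / len(xs))  # uniform word weights
    q = np.full(len(xt), 1.0 / len(xt))
    # Coupling that best matches the two similarity structures.
    G = ot.gromov.gromov_wasserstein(Cs, Ct, p, q, 'square_loss')
    return G.argmax(axis=1)              # most likely translation index per source word
```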
Interpretability has arisen as a key desideratum of machine learning models alongside performance. Approaches so far have been primarily concerned with fixed-dimensional inputs, emphasizing feature relevance or selection. In contrast, we focus on temporal modeling and the problem of tailoring the predictor, functionally, towards an interpretable family. To this end, we propose a co-operative game between the predictor and an explainer, without any a priori restrictions on the functional class of the predictor. The goal of the explainer is to highlight, locally, how well the predictor conforms to the chosen interpretable family of temporal models. Our co-operative game is set up asymmetrically in terms of information sets for efficiency reasons. We develop the framework in the context of temporal sequence models and illustrate it with examples.
Many problems in machine learning involve calculating correspondences between sets of objects, such as point clouds or images. Discrete optimal transport (OT) provides a natural and successful approach to such tasks whenever the two sets of objects can be represented in the same space, or when we can evaluate distances between the objects. Unfortunately, neither requirement is likely to hold when object representations are learned from data. Indeed, automatically derived representations such as word embeddings are typically fixed only up to some global transformations, for example, reflection or rotation. As a result, pairwise distances across the two sets of objects are ill-defined without specifying their relative transformation. In this work, we propose a general framework for optimal transport in the presence of latent global transformations. We discuss algorithms for the specific case of orthonormal transformations, and show promising results in unsupervised word alignment.
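A minimal alternating-minimization sketch of the orthonormal case: fix the transformation and solve a standard discrete OT problem, then fix the coupling and solve an orthogonal Procrustes problem in closed form. The initialization, the iteration count, and the use of POT's exact solver are assumptions of the sketch rather than the paper's exact algorithm.

```python
# A sketch of OT with a latent orthogonal transformation: alternate between an
# OT step (POT's exact solver) and a closed-form Procrustes step.
import numpy as np
import ot

def invariant_ot(X, Y, n_iter=20):
    """X: (n, d), Y: (m, d); returns coupling P and orthogonal Q aligning X to Y."""
    n, m = len(X), len(Y)
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    Q = np.eye(X.shape[1])                       # start from the identity
    for _ in range(n_iter):
        C = ot.dist(X @ Q, Y)                    # squared Euclidean cost under current Q
        P = ot.emd(p, q, C)                      # OT step: optimal coupling
        U, _, Vt = np.linalg.svd(X.T @ P @ Y)    # Procrustes step: closed-form rotation
        Q = U @ Vt
    return P, Q
```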
We argue that robustness of explanations---i.e., that similar inputs should give rise to similar explanations---is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
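One such metric can be phrased as an empirical local Lipschitz estimate of the explanation map: how much an explanation can change relative to a small change in the input. The sketch below gives an illustrative sampling-based estimate; the `explain` callable, the L2 ball, and the sample budget are assumptions of the sketch.

```python
# A minimal sketch of a robustness metric for explanations: an empirical local
# Lipschitz estimate of an explanation map around a point x.
import numpy as np

def local_lipschitz(explain, x, radius=0.1, n_samples=100, rng=None):
    """explain: callable mapping an input array to an attribution array."""
    rng = np.random.default_rng(rng)
    e_x = explain(x)
    worst = 0.0
    for _ in range(n_samples):
        # Random perturbation within an L2 ball of the given radius.
        delta = rng.normal(size=x.shape)
        delta *= radius * rng.uniform() / np.linalg.norm(delta)
        x_p = x + delta
        ratio = np.linalg.norm(explain(x_p) - e_x) / np.linalg.norm(x_p - x)
        worst = max(worst, ratio)
    return worst   # large values indicate non-robust explanations
```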
Most recent work on interpretability of complex machine learning models has focused on estimating $\textit{a posteriori}$ explanations for previously trained models around specific predictions. $\textit{Self-explaining}$ models, in which interpretability plays a key role already during learning, have received much less attention. We propose three desiderata for explanations in general -- explicitness, faithfulness, and stability -- and show that existing methods do not satisfy them. In response, we design self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models. Faithfulness and stability are enforced via regularization specifically tailored to such models. Experimental results across various benchmark datasets show that our framework offers a promising direction for reconciling model complexity and interpretability.
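The first stage of such a progression can be sketched as a model whose coefficients are themselves produced by a network, together with a gradient penalty tying those coefficients to the model's actual local behaviour. The PyTorch sketch below, including the architecture sizes and the scalar-output simplification, is an illustrative assumption rather than the paper's exact construction.

```python
# A minimal PyTorch sketch: f(x) = <theta(x), x>, a "linear" model whose
# coefficients are input-dependent, plus a penalty keeping theta(x) close to
# the true local gradient of f (a stability/faithfulness regularizer).
import torch
import torch.nn as nn

class SelfExplaining(nn.Module):
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.theta = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_in))

    def forward(self, x):
        coef = self.theta(x)                  # input-dependent coefficients
        return (coef * x).sum(dim=1), coef    # f(x) = <theta(x), x>

def stability_penalty(model, x):
    # Penalize the gap between theta(x) and the actual local gradient of f,
    # so the coefficients faithfully describe the model's local behaviour.
    x = x.clone().requires_grad_(True)
    out, coef = model(x)
    grad = torch.autograd.grad(out.sum(), x, create_graph=True)[0]
    return ((grad - coef) ** 2).sum(dim=1).mean()
```

In training, the penalty would be added to the task loss with a weight that controls the trade-off between predictive accuracy and interpretability.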
Optimal transport has recently gained interest in machine learning for applications ranging from domain adaptation and sentence similarity to deep learning. Yet, its ability to capture frequently occurring structure beyond the "ground metric" is limited. In this work, we develop a nonlinear generalization of (discrete) optimal transport that is able to reflect much additional structure. We demonstrate how to leverage the geometry of this new model for fast algorithms, and we explore its connections and properties. Illustrative experiments highlight the benefit of the induced structured couplings for tasks in domain adaptation and natural language processing.
We present a method to interpret the predictions of any black-box structured-input, structured-output model around a specific input-output pair. Our method returns an "explanation" consisting of groups of input-output tokens that are causally related. These dependencies are inferred by querying the black-box model with perturbed inputs, generating a graph over tokens from the responses, and solving a partitioning problem to select the most relevant components. We instantiate the general approach for sequence-to-sequence problems, adopting a variational autoencoder to yield meaningful input perturbations. We test our method across several NLP sequence generation tasks.
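As a toy illustration of the querying step, the sketch below perturbs the input by deleting one token at a time, re-queries a black-box `predict` function, and records which output positions change. The deletion-based perturbations and the exact-match comparison are stand-ins for the paper's VAE-based perturbations and partitioning objective.

```python
# A toy sketch of perturb-and-query dependency estimation for a black-box
# sequence-to-sequence model. predict: list of input tokens -> list of output tokens.
def dependency_edges(predict, tokens):
    base_out = predict(tokens)
    edges = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]      # drop input token i
        new_out = predict(perturbed)
        for j, tok in enumerate(base_out):
            if j >= len(new_out) or new_out[j] != tok:
                edges.append((i, j))                 # output j reacts to input i
    return edges                                     # bipartite input-output dependency edges
```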
Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives.
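For a finite graph and a simple random walk, the LTHT of a target vertex can be computed exactly from a linear system, since $u_i = \mathbb{E}_i[e^{-\lambda T_j}]$ satisfies $u_i = e^{-\lambda}\sum_k P_{ik} u_k$ for $i \neq j$ with $u_j = 1$. The adjacency-matrix input and the value of $\lambda$ in the sketch below are illustrative assumptions.

```python
# A minimal numpy sketch of the Laplace transformed hitting time (LTHT) of a
# target vertex for a simple random walk, computed exactly via a linear system.
import numpy as np

def ltht(A, target, lam=0.1):
    """A: (n, n) adjacency matrix of a connected graph, target: vertex index."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)          # transition matrix of the walk
    mask = np.arange(n) != target
    # For i != target: u_i = exp(-lam) * sum_k P[i, k] * u_k, with u_target = 1.
    M = np.eye(n - 1) - np.exp(-lam) * P[np.ix_(mask, mask)]
    b = np.exp(-lam) * P[mask, target]
    u = np.full(n, 1.0)
    u[mask] = np.linalg.solve(M, b)
    return u   # larger values mean "closer" to the target in the LTHT sense
```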
Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in the cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric recovery as the key object of study, unify existing algorithms as consistent metric recovery methods based on co-occurrence counts from simple Markov random walks, and propose a new recovery algorithm. Third, we generalize metric recovery to graphs and manifolds, relating co-occurrence counts on random walks in graphs and random processes on manifolds to the underlying metric to be recovered, thereby reconciling manifold estimation and embedding algorithms. We compare embedding algorithms across a range of tasks, from nonlinear dimensionality reduction to three semantic language tasks: analogies, sequence completion, and classification.
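As a rough illustration of the metric-recovery view, the sketch below converts co-occurrence counts (for instance, windowed counts along a random walk) into squared-distance estimates via a negative-log transform and embeds them with classical MDS. The normalization and the transform are simplified assumptions standing in for the paper's recovery algorithm.

```python
# An illustrative sketch of metric recovery from co-occurrence counts:
# turn counts into distance estimates, then embed with classical MDS.
import numpy as np

def recover_embedding(cooc, dim=2, eps=1e-8):
    """cooc: (n, n) co-occurrence counts; returns an (n, dim) embedding."""
    P = cooc / (cooc.sum(axis=1, keepdims=True) + eps)   # conditional co-occurrence
    D2 = -np.log(P + eps)                                # squared-distance estimates
    D2 = (D2 + D2.T) / 2                                 # symmetrize
    np.fill_diagonal(D2, 0.0)
    # Classical MDS: double-center the squared distances, take top eigenvectors.
    n = len(D2)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))
```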
We analyze directed, unweighted graphs obtained from $x_i\in \mathbb{R}^d$ by connecting vertex $i$ to $j$ iff $|x_i - x_j| < \epsilon(x_i)$. Examples of such graphs include $k$-nearest neighbor graphs, where $\epsilon(x_i)$ varies from point to point, and, arguably, many real-world graphs such as co-purchasing graphs. We ask whether we can recover the underlying Euclidean metric $\epsilon(x_i)$ and the associated density $p(x_i)$ given only the directed graph and $d$. We show that consistent recovery is possible up to isometric scaling when the vertex degree is at least $\omega(n^{2/(2+d)}\log(n)^{d/(d+2)})$. Our estimator is based on a careful characterization of a random walk over the directed graph and the associated continuum limit. As an algorithm, it resembles the PageRank centrality metric. We demonstrate empirically that the estimator performs well on simulated examples as well as on real-world co-purchasing graphs even with a small number of points and degree scaling as low as $\log(n)$.
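For illustration, the sketch below assembles two ingredients described above: the directed graph built from points with per-point radii, and the stationary distribution of its random walk obtained by power iteration (the PageRank-like quantity the estimator resembles). Converting this distribution into estimates of $\epsilon(x_i)$ and $p(x_i)$ relies on the continuum-limit analysis in the paper and is omitted here.

```python
# A minimal sketch: build the directed radius graph and compute the stationary
# distribution of its random walk by power iteration. Assumes every point has
# at least one out-neighbor and that the walk converges.
import numpy as np

def directed_graph(X, eps):
    """X: (n, d) points, eps: (n,) per-point radii; edge i -> j iff |x_i - x_j| < eps_i."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = (D < eps[:, None]).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def stationary_distribution(A, n_iter=500):
    P = A / A.sum(axis=1, keepdims=True)        # out-degree-normalized random walk
    pi = np.full(len(A), 1.0 / len(A))
    for _ in range(n_iter):
        pi = pi @ P                              # power iteration
    return pi / pi.sum()
```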