Greg Mori

RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression

May 30, 2022
Yu Gong, Greg Mori, Frederick Tung

Data imbalance, in which a plurality of the data samples come from a small proportion of labels, poses a challenge in training deep neural networks. Unlike classification, in regression the labels are continuous, potentially boundless, and form a natural ordering. These distinct features of regression call for new techniques that leverage the additional information encoded in label-space relationships. This paper presents the RankSim (ranking similarity) regularizer for deep imbalanced regression, which encodes an inductive bias that samples that are closer in label space should also be closer in feature space. In contrast to recent distribution smoothing based approaches, RankSim captures both nearby and distant relationships: for a given data sample, RankSim encourages the sorted list of its neighbors in label space to match the sorted list of its neighbors in feature space. RankSim is complementary to conventional imbalanced learning techniques, including re-weighting, two-stage training, and distribution smoothing, and lifts the state-of-the-art performance on three imbalanced regression benchmarks: IMDB-WIKI-DIR, AgeDB-DIR, and STS-B-DIR.
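
A minimal PyTorch-style sketch of the idea (not the paper's exact formulation): per-anchor feature similarities and label similarities are converted to differentiable soft ranks, and their mismatch is penalized. The soft-rank surrogate, temperature `tau`, and weight `gamma` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_rank(scores, tau=1.0):
    # Differentiable surrogate for per-row descending ranks:
    # rank[i, j] ~ number of entries in row i that exceed scores[i, j].
    pairwise = (scores.unsqueeze(-2) - scores.unsqueeze(-1)) / tau  # [i, j, k] = s[i, k] - s[i, j]
    return torch.sigmoid(pairwise).sum(dim=-1)

def ranksim_penalty(features, labels, tau=1.0):
    # features: (B, D) embeddings; labels: (B,) continuous regression targets.
    z = F.normalize(features, dim=-1)
    feat_sim = z @ z.t()                                    # feature-space similarity per anchor
    label_sim = -torch.cdist(labels.view(-1, 1).float(),
                             labels.view(-1, 1).float())    # closer labels -> higher similarity
    # Encourage each anchor's feature-space neighbor ranking to match its label-space ranking.
    return F.mse_loss(soft_rank(feat_sim, tau), soft_rank(label_sim, tau))

# Usage: total_loss = regression_loss + gamma * ranksim_penalty(features, targets)
```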

* Accepted to ICML 2022 

Monotonicity Regularization: Improved Penalties and Novel Applications to Disentangled Representation Learning and Robust Classification

May 17, 2022
Joao Monteiro, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Greg Mori

We study settings where gradient penalties are used alongside risk minimization with the goal of obtaining predictors satisfying different notions of monotonicity. Specifically, we present two sets of contributions. In the first part of the paper, we show that different choices of penalties define the regions of the input space where the property is observed. As such, previous methods result in models that are monotonic only in a small volume of the input space. We thus propose an approach that uses mixtures of training instances and random points to populate the space and enforce the penalty in a much larger region. As a second set of contributions, we introduce regularization strategies that enforce other notions of monotonicity in different settings. In this case, we consider applications, such as image classification and generative modeling, where monotonicity is not a hard constraint but can help improve some aspects of the model. Namely, we show that inducing monotonicity can be beneficial in applications such as: (1) allowing for controllable data generation, (2) defining strategies to detect anomalous data, and (3) generating explanations for predictions. Our proposed approaches introduce negligible computational overhead and lead to efficient procedures that provide extra benefits over baseline models.
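
A rough sketch of a gradient penalty of this kind, assuming a scalar-output model and a list of input dimensions that should be monotonically non-decreasing; points are sampled as mixtures of training instances and random points, as described above, but the exact mixing scheme and penalty form here are illustrative.

```python
import torch

def monotonicity_penalty(model, x_batch, monotone_dims, n_points=64):
    # Populate the input space with convex mixtures of training instances and random points.
    lo, hi = x_batch.min(0).values, x_batch.max(0).values
    rand = lo + torch.rand(n_points, x_batch.size(1)) * (hi - lo)
    idx = torch.randint(0, x_batch.size(0), (n_points,))
    alpha = torch.rand(n_points, 1)
    pts = (alpha * x_batch[idx] + (1 - alpha) * rand).detach().requires_grad_(True)

    # Penalize negative partial derivatives along the monotone input dimensions.
    out = model(pts).sum()
    grads = torch.autograd.grad(out, pts, create_graph=True)[0]
    return torch.relu(-grads[:, monotone_dims]).pow(2).mean()

# Usage: loss = task_loss + lam * monotonicity_penalty(model, x, monotone_dims=[0, 3])
```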

* Accepted to UAI 2022 

Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

Feb 01, 2022
Steeven Janny, Fabien Baradel, Natalia Neverova, Madiha Nadri, Greg Mori, Christian Wolf

Learning causal relationships in high-dimensional data (images, videos) is a hard task, as they are often defined on low dimensional manifolds and must be extracted from complex signals dominated by appearance, lighting, textures and also spurious correlations in the data. We present a method for learning counterfactual reasoning of physical processes in pixel space, which requires the prediction of the impact of interventions on initial conditions. Going beyond the identification of structural relationships, we deal with the challenging problem of forecasting raw video over long horizons. Our method does not require the knowledge or supervision of any ground truth positions or other object or scene properties. Our model learns and acts on a suitable hybrid latent representation based on a combination of dense features, sets of 2D keypoints and an additional latent vector per keypoint. We show that this better captures the dynamics of physical processes than purely dense or sparse representations. We introduce a new challenging and carefully designed counterfactual benchmark for predictions in pixel space and outperform strong baselines in physics-inspired ML and video prediction.
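
The hybrid latent state described above can be pictured as a simple container holding the three components; the field names and shapes below are assumptions for illustration, not the paper's interface.

```python
from dataclasses import dataclass
import torch

@dataclass
class HybridState:
    dense: torch.Tensor      # (C, H, W) dense feature map for appearance and context
    keypoints: torch.Tensor  # (K, 2) 2D keypoint coordinates capturing object positions
    latents: torch.Tensor    # (K, D) one latent vector per keypoint for extra per-object state

# A dynamics model would roll this state forward over a long horizon after an intervention,
# and a decoder would combine it with the dense features to render each predicted frame.
state = HybridState(dense=torch.zeros(64, 28, 28),
                    keypoints=torch.zeros(8, 2),
                    latents=torch.zeros(8, 16))
```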

MUSE: Feature Self-Distillation with Mutual Information and Self-Information

Oct 25, 2021
Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen

We present a novel information-theoretic approach to introduce dependency among features of a deep convolutional neural network (CNN). The core idea of our proposed method, called MUSE, is to combine MUtual information and SElf-information to jointly improve the expressivity of all features extracted from different layers in a CNN. We present two variants of the realization of MUSE -- Additive Information and Multiplicative Information. Importantly, we argue and empirically demonstrate that MUSE, compared to other feature discrepancy functions, is a more effective proxy for introducing dependency and improving the expressivity of all features in the knowledge distillation framework. MUSE achieves superior performance over a variety of popular architectures and feature discrepancy functions for self-distillation and online distillation, and performs competitively with the state-of-the-art methods for offline distillation. MUSE is also demonstrably versatile, and can be easily extended to CNN-based models on tasks other than image classification, such as object detection.

* The 32nd British Machine Vision Conference (BMVC 2021) 

D3D-HOI: Dynamic 3D Human-Object Interactions from Videos

Aug 19, 2021
Xiang Xu, Hanbyul Joo, Greg Mori, Manolis Savva

We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions. Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints. Each manipulated object (e.g., microwave oven) is represented with a matching 3D parametric model. This data allows us to evaluate the reconstruction quality of articulated objects and establish a benchmark for this challenging task. In particular, we leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics. We evaluate this approach on our dataset, demonstrating that human-object relations can significantly reduce the ambiguity of articulated object reconstructions from challenging real-world videos. Code and dataset are available at https://github.com/facebookresearch/d3d-hoi.

Continuous Latent Process Flows

Jun 29, 2021
Ruizhi Deng, Marcus A. Brubaker, Greg Mori, Andreas M. Lehrmann

Partial observations of continuous time-series dynamics at arbitrary time stamps exist in many disciplines. Fitting this type of data using statistical models with continuous dynamics is not only promising at an intuitive level but also has practical benefits, including the ability to generate continuous trajectories and to perform inference on previously unseen time stamps. Despite exciting progress in this area, the existing models still face challenges in terms of their representational power and the quality of their variational approximations. We tackle these challenges with continuous latent process flows (CLPF), a principled architecture decoding continuous latent processes into continuous observable processes using a time-dependent normalizing flow driven by a stochastic differential equation. To optimize our model using maximum likelihood, we propose a novel piecewise construction of a variational posterior process and derive the corresponding variational lower bound using trajectory re-weighting. Our ablation studies demonstrate the effectiveness of our contributions in various inference tasks on irregular time grids. Comparisons to state-of-the-art baselines show our model's favourable performance on both synthetic and real-world time-series data.
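
In symbols, one way to read the generative model described above (the notation is ours, not necessarily the paper's): a latent process Z follows a stochastic differential equation, and a time-dependent normalizing flow decodes it into the observed process.

```latex
% Latent dynamics: an SDE driven by Brownian motion W_t
dZ_t = \mu_\theta(Z_t, t)\,dt + \sigma_\theta(Z_t, t)\,dW_t
% Decoding: a time-dependent flow F_\theta maps a simple base variable O_t to the
% observation X_t, conditioned on the latent state Z_t, with tractable likelihood
X_t = F_\theta(O_t;\, Z_t, t), \qquad
\log p(X_t \mid Z_t, t) = \log p_{O}\!\big(F_\theta^{-1}(X_t; Z_t, t)\big)
  + \log\left|\det \frac{\partial F_\theta^{-1}(X_t; Z_t, t)}{\partial X_t}\right|
```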

TD-GEN: Graph Generation With Tree Decomposition

Jun 20, 2021
Hamed Shirzad, Hossein Hajimirsadeghi, Amir H. Abdi, Greg Mori

We propose TD-GEN, a graph generation framework based on tree decomposition, and introduce a reduced upper bound on the maximum number of decisions needed for graph generation. The framework includes a permutation invariant tree generation model which forms the backbone of graph generation. Tree nodes are supernodes, each representing a cluster of nodes in the graph. Graph nodes and edges are incrementally generated inside the clusters by traversing the tree supernodes, respecting the structure of the tree decomposition, and following node sharing decisions between the clusters. Finally, we discuss the shortcomings of standard evaluation criteria based on statistical properties of the generated graphs as performance measures. We propose to compare the performance of models based on likelihood. Empirical results on a variety of standard graph generation datasets demonstrate the superior performance of our method.
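
A very loose sketch of the generation loop described above; the learned decision models are replaced by placeholder random choices, and the data structures are assumptions, purely to make the supernode traversal and node-sharing steps concrete.

```python
import random

def generate_graph(tree_children, root=0, max_cluster_size=4):
    # tree_children: dict mapping each supernode to its child supernodes (the tree skeleton).
    nodes, edges, clusters = [], set(), {}

    def visit(supernode, parent):
        # Node sharing: decide which nodes of the parent cluster also belong to this cluster.
        shared = [v for v in clusters.get(parent, []) if random.random() < 0.5]
        cluster = list(shared)
        # Incrementally generate new graph nodes inside the cluster.
        while len(cluster) < max_cluster_size and random.random() < 0.7:
            v = len(nodes)
            nodes.append(v)
            cluster.append(v)
        # Generate edges only within the cluster, respecting the tree decomposition.
        for i, u in enumerate(cluster):
            for w in cluster[i + 1:]:
                if random.random() < 0.5:
                    edges.add((u, w))
        clusters[supernode] = cluster
        for child in tree_children.get(supernode, []):
            visit(child, supernode)

    visit(root, parent=None)
    return nodes, edges

# Example: a three-supernode tree rooted at supernode 0.
print(generate_graph({0: [1, 2]}))
```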

Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation

Apr 24, 2021
Mengyao Zhai, Lei Chen, Jiawei He, Megha Nawhal, Frederick Tung, Greg Mori

Humans accumulate knowledge in a lifelong fashion. Modern deep neural networks, on the other hand, are susceptible to catastrophic forgetting: when adapted to perform new tasks, they often fail to preserve their performance on previously learned tasks. Given a sequence of tasks, a naive approach addressing catastrophic forgetting is to train a separate standalone model for each task, which scales the total number of parameters drastically without efficiently utilizing previous models. In contrast, we propose a parameter-efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks. For the current task, our model achieves generation quality on par with a standalone model while using fewer parameters. For previous tasks, our model can also preserve generation quality since the filters for previous tasks are not altered. We validate Piggyback GAN on various image-conditioned generation tasks across different domains, and provide qualitative and quantitative results to show that the proposed approach can address catastrophic forgetting effectively and efficiently.
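
A minimal sketch of the filter-factorization idea in PyTorch: filters for a new task are built as learned linear combinations of the frozen filters from previous tasks, plus a small number of unconstrained new filters. Layer shapes and the split between reused and new filters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PiggybackConv(nn.Module):
    def __init__(self, prev_weight, n_new_filters, padding=1):
        super().__init__()
        # prev_weight: (out_prev, in, k, k) filters from previous-task models, kept frozen.
        self.register_buffer("prev_weight", prev_weight)
        out_prev = prev_weight.size(0)
        # Learned mixing coefficients: factorize current-task filters over previous filters.
        self.mix = nn.Parameter(torch.eye(out_prev))
        # A few unconstrained filters to capture what the previous tasks cannot express.
        self.new_weight = nn.Parameter(0.01 * torch.randn(n_new_filters, *prev_weight.shape[1:]))
        self.padding = padding

    def forward(self, x):
        reused = torch.einsum("op,pikl->oikl", self.mix, self.prev_weight)
        weight = torch.cat([reused, self.new_weight], dim=0)
        return F.conv2d(x, weight, padding=self.padding)

# Usage: old = torch.randn(32, 3, 3, 3); layer = PiggybackConv(old, n_new_filters=8)
# out = layer(torch.randn(1, 3, 64, 64))   # -> (1, 40, 64, 64)
```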

* Accepted to ECCV 2020 

Adaptive Appearance Rendering

Apr 24, 2021
Mengyao Zhai, Ruizhi Deng, Jiacheng Chen, Lei Chen, Zhiwei Deng, Greg Mori

We propose an approach to generate images of people given a desired appearance and pose. Disentangled representations of pose and appearance are necessary to handle the compound variability in the resulting generated images. Hence, we develop an approach based on intermediate representations of poses and appearance: our pose-guided appearance rendering network first encodes the targets' poses using an encoder-decoder neural network. Then the targets' appearances are encoded by learning adaptive appearance filters using a fully convolutional network. Finally, these filters are inserted into the encoder-decoder network to complete the rendering. We demonstrate that our model can generate images and videos that are superior to state-of-the-art methods, and can handle pose-guided appearance rendering in both image and video generation.
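
A compact sketch of the adaptive-appearance-filter idea: a small appearance network predicts per-sample convolution kernels, which are then applied to the pose network's features via a grouped convolution. All shapes, and the single depthwise filter bank, are simplifying assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveRenderer(nn.Module):
    def __init__(self, channels=64, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # Predict one depthwise kernel per feature channel from the appearance image.
        self.appearance_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, channels * k * k),
        )

    def forward(self, pose_features, appearance_img):
        # pose_features: (B, C, H, W) from the pose encoder-decoder.
        b, c, h, w = pose_features.shape
        kernels = self.appearance_net(appearance_img).view(b * c, 1, self.k, self.k)
        # Grouped conv applies each sample's predicted filters to its own feature map.
        x = pose_features.reshape(1, b * c, h, w)
        out = F.conv2d(x, kernels, padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)

# renderer = AdaptiveRenderer(); y = renderer(torch.randn(2, 64, 32, 32), torch.randn(2, 3, 128, 128))
```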

* Accepted to BMVC 2018. arXiv admin note: substantial text overlap with arXiv:1712.01955 

Learning Discriminative Prototypes with Dynamic Time Warping

Mar 17, 2021
Xiaobin Chang, Frederick Tung, Greg Mori

Dynamic Time Warping (DTW) is widely used for temporal data processing. However, existing methods can neither learn the discriminative prototypes of different classes nor exploit such prototypes for further analysis. We propose Discriminative Prototype DTW (DP-DTW), a novel method to learn class-specific discriminative prototypes for temporal recognition tasks. DP-DTW shows superior performance compared to conventional DTWs on time series classification benchmarks. Combined with end-to-end deep learning, DP-DTW can handle challenging weakly supervised action segmentation problems and achieves state-of-the-art results on standard benchmarks. Moreover, detailed reasoning on the input video is enabled by the learned action prototypes. Specifically, an action-based video summarization can be obtained by aligning the input sequence with action prototypes.
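
A self-contained sketch of prototype-based classification with a differentiable (soft-min) DTW distance, which is one way to realize the idea described above; the exact DP-DTW objective in the paper may differ, and `gamma` and the squared-Euclidean frame cost are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_dtw(x, proto, gamma=0.1):
    # x: (Tx, D) input sequence; proto: (Tp, D) learnable class prototype.
    cost = torch.cdist(x, proto).pow(2)            # frame-to-frame alignment costs
    Tx, Tp = cost.shape
    inf = torch.tensor(float("inf"))
    R = [[inf] * (Tp + 1) for _ in range(Tx + 1)]  # DP table of 0-dim tensors (keeps autograd)
    R[0][0] = torch.tensor(0.0)
    for i in range(1, Tx + 1):
        for j in range(1, Tp + 1):
            prev = torch.stack([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]])
            R[i][j] = cost[i - 1, j - 1] - gamma * torch.logsumexp(-prev / gamma, dim=0)
    return R[Tx][Tp]

# Discriminative training: smaller distance to the true class prototype, larger to the rest.
prototypes = torch.nn.Parameter(torch.randn(5, 20, 8))   # 5 classes, length-20 prototypes, dim 8
x, label = torch.randn(30, 8), torch.tensor([2])
logits = -torch.stack([soft_dtw(x, p) for p in prototypes]).unsqueeze(0)
loss = F.cross_entropy(logits, label)
loss.backward()                                    # gradients flow into the prototypes
```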

* CVPR'21 preview, 10 pages, 8 figures 