Mengyao Zhai

Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate

Mar 31, 2023
Mohammadi Kiarash, Zhao He, Mengyao Zhai, Frederick Tung

In many real-world settings, the critical class is rare and a missed detection carries a disproportionately high cost. For example, tumors are rare, and a false negative diagnosis could have severe consequences for treatment outcomes; fraudulent banking transactions are rare, and an undetected occurrence could result in significant losses or legal penalties. In such contexts, systems are often operated at a high true positive rate, which may require tolerating a high false positive rate. In this paper, we present a novel approach to the challenge of minimizing false positives for systems that need to operate at a high true positive rate. We propose a ranking-based regularization (RankReg) approach that is easy to implement, and show empirically that it not only effectively reduces false positives, but also complements conventional imbalanced learning losses. With this technique in hand, we conduct a series of experiments on three broadly explored datasets (CIFAR-10 & 100 and Melanoma) and show that our approach lifts the previous state-of-the-art performance by notable margins.
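As a rough illustration of how a ranking-based regularizer can be combined with a conventional imbalanced-learning loss, the sketch below penalizes negative samples scored above (or within a margin of) positive samples with a generic pairwise hinge. This is a minimal stand-in, not the exact RankReg formulation; the base loss, margin, and weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def rank_regularizer(scores, labels, margin=1.0):
    """Penalize negatives that are scored above (or within a margin of) positives."""
    pos = scores[labels == 1]          # scores for the rare, critical class
    neg = scores[labels == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        return scores.new_zeros(())
    # All positive/negative pairs: hinge on (neg - pos + margin).
    diff = neg.unsqueeze(0) - pos.unsqueeze(1) + margin    # shape [P, N]
    return F.relu(diff).mean()

def total_loss(logits, labels, lam=1.0):
    # The base loss could be any imbalanced-learning loss (e.g. class-weighted CE);
    # plain BCE is used here only to keep the sketch self-contained.
    base = F.binary_cross_entropy_with_logits(logits, labels.float())
    return base + lam * rank_regularizer(logits, labels)

if __name__ == "__main__":
    logits = torch.randn(16)
    labels = (torch.rand(16) < 0.2).long()   # rare positive class
    print(total_loss(logits, labels))
```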


Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation

Apr 24, 2021
Mengyao Zhai, Lei Chen, Jiawei He, Megha Nawhal, Frederick Tung, Greg Mori

Humans accumulate knowledge in a lifelong fashion. Modern deep neural networks, on the other hand, are susceptible to catastrophic forgetting: when adapted to perform new tasks, they often fail to preserve their performance on previously learned tasks. Given a sequence of tasks, a naive way to address catastrophic forgetting is to train a separate standalone model for each task, which drastically scales the total number of parameters without efficiently reusing previous models. In contrast, we propose a parameter-efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks. For the current task, our model achieves generation quality on par with a standalone model while using fewer parameters. For previous tasks, our model also preserves generation quality, since the filters learned for those tasks are not altered. We validate Piggyback GAN on various image-conditioned generation tasks across different domains, and provide qualitative and quantitative results showing that the proposed approach addresses catastrophic forgetting effectively and efficiently.
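The sketch below illustrates the general idea of filter factorization: a convolutional layer for the new task reuses a frozen filter bank from previous tasks through learned combination weights and adds only a small number of unconstrained new filters. The layer name, mixing scheme, and ratio of new to reused filters are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PiggybackConv2d(nn.Module):
    """Hedged sketch: filters are learned linear combinations of a frozen filter
    bank from previous tasks, plus a small set of new unconstrained filters."""

    def __init__(self, bank_weight, n_new_filters, stride=1, padding=1):
        super().__init__()
        out_bank, in_ch, kh, kw = bank_weight.shape
        self.register_buffer("bank", bank_weight)           # frozen previous-task filters
        self.mix = nn.Parameter(torch.randn(out_bank, out_bank) * 0.01)  # combination weights
        self.new = nn.Parameter(torch.randn(n_new_filters, in_ch, kh, kw) * 0.01)
        self.stride, self.padding = stride, padding

    def forward(self, x):
        # Derived filters: linear combinations of the frozen bank.
        derived = torch.einsum("oi,ichw->ochw", self.mix, self.bank)
        weight = torch.cat([derived, self.new], dim=0)
        return F.conv2d(x, weight, stride=self.stride, padding=self.padding)

if __name__ == "__main__":
    bank = torch.randn(32, 3, 3, 3)        # pretend these come from the task-1 model
    layer = PiggybackConv2d(bank, n_new_filters=8)
    y = layer(torch.randn(2, 3, 64, 64))
    print(y.shape)                          # (2, 40, 64, 64)
```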

* Accepted to ECCV 2020 

Adaptive Appearance Rendering

Apr 24, 2021
Mengyao Zhai, Ruizhi Deng, Jiacheng Chen, Lei Chen, Zhiwei Deng, Greg Mori

We propose an approach to generate images of people given a desired appearance and pose. Disentangled representations of pose and appearance are necessary to handle the compound variability in the resulting generated images. Hence, we develop an approach based on intermediate representations of pose and appearance: our pose-guided appearance rendering network first encodes the targets' poses using an encoder-decoder neural network. The targets' appearances are then encoded by learning adaptive appearance filters using a fully convolutional network. Finally, these filters are placed in the encoder-decoder neural network to complete the rendering. We demonstrate that our model can generate images and videos that are superior to state-of-the-art methods, and can handle pose-guided appearance rendering in both image and video generation.
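A minimal sketch of the adaptive appearance filter idea follows: a small appearance network predicts per-sample convolution filters from a reference image, and those filters are applied to pose features inside the renderer via a grouped-convolution trick. The network sizes, filter shapes, and insertion point are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFilterLayer(nn.Module):
    """Hedged sketch: predict per-sample conv filters from an appearance image
    and apply them to pose features."""

    def __init__(self, feat_ch=64, k=3):
        super().__init__()
        self.feat_ch, self.k = feat_ch, k
        self.appearance_net = nn.Sequential(            # stand-in for the FCN in the paper
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_ch * feat_ch * k * k),
        )

    def forward(self, pose_feat, appearance_img):
        b = pose_feat.size(0)
        filt = self.appearance_net(appearance_img)
        filt = filt.view(b * self.feat_ch, self.feat_ch, self.k, self.k)
        # Grouped-conv trick: apply a different filter set to each sample in the batch.
        x = pose_feat.reshape(1, b * self.feat_ch, *pose_feat.shape[2:])
        out = F.conv2d(x, filt, padding=self.k // 2, groups=b)
        return out.view(b, self.feat_ch, *pose_feat.shape[2:])

if __name__ == "__main__":
    layer = AdaptiveFilterLayer()
    out = layer(torch.randn(2, 64, 32, 32), torch.randn(2, 3, 128, 128))
    print(out.shape)    # (2, 64, 32, 32)
```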

* Accepted to BMVC 2018. arXiv admin note: substantial text overlap with arXiv:1712.01955 

Zero-Shot Generation of Human-Object Interaction Videos

Dec 09, 2019
Megha Nawhal, Mengyao Zhai, Andreas Lehrmann, Leonid Sigal

Generation of videos of complex scenes is an important open problem in computer vision research. Human activity videos are a good example of such complex scenes. Human activities are typically formed as compositions of actions applied to objects -- modeling interactions between people and the physical world is a core part of visual understanding. In this paper, we introduce the task of generating human-object interaction videos in a zero-shot compositional setting, i.e., generating videos for action-object compositions that are unseen during training, having seen the target action and target object independently. To generate human-object interaction videos, we propose a novel adversarial framework, HOI-GAN, which includes multiple discriminators focusing on different aspects of a video. To demonstrate the effectiveness of the proposed framework, we perform extensive quantitative and qualitative evaluation on two challenging datasets: EPIC-Kitchens and 20BN-Something-Something v2.
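The sketch below shows the general shape of a multi-discriminator adversarial objective for video generation, with one critic scoring individual frames and another scoring the whole clip; the actual HOI-GAN uses a specific set of conditioned discriminators that this simplified, unconditioned example does not reproduce.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def generator_loss(fake_video, frame_d, video_d):
    """fake_video: [B, T, C, H, W]. The generator tries to fool every critic:
    one judges per-frame realism, the other temporal coherence of the clip."""
    b, t, c, h, w = fake_video.shape
    frame_logits = frame_d(fake_video.reshape(b * t, c, h, w))   # per-frame realism
    video_logits = video_d(fake_video)                           # whole-clip realism
    real = lambda x: torch.ones_like(x)                          # "real" target labels
    return bce(frame_logits, real(frame_logits)) + bce(video_logits, real(video_logits))

if __name__ == "__main__":
    frame_d = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 1))     # toy critics
    video_d = nn.Sequential(nn.Flatten(), nn.Linear(4 * 3 * 16 * 16, 1))
    fake = torch.randn(2, 4, 3, 16, 16, requires_grad=True)
    print(generator_loss(fake, frame_d, video_d))
```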

* Project Page: https://www.sfu.ca/~mnawhal/projects/zs_hoi_generation.html 

Lifelong GAN: Continual Learning for Conditional Image Generation

Aug 22, 2019
Mengyao Zhai, Lei Chen, Fred Tung, Jiawei He, Megha Nawhal, Greg Mori

Lifelong learning is challenging for deep neural networks due to their susceptibility to catastrophic forgetting: a trained network fails to maintain its ability to accomplish previously learned tasks when it is trained to perform new ones. We study the problem of lifelong learning for generative models, extending a trained network to new conditional generation tasks without forgetting previous tasks, while assuming access to the training data for the current task only. In contrast to state-of-the-art memory-replay-based approaches, which are limited to label-conditioned image generation tasks, we propose Lifelong GAN, a more generic framework for continual learning of generative models under different conditional image generation settings. Lifelong GAN employs knowledge distillation to transfer learned knowledge from previous networks to the new network, which makes it possible to perform image-conditioned generation tasks in a lifelong learning setting. We validate Lifelong GAN for both image-conditioned and label-conditioned generation tasks, and provide qualitative and quantitative results to show the generality and effectiveness of our method.
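A minimal sketch of the distillation term is given below: on some conditioning inputs, the new generator is encouraged to reproduce the outputs of a frozen copy of the previous-task generator, and this term is added to the current-task GAN loss. The choice of inputs (Lifelong GAN generates auxiliary data for this purpose) and the L1 distance are simplified assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(new_G, old_G, inputs):
    """Encourage the new generator to match the frozen previous generator on
    the given conditioning inputs (a stand-in for the auxiliary data used in
    the paper)."""
    with torch.no_grad():
        target = old_G(inputs)             # frozen copy of the previous generator
    return F.l1_loss(new_G(inputs), target)

if __name__ == "__main__":
    old_G = nn.Conv2d(3, 3, 3, padding=1)  # toy "generators" for illustration only
    new_G = nn.Conv2d(3, 3, 3, padding=1)
    x = torch.randn(2, 3, 64, 64)
    # This term would be added to the adversarial loss for the current task.
    print(distillation_loss(new_G, old_G, x))
```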

* Accepted to ICCV 2019 

Learning to Forecast Videos of Human Activity with Multi-granularity Models and Adaptive Rendering

Dec 05, 2017
Mengyao Zhai, Jiacheng Chen, Ruizhi Deng, Lei Chen, Ligeng Zhu, Greg Mori

We propose an approach for forecasting video of complex human activity involving multiple people. Direct pixel-level prediction is too simple to handle the appearance variability in complex activities; hence, we develop novel intermediate representations. We propose an architecture that combines a hierarchical temporal model for predicting human poses with encoder-decoder convolutional neural networks for rendering target appearances. Our hierarchical model captures interactions among people through a dynamic group-based interaction mechanism. Next, our appearance rendering network encodes the targets' appearances by learning adaptive appearance filters using a fully convolutional network. Finally, these filters are placed in encoder-decoder neural networks to complete the rendering. We demonstrate that our model can generate videos that are superior to state-of-the-art methods, and can handle complex human activity scenarios in video forecasting.
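As a rough sketch of the pose-forecasting stage, the example below rolls observed joint coordinates forward in time with an autoregressive LSTM; the hierarchical group-interaction mechanism and the appearance-rendering stage described above are omitted, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseForecaster(nn.Module):
    """Hedged sketch: autoregressively predict future 2-D joint coordinates
    from an observed pose sequence."""

    def __init__(self, n_joints=17, hidden=128):
        super().__init__()
        self.inp = n_joints * 2
        self.lstm = nn.LSTM(self.inp, hidden, batch_first=True)
        self.out = nn.Linear(hidden, self.inp)

    def forward(self, observed, horizon=8):
        # observed: [B, T_obs, n_joints * 2]
        _, state = self.lstm(observed)                 # encode the observed poses
        pose, preds = observed[:, -1], []
        for _ in range(horizon):                       # autoregressive rollout
            feat, state = self.lstm(pose.unsqueeze(1), state)
            pose = self.out(feat[:, -1])
            preds.append(pose)
        return torch.stack(preds, dim=1)               # [B, horizon, n_joints * 2]

if __name__ == "__main__":
    model = PoseForecaster()
    future = model(torch.randn(2, 10, 34))
    print(future.shape)   # (2, 8, 34)
```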


Deep Learning of Appearance Models for Online Object Tracking

Jul 09, 2016
Mengyao Zhai, Mehrsan Javan Roshtkhari, Greg Mori

This paper introduces a novel deep-learning-based approach for vision-based single-target tracking. We address this problem by proposing a network architecture that takes the input video frames and directly computes the tracking score for any candidate target location by estimating the probability distributions of the positive and negative examples. This is achieved by combining a deep convolutional neural network with a Bayesian loss layer in a unified framework. To deal with the limited number of positive training examples, the network is pre-trained offline for a generic image feature representation and then fine-tuned in multiple steps. An online fine-tuning step is carried out at every frame to learn the appearance of the target. We adopt a two-stage iterative algorithm to adaptively update the network parameters and maintain a probability density for target/non-target regions. The tracker has been tested on the standard tracking benchmark, and the results indicate that the proposed solution achieves state-of-the-art tracking results.
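To make the scoring idea concrete, the sketch below computes a Bayesian tracking score for a candidate feature from kernel-density estimates of the positive (target) and negative (background) feature distributions. This is a generic stand-in for the paper's Bayesian loss layer; the density model, bandwidth, and priors are assumptions.

```python
import torch

def tracking_score(candidate_feat, pos_feats, neg_feats, bandwidth=1.0):
    """Posterior probability that the candidate belongs to the target class,
    using Gaussian-kernel density estimates of each class's feature set."""
    def kde(x, samples):
        d2 = ((x.unsqueeze(0) - samples) ** 2).sum(dim=1)       # squared distances
        return torch.exp(-d2 / (2 * bandwidth ** 2)).mean()
    p_pos = kde(candidate_feat, pos_feats)
    p_neg = kde(candidate_feat, neg_feats)
    prior_pos = pos_feats.size(0) / (pos_feats.size(0) + neg_feats.size(0))
    num = p_pos * prior_pos
    return num / (num + p_neg * (1 - prior_pos) + 1e-8)

if __name__ == "__main__":
    pos = torch.randn(20, 64) + 2.0          # features of target patches
    neg = torch.randn(200, 64)               # features of background patches
    cand = torch.randn(64) + 2.0             # candidate location's CNN feature
    print(tracking_score(cand, pos, neg))    # close to 1 for target-like candidates
```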


Deep Structured Models For Group Activity Recognition

Jun 12, 2015
Zhiwei Deng, Mengyao Zhai, Lei Chen, Yuhao Liu, Srikanth Muralidharan, Mehrsan Javan Roshtkhari, Greg Mori

This paper presents a deep neural-network-based hierarchical graphical model for individual and group activity recognition in surveillance scenes. Deep networks are used to recognize the actions of individual people in a scene. Next, a neural-network-based hierarchical graphical model refines the predicted labels for each class by considering dependencies between the classes. This refinement step mimics a message-passing step similar to inference in a probabilistic graphical model. We show that this approach can be effective in group activity recognition, with the deep graphical model improving recognition rates over baseline methods.
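The sketch below illustrates one neural message-passing-style refinement step: each person's action logits are updated with a pooled message from the other people in the scene, and a group-activity prediction is read out from the refined states. The layer sizes, pooling, and readout are illustrative assumptions rather than the paper's exact graphical model.

```python
import torch
import torch.nn as nn

class RefinementStep(nn.Module):
    """Hedged sketch of one message-passing refinement step over people in a scene."""

    def __init__(self, n_actions=6, n_activities=4, hidden=32):
        super().__init__()
        self.message = nn.Linear(n_actions, hidden)
        self.update = nn.Linear(n_actions + hidden, n_actions)
        self.group = nn.Linear(n_actions, n_activities)

    def forward(self, person_logits):
        # person_logits: [B, P, n_actions] -- initial per-person action scores
        b, p, _ = person_logits.shape
        msg = self.message(person_logits)                               # [B, P, hidden]
        pooled = (msg.sum(dim=1, keepdim=True) - msg) / max(p - 1, 1)   # messages from others
        refined = person_logits + self.update(torch.cat([person_logits, pooled], dim=-1))
        group_logits = self.group(refined.mean(dim=1))                  # scene-level readout
        return refined, group_logits

if __name__ == "__main__":
    step = RefinementStep()
    refined, group = step(torch.randn(2, 5, 6))
    print(refined.shape, group.shape)    # (2, 5, 6) (2, 4)
```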
