Mengyao Zhai

Radar: Fast Long-Context Decoding for Any Transformer

Mar 13, 2025

Yongchang Hao, Mengyao Zhai, Hossein Hajimirsadeghi, Sepidehsadat Hosseini, Frederick Tung

Abstract: Transformer models have demonstrated exceptional performance across a wide range of applications. Although dot-product attention forms the foundation of Transformer models, it does not scale well to long-context data because its time requirement grows quadratically with context length. In this work, we propose Radar, a training-free approach that accelerates inference by dynamically searching for the most important context tokens. For any pre-trained Transformer, Radar can reduce the decoding time complexity without training or heuristically evicting tokens. Moreover, we provide theoretical justification for our approach, demonstrating that Radar can reliably identify the most important tokens with high probability. We conduct extensive comparisons with previous methods on a wide range of tasks. The results demonstrate that Radar achieves state-of-the-art performance across different architectures with reduced time complexity, offering a practical solution for efficient long-context processing of Transformers.
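
To make "attending only to the most important context tokens" concrete, here is a minimal pure-Python sketch of one decoding step that compares full dot-product attention with attention restricted to a selected subset of cached positions. It is not the Radar algorithm, which the abstract does not spell out. The helper names (`attend`, `top_k_indices`) are hypothetical, and the selection shown here scores every key exactly, so it still costs O(n) per step; it only illustrates what the restricted attention computes, not how to find the tokens quickly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, indices=None):
    """Single-query dot-product attention over the selected cache positions.

    With indices=None every cached position is used (full attention).
    """
    indices = list(range(len(keys))) if indices is None else list(indices)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in indices]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, indices)) for d in range(dim)]


def top_k_indices(query, keys, k):
    """Hypothetical selector: exact top-k positions by dot product (O(n) scan)."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


query = [1.0, 0.0]
keys = [[10.0, 0.0], [0.0, 1.0], [-10.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

full = attend(query, keys, values)
sparse = attend(query, keys, values, top_k_indices(query, keys, k=1))
```

When a few keys dominate the attention weights, as in this toy cache, `sparse` stays close to `full` while touching far fewer positions, which is the property that training-free sparse decoding relies on.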

* Accepted to ICLR 2025

Prompting-based Efficient Temporal Domain Generalization

Oct 03, 2023

Sepidehsadat Hosseini, Mengyao Zhai, Hossein Hajimirsadeghi, Frederick Tung

Abstract: Machine learning traditionally assumes that training and testing data are distributed independently and identically. However, in many real-world settings, the data distribution can shift over time, leading to poor generalization of trained models in future time periods. Our paper presents a novel prompting-based approach to temporal domain generalization that is parameter-efficient, time-efficient, and does not require access to the target domain data (i.e., unseen future time periods) during training. Our method adapts a target pre-trained model to temporal drift by learning global prompts, domain-specific prompts, and drift-aware prompts that capture underlying temporal dynamics. It is compatible across diverse tasks, such as classification, regression, and time series forecasting, and sets a new state-of-the-art benchmark in temporal domain generalization. The code repository will be publicly shared.

Ranking Regularization for Critical Rare Classes: Minimizing False Positives at a High True Positive Rate

Mar 31, 2023

Mohammadi Kiarash, Zhao He, Mengyao Zhai, Frederick Tung

Abstract: In many real-world settings, the critical class is rare and a missed detection carries a disproportionately high cost. For example, tumors are rare and a false negative diagnosis could have severe consequences on treatment outcomes; fraudulent banking transactions are rare and an undetected occurrence could result in significant losses or legal penalties. In such contexts, systems are often operated at a high true positive rate, which may require tolerating high false positives. In this paper, we present a novel approach to address the challenge of minimizing false positives for systems that need to operate at a high true positive rate. We propose a ranking-based regularization (RankReg) approach that is easy to implement, and show empirically that it not only effectively reduces false positives, but also complements conventional imbalanced learning losses. With this novel technique in hand, we conduct a series of experiments on three broadly explored datasets (CIFAR-10&100 and Melanoma) and show that our approach lifts the previous state-of-the-art performance by notable margins.
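
The quantity this abstract targets, the false positive rate at a fixed high true positive rate, can be computed from classifier scores as sketched below. This shows only the evaluation metric, not the RankReg regularizer itself, whose form the abstract does not give. The function name `fpr_at_tpr`, the "predict positive when score >= threshold" convention, and the tie handling are assumptions made for illustration.

```python
def fpr_at_tpr(scores, labels, target_tpr=0.95):
    """Lowest false positive rate among thresholds whose TPR >= target_tpr.

    scores: higher means "more likely positive"; labels: 1 = positive, 0 = negative.
    """
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Lower the threshold one distinct score at a time. TPR and FPR can only
    # grow as the threshold drops, so the first threshold that reaches the
    # target TPR also gives the smallest achievable FPR.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at this score are accepted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / positives >= target_tpr:
            return fp / negatives
    return 1.0  # not reached for valid inputs; kept as a safe fallback


labels = [1, 0, 1, 1, 0, 0]
poorly_ranked = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # one negative outranks two positives
well_ranked = [0.9, 0.3, 0.7, 0.6, 0.5, 0.4]    # every positive outranks every negative

fpr_poor = fpr_at_tpr(poorly_ranked, labels, target_tpr=1.0)
fpr_good = fpr_at_tpr(well_ranked, labels, target_tpr=1.0)
```

In the toy data, a single negative ranked above two positives is enough to force a nonzero false positive rate at full recall, which is why a ranking-oriented penalty is a natural fit for this operating point.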

Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation

Apr 24, 2021

Mengyao Zhai, Lei Chen, Jiawei He, Megha Nawhal, Frederick Tung, Greg Mori

Abstract: Humans accumulate knowledge in a lifelong fashion. Modern deep neural networks, on the other hand, are susceptible to catastrophic forgetting: when adapted to perform new tasks, they often fail to preserve their performance on previously learned tasks. Given a sequence of tasks, a naive approach addressing catastrophic forgetting is to train a separate standalone model for each task, which scales the total number of parameters drastically without efficiently utilizing previous models. In contrast, we propose a parameter-efficient framework, Piggyback GAN, which learns the current task by building a set of convolutional and deconvolutional filters that are factorized into filters of the models trained on previous tasks. For the current task, our model achieves high generation quality on par with a standalone model at a lower number of parameters. For previous tasks, our model can also preserve generation quality since the filters for previous tasks are not altered. We validate Piggyback GAN on various image-conditioned generation tasks across different domains, and provide qualitative and quantitative results to show that the proposed approach can address catastrophic forgetting effectively and efficiently.

* Accepted to ECCV 2020

Adaptive Appearance Rendering

Apr 24, 2021

Mengyao Zhai, Ruizhi Deng, Jiacheng Chen, Lei Chen, Zhiwei Deng, Greg Mori

Abstract: We propose an approach to generate images of people given a desired appearance and pose. Disentangled representations of pose and appearance are necessary to handle the compound variability in the resulting generated images. Hence, we develop an approach based on intermediate representations of poses and appearance: our pose-guided appearance rendering network first encodes the targets' poses using an encoder-decoder neural network. Then the targets' appearances are encoded by learning adaptive appearance filters using a fully convolutional network. Finally, these filters are placed in the encoder-decoder neural networks to complete the rendering. We demonstrate that our model can generate images and videos that are superior to state-of-the-art methods, and can handle pose-guided appearance rendering in both image and video generation.

* Accepted to BMVC 2018. arXiv admin note: substantial text overlap with arXiv:1712.01955

Zero-Shot Generation of Human-Object Interaction Videos

Dec 09, 2019

Megha Nawhal, Mengyao Zhai, Andreas Lehrmann, Leonid Sigal

Abstract: Generation of videos of complex scenes is an important open problem in computer vision research. Human activity videos are a good example of such complex scenes. Human activities are typically formed as compositions of actions applied to objects; modeling interactions between people and the physical world is a core part of visual understanding. In this paper, we introduce the task of generating human-object interaction videos in a zero-shot compositional setting, i.e., generating videos for action-object compositions that are unseen during training, having seen the target action and target object independently. To generate human-object interaction videos, we propose a novel adversarial framework, HOI-GAN, which includes multiple discriminators focusing on different aspects of a video. To demonstrate the effectiveness of our proposed framework, we perform extensive quantitative and qualitative evaluation on two challenging datasets: EPIC-Kitchens and 20BN-Something-Something v2.

* Project Page: https://www.sfu.ca/~mnawhal/projects/zs_hoi_generation.html

Lifelong GAN: Continual Learning for Conditional Image Generation

Aug 22, 2019

Mengyao Zhai, Lei Chen, Fred Tung, Jiawei He, Megha Nawhal, Greg Mori

Abstract: Lifelong learning is challenging for deep neural networks due to their susceptibility to catastrophic forgetting. Catastrophic forgetting occurs when a trained network is not able to maintain its ability to accomplish previously learned tasks when it is trained to perform new tasks. We study the problem of lifelong learning for generative models, extending a trained network to new conditional generation tasks without forgetting previous tasks, while assuming access to the training data for the current task only. In contrast to state-of-the-art memory-replay-based approaches, which are limited to label-conditioned image generation tasks, a more generic framework for continual learning of generative models under different conditional image generation settings is proposed in this paper. Lifelong GAN employs knowledge distillation to transfer learned knowledge from previous networks to the new network. This makes it possible to perform image-conditioned generation tasks in a lifelong learning setting. We validate Lifelong GAN for both image-conditioned and label-conditioned generation tasks, and provide qualitative and quantitative results to show the generality and effectiveness of our method.

* Accepted to ICCV 2019

Learning to Forecast Videos of Human Activity with Multi-granularity Models and Adaptive Rendering

Dec 05, 2017

Mengyao Zhai, Jiacheng Chen, Ruizhi Deng, Lei Chen, Ligeng Zhu, Greg Mori

Abstract: We propose an approach for forecasting video of complex human activity involving multiple people. Direct pixel-level prediction is too simple to handle the appearance variability in complex activities. Hence, we develop novel intermediate representations. An architecture combining a hierarchical temporal model for predicting human poses and encoder-decoder convolutional neural networks for rendering target appearances is proposed. Our hierarchical model captures interactions among people by adopting a dynamic group-based interaction mechanism. Next, our appearance rendering network encodes the targets' appearances by learning adaptive appearance filters using a fully convolutional network. Finally, these filters are placed in encoder-decoder neural networks to complete the rendering. We demonstrate that our model can generate videos that are superior to state-of-the-art methods, and can handle complex human activity scenarios in video forecasting.

Deep Learning of Appearance Models for Online Object Tracking

Jul 09, 2016

Mengyao Zhai, Mehrsan Javan Roshtkhari, Greg Mori

Abstract: This paper introduces a novel deep-learning-based approach for vision-based single-target tracking. We address this problem by proposing a network architecture which takes the input video frames and directly computes the tracking score for any candidate target location by estimating the probability distributions of the positive and negative examples. This is achieved by combining a deep convolutional neural network with a Bayesian loss layer in a unified framework. In order to deal with the limited number of positive training examples, the network is pre-trained offline for a generic image feature representation and then fine-tuned in multiple steps. An online fine-tuning step is carried out at every frame to learn the appearance of the target. We adopt a two-stage iterative algorithm to adaptively update the network parameters and maintain a probability density for target/non-target regions. The tracker has been tested on the standard tracking benchmark and the results indicate that the proposed solution achieves state-of-the-art tracking results.

Deep Structured Models For Group Activity Recognition

Jun 12, 2015

Zhiwei Deng, Mengyao Zhai, Lei Chen, Yuhao Liu, Srikanth Muralidharan, Mehrsan Javan Roshtkhari, Greg Mori

Abstract: This paper presents a deep neural-network-based hierarchical graphical model for individual and group activity recognition in surveillance scenes. Deep networks are used to recognize the actions of individual people in a scene. Next, a neural-network-based hierarchical graphical model refines the predicted labels for each class by considering dependencies between the classes. This refinement step mimics a message-passing step similar to inference in a probabilistic graphical model. We show that this approach can be effective in group activity recognition, with the deep graphical model improving recognition rates over baseline methods.
