Hossein Rahmani

A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image

Apr 27, 2023

Zheheng Jiang, Hossein Rahmani, Sue Black, Bryan M. Williams

Abstract: Recently, deep learning based approaches have shown promising results in 3D hand reconstruction from a single RGB image. These approaches can be roughly divided into model-based approaches, which are heavily dependent on the model's parameter space, and model-free approaches, which require large numbers of 3D ground truths to reduce depth ambiguity and struggle in weakly-supervised scenarios. To overcome these issues, we propose a novel probabilistic model to achieve the robustness of model-based approaches and reduced dependence on the model's parameter space of model-free approaches. The proposed probabilistic model incorporates a model-based network as a prior-net to estimate the prior probability distribution of joints and vertices. An Attention-based Mesh Vertices Uncertainty Regression (AMVUR) model is proposed to capture dependencies among vertices and the correlation between joints and mesh vertices to improve their feature representation. We further propose a learning based occlusion-aware Hand Texture Regression model to achieve high-fidelity texture reconstruction. We demonstrate the flexibility of the proposed probabilistic model to be trained in both supervised and weakly-supervised scenarios. The experimental results demonstrate our probabilistic model's state-of-the-art accuracy in 3D hand and texture reconstruction from a single image in both training schemes, including in the presence of severe occlusions.

Token Boosting for Robust Self-Supervised Visual Transformer Pre-training

Apr 12, 2023

Tianjiao Li, Lin Geng Foo, Ping Hu, Xindi Shang, Hossein Rahmani, Zehuan Yuan, Jun Liu

Abstract: Learning with large-scale unlabeled data has become a powerful tool for pre-training Visual Transformers (VTs). However, prior works tend to overlook that, in real-world scenarios, the input data may be corrupted and unreliable. Pre-training VTs on such corrupted data can be challenging, especially when we pre-train via the masked autoencoding approach, where both the inputs and masked "ground truth" targets can potentially be unreliable in this case. To address this limitation, we introduce the Token Boosting Module (TBM) as a plug-and-play component for VTs that effectively allows the VT to learn to extract clean and robust features during masked autoencoding pre-training. We provide theoretical analysis to show how TBM improves model pre-training with more robust and generalizable representations, thus benefiting downstream tasks. We conduct extensive experiments to analyze TBM's effectiveness, and results on four corrupted datasets demonstrate that TBM consistently improves performance on downstream tasks.

* Accepted to CVPR 2023

GradMDM: Adversarial Attack on Dynamic Networks

Apr 01, 2023

Jianhong Pan, Lin Geng Foo, Qichen Zheng, Zhipeng Fan, Hossein Rahmani, Qiuhong Ke, Jun Liu

Abstract: Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input that will activate more computational units of dynamic models during inference. We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations.

* Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Progressive Channel-Shrinking Network

Apr 01, 2023

Jianhong Pan, Siyuan Yang, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Zhipeng Fan, Jun Liu

Abstract: Salience-based channel pruning continues to achieve breakthroughs in network compression. In practice, a salience mechanism serves as the metric of channel importance that guides pruning, so salience-based channel pruning can dynamically adjust the channel width at run-time, which provides a flexible pruning scheme. However, two problems emerge: first, a gating function is often needed to truncate specific salience entries to zero, which destabilizes forward propagation; second, the dynamic architecture adds indexing cost at inference, which bottlenecks inference speed. In this paper, we propose a Progressive Channel-Shrinking (PCS) method to compress the selected salience entries at run-time instead of roughly approximating them to zero. We also propose a Running Shrinking Policy to provide a testing-static pruning scheme that can reduce the memory access cost for filter indexing. We evaluate our method on the ImageNet and CIFAR10 datasets over two prevalent networks, ResNet and VGG, and demonstrate that our PCS outperforms all baselines and achieves state-of-the-art results in terms of the compression-performance tradeoff. Moreover, we observe a significant and practical acceleration of inference.
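
To make the contrast in this abstract concrete, the following is a minimal sketch of the difference between truncating low-salience entries to zero and scaling them down instead. It is an illustration only and not the paper's PCS method: the function names, the threshold, and the fixed shrink factor are all assumptions introduced here.

```python
import numpy as np

def hard_gate(salience, threshold):
    """Hard gating: set salience entries below the threshold to zero."""
    return np.where(salience < threshold, 0.0, salience)

def shrink(salience, threshold, factor):
    """Shrinking: scale entries below the threshold by `factor` (hypothetical
    fixed value) rather than truncating them to zero."""
    return np.where(salience < threshold, salience * factor, salience)

# Hypothetical per-channel salience scores for a four-channel layer.
salience = np.array([0.9, 0.05, 0.4, 0.1])

gated = hard_gate(salience, threshold=0.2)
shrunk = shrink(salience, threshold=0.2, factor=0.5)
```

In this toy setting the gated vector loses the two low-salience channels outright, while the shrunk vector keeps them at reduced magnitude. How PCS actually chooses and progressively compresses entries is described in the paper itself.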

DiffPose: Toward More Reliable 3D Pose Estimation

Nov 30, 2022

Jia Gong, Lin Geng Foo, Zhipeng Fan, Qiuhong Ke, Hossein Rahmani, Jun Liu

Abstract: Monocular 3D human pose estimation is quite challenging due to the inherent ambiguity and occlusion, which often lead to high uncertainty and indeterminacy. On the other hand, diffusion models have recently emerged as an effective tool for generating high-quality images from noise. Inspired by their capability, we explore a novel pose estimation framework (DiffPose) that formulates 3D pose estimation as a reverse diffusion process. We incorporate novel designs into our DiffPose that facilitate the diffusion process for 3D pose estimation: a pose-specific initialization of pose uncertainty distributions, a Gaussian Mixture Model-based forward diffusion process, and a context-conditioned reverse diffusion process. Our proposed DiffPose significantly outperforms existing methods on the widely used pose estimation benchmarks Human3.6M and MPI-INF-3DHP.
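
For readers unfamiliar with diffusion models, the sketch below shows a generic forward (noising) step applied to a 3D pose. It uses the standard single-Gaussian formulation, not the Gaussian Mixture Model-based forward process that DiffPose describes. The linear noise schedule, the step count, the 17-joint pose shape, and all function names are assumptions made for illustration.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

rng = np.random.default_rng(0)
pose = rng.standard_normal((17, 3))   # stand-in for a 17-joint 3D pose
alpha_bar = make_alpha_bar(1000)

noisy_early = forward_diffuse(pose, 0, alpha_bar, rng)    # barely perturbed
noisy_late = forward_diffuse(pose, 999, alpha_bar, rng)   # close to pure noise
```

A reverse diffusion process, which is how the abstract frames pose estimation, learns to undo these steps one at a time, starting from a noisy sample and moving toward a clean pose.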

Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Sep 03, 2022

Tianjiao Li, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Anran Wang, Jinghua Wang, Jun Liu

Abstract: The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.

* Accepted to ECCV 2022

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

Jul 25, 2022

Yunsheng Pang, Qiuhong Ke, Hossein Rahmani, James Bailey, Jun Liu

Abstract: Human interaction recognition is very important in many applications. One crucial cue in recognizing an interaction is the interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition via modeling the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module to transform each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art by a significant margin.

* Accepted by ECCV 2022

ERA: Expert Retrieval and Assembly for Early Action Prediction

Jul 22, 2022

Lin Geng Foo, Tianjiao Li, Hossein Rahmani, Qiuhong Ke, Jun Liu

Abstract: Early action prediction aims to successfully predict the class label of an action before it is completely performed. This is a challenging task because the beginning stages of different actions can be very similar, with only subtle differences for discrimination. In this paper, we propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts most specialized at using discriminative subtle differences, to distinguish an input sample from other highly similar samples. To encourage our model to effectively use subtle differences for early action prediction, we push experts to discriminate exclusively between samples that are highly similar, forcing these experts to learn to use subtle differences that exist between those samples. Additionally, we design an effective Expert Learning Rate Optimization method that balances the experts' optimization and leads to better performance. We evaluate our ERA module on four public action datasets and achieve state-of-the-art performance.

* Accepted to ECCV 2022

Recent Advances of Continual Learning in Computer Vision: An Overview

Sep 24, 2021

Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan Williams, Jun Liu

Abstract: In contrast to batch learning where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Similar to the human learning process with the ability of learning, fusing, and accumulating new knowledge coming at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularization, knowledge distillation, memory, generative replay, parameter isolation, and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas, where continuous knowledge accumulation is potentially helpful while continual learning has not been well studied, are discussed.

* 21 pages, 5 figures

The Multi-Modal Video Reasoning and Analyzing Competition

Aug 18, 2021

Haoran Peng, He Huang, Li Xu, Tianjiao Li, Jun Liu, Hossein Rahmani, Qiuhong Ke, Zhicheng Guo, Cong Wu, Rongchang Li (+8 more)

Abstract: In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.

* Accepted to ICCV 2021 Workshops
