Haibin Ling

The Cascaded Forward Algorithm for Neural Network Training

Mar 24, 2023
Gongpei Zhao, Tao Wang, Yidong Li, Yi Jin, Congyan Lang, Haibin Ling

The backpropagation (BP) algorithm has been widely used as the mainstream learning procedure for neural networks in the past decade, and has played a significant role in the development of deep learning. However, it has some limitations, such as getting stuck in local minima and suffering from vanishing/exploding gradients, which have led to questions about its biological plausibility. To address these limitations, alternative algorithms to backpropagation have been preliminarily explored, with the Forward-Forward (FF) algorithm being one of the most well known. In this paper, we propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on BP optimization. Unlike FF, our framework directly outputs label distributions at each cascaded block, which removes the need to generate additional negative samples and thus leads to a more efficient process during both training and testing. Moreover, each block in our framework can be trained independently, so it can be easily deployed into parallel acceleration systems. The proposed method is evaluated on four public image classification benchmarks, and the experimental results demonstrate significant improvement in prediction accuracy over the baseline.
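
As a rough illustration of the block-wise training idea described above, the sketch below trains a stack of convolutional blocks one at a time, each with its own local classifier and cross-entropy loss. It is a minimal sketch under assumed block architectures and hyperparameters, not the authors' CaFo implementation.

```python
# Minimal sketch of independent, block-wise training (assumed architecture,
# not the released CaFo code): each block predicts a label distribution
# locally, so no gradients ever flow across block boundaries.
import torch
import torch.nn as nn

class CascadeBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.predictor = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes))

    def forward(self, x):
        feat = self.features(x)
        return feat, self.predictor(feat)

def train_blocks(blocks, loader, epochs=1, device="cpu"):
    # Blocks are optimized one after another; earlier blocks are frozen
    # (run under no_grad) while a later block is trained, so each update is local.
    for i, block in enumerate(blocks):
        block.to(device)
        opt = torch.optim.Adam(block.parameters(), lr=1e-3)
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                with torch.no_grad():
                    for prev in blocks[:i]:
                        x, _ = prev(x)          # features from frozen earlier blocks
                _, logits = block(x)
                loss = nn.functional.cross_entropy(logits, y)
                opt.zero_grad(); loss.backward(); opt.step()
    return blocks
```

Because each block only needs the frozen output of its predecessor, the per-block loops could in principle run on separate devices, matching the parallel-deployment claim in the abstract.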

The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector

Mar 21, 2023
Caixia Zhou, Yaping Huang, Mengyang Pu, Qingji Guan, Li Huang, Haibin Ling

Deep learning-based edge detectors heavily rely on pixel-wise labels, which are often provided by multiple annotators. Existing methods fuse multiple annotations using a simple voting process, ignoring the inherent ambiguity of edges and the labeling bias of annotators. In this paper, we propose a novel uncertainty-aware edge detector (UAED), which employs uncertainty to investigate the subjectivity and ambiguity of diverse annotations. Specifically, we first convert the deterministic label space into a learnable Gaussian distribution, whose variance measures the degree of ambiguity among different annotations. We then regard the learned variance as the estimated uncertainty of the predicted edge maps; pixels with higher uncertainty are likely to be hard samples for edge detection. We therefore design an adaptive weighting loss that emphasizes learning from pixels with high uncertainty, which helps the network gradually concentrate on the important pixels. UAED can be combined with various encoder-decoder backbones, and extensive experiments demonstrate that UAED achieves superior performance consistently across multiple edge detection benchmarks. The source code is available at https://github.com/ZhouCX117/UAED.

* CVPR2023 
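
The adaptive weighting idea above can be sketched as a per-pixel loss that samples from a predicted Gaussian and up-weights high-variance pixels. The parameterization and weighting function below are assumptions for illustration, not the released UAED code.

```python
# Hedged sketch of an uncertainty-weighted edge loss: the network predicts a
# per-pixel mean and log-variance; ambiguous (high-variance) pixels receive a
# larger weight. The exact forms here are assumptions, not the UAED release.
import torch
import torch.nn.functional as F

def uncertainty_weighted_edge_loss(mean_logits, log_var, label):
    """mean_logits, log_var: (B,1,H,W) predictions; label: fused edge map in [0,1]."""
    std = torch.exp(0.5 * log_var)
    sampled = mean_logits + std * torch.randn_like(std)   # reparameterized sample
    bce = F.binary_cross_entropy_with_logits(sampled, label, reduction="none")
    weight = 1.0 + torch.exp(log_var).detach()            # emphasize uncertain pixels
    return (weight * bce).mean()
```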
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images

Mar 19, 2023
Srikar Yellapragada, Zhenghong Li, Kevin Bhadresh Doshi, Purva Makarand Mhasakar, Heng Fan, Jie Wei, Erik Blasch, Haibin Ling

Gun violence is a critical security problem, and it is imperative for the computer vision community to develop effective gun detection algorithms for real-world scenarios, particularly in Closed Circuit Television (CCTV) surveillance data. Despite significant progress in visual object detection, detecting guns in real-world CCTV images remains a challenging and under-explored task. Firearms, especially handguns, are typically very small in size, non-salient in appearance, and often severely occluded or indistinguishable from other small objects. Additionally, the lack of principled benchmarks and the difficulty of collecting relevant datasets further hinder algorithmic development. In this paper, we present a meticulously crafted and annotated benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images. Our contribution is three-fold. First, we carefully select and analyze real-world CCTV images from three datasets, manually annotate handguns and their holders, and tag each image with relevant challenge factors such as blur and occlusion. Second, we propose a new cross-dataset evaluation protocol in addition to the standard intra-dataset protocol, which is vital for gun detection in practical settings. Finally, we comprehensively evaluate both classical and state-of-the-art object detection algorithms, providing an in-depth analysis of their generalization abilities. The benchmark will facilitate further research and development on this topic and ultimately enhance security. Code, annotations, and trained models are available at https://github.com/srikarym/CCTV-Gun.
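
The cross-dataset protocol mentioned above amounts to training a detector on one source dataset and testing it on the others. The loop below is only an illustration; the dataset identifiers and the train/eval helpers are hypothetical placeholders, not part of the released benchmark code.

```python
# Illustrative cross-dataset evaluation loop (dataset names and the
# train_fn/eval_fn helpers are hypothetical placeholders).
DATASETS = ["dataset_A", "dataset_B", "dataset_C"]

def cross_dataset_protocol(train_fn, eval_fn):
    """train_fn(name) -> detector; eval_fn(detector, name) -> mAP (user-supplied)."""
    results = {}
    for src in DATASETS:
        detector = train_fn(src)                 # fit on the source dataset
        for tgt in DATASETS:
            if tgt == src:
                continue                         # intra-dataset protocol handled separately
            results[(src, tgt)] = eval_fn(detector, tgt)
    return results
```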

Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment

Jan 01, 2023
Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling

Domain adaptive detection aims to improve the generalization of detectors on the target domain. To reduce the discrepancy in feature distributions between the two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features during alignment, degrading detection performance. To address this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to simultaneously encode the dependencies across different granularities, including pixel-, instance-, and category-level, to align the two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify which domain, source or target, samples of different granularities come from. Note that MGA not only leverages instance discriminability across different categories but also exploits category consistency between the two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that exploits model assessments to guide model updates, improving pseudo labels and alleviating the local misalignment problem, thereby boosting detection robustness. Extensive experiments on multiple domain adaptation scenarios validate the superiority of MGA over other approaches with both FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
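
Of the components above, the AEMA update is the easiest to sketch in isolation: an exponential-moving-average teacher whose momentum is modulated by an assessment of the current model. The momentum schedule and the meaning of the assessment score below are assumptions, not the released MGA code.

```python
# Sketch of an adaptive EMA teacher update in the spirit of AEMA; the
# momentum schedule and the `assessment` score are assumptions.
import torch

@torch.no_grad()
def adaptive_ema_update(student, teacher, assessment, m_min=0.9, m_max=0.999):
    """assessment in [0, 1]: higher means the current student looks more reliable."""
    # A more reliable student lowers the momentum, letting the teacher
    # (which produces pseudo labels) absorb the student's weights faster.
    m = m_max - (m_max - m_min) * float(assessment)
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1.0 - m)
    return teacher
```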

Backdoor Cleansing with Unlabeled Data

Nov 23, 2022
Lu Pang, Tao Sun, Haibin Ling, Chao Chen

Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs can potentially be backdoor attacked. It is crucial to defend against such attacks, i.e., to postprocess a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement may be unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse the backdoor behaviors of a suspicious network with negligible compromise in its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data. This makes our method very practical.
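
A label-free cleansing pipeline of the kind described above can be sketched as: copy the suspicious model, re-initialize selected layers, and distill the original model's behavior on unlabeled data into the copy. Layer choice, temperature, and optimizer below are assumptions, not the paper's released code.

```python
# Hedged sketch of label-free backdoor cleansing via layer-wise
# re-initialization plus knowledge distillation (all hyperparameters assumed).
import copy
import torch
import torch.nn.functional as F

def cleanse(suspicious_model, unlabeled_loader, reinit_layers, T=2.0, lr=1e-3, device="cpu"):
    student = copy.deepcopy(suspicious_model).to(device)
    for name, module in student.named_modules():
        if name in reinit_layers and hasattr(module, "reset_parameters"):
            module.reset_parameters()            # layer-wise weight re-initialization
    teacher = suspicious_model.to(device).eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for x in unlabeled_loader:                   # clean but unlabeled inputs
        x = x.to(device)
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = student(x)
        # Distill only the teacher's soft predictions; no ground-truth labels used.
        loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                        F.softmax(t_logits / T, dim=1),
                        reduction="batchmean") * T * T
        opt.zero_grad(); loss.backward(); opt.step()
    return student
```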

Local Context-Aware Active Domain Adaptation

Aug 26, 2022
Tao Sun, Cheng Lu, Haibin Ling

Active Domain Adaptation (ADA) queries the labels of selected target samples to help adapt a model from a related source domain to a target domain. It has attracted increasing attention recently due to its promising performance with minimal labeling cost. Nevertheless, existing ADA methods have not fully exploited the local context of queried data, which is important to ADA, especially when the domain gap is large. In this paper, we propose a novel framework of Local context-aware Active Domain Adaptation (LADA), which is composed of two key modules. The Local context-aware Active Selection (LAS) module selects target samples whose class probability predictions are inconsistent with those of their neighbors. The Local context-aware Model Adaptation (LMA) module refines a model with both queried samples and their expanded neighbors, regularized by a context-preserving loss. Extensive experiments show that LAS selects more informative samples than existing active selection strategies. Furthermore, equipped with LMA, the full LADA method outperforms state-of-the-art ADA solutions on various benchmarks. Code is available at https://github.com/tsun/LADA.
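
The LAS criterion above, selecting samples whose predictions disagree with their neighbors, can be sketched with a simple k-nearest-neighbor consistency score. The choice of k, the feature space, and the divergence used below are assumptions, not the released LADA code.

```python
# Sketch of a local-inconsistency query rule: rank target samples by how much
# their prediction diverges from the average prediction of their k nearest
# neighbors in feature space (k and the scoring are assumptions).
import torch
import torch.nn.functional as F

def select_inconsistent(features, probs, budget, k=10):
    """features: (N, D) target features; probs: (N, C) class probabilities."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()
    sim.fill_diagonal_(-float("inf"))
    knn_idx = sim.topk(k, dim=1).indices                # k nearest neighbors
    neighbor_probs = probs[knn_idx].mean(dim=1)         # neighborhood consensus
    eps = 1e-8
    score = (probs * (probs.clamp_min(eps).log()
                      - neighbor_probs.clamp_min(eps).log())).sum(dim=1)
    return score.topk(budget).indices                   # indices to query labels for
```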

Domain Adaptation with Adversarial Training on Penultimate Activations

Aug 26, 2022
Tao Sun, Cheng Lu, Haibin Ling

Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA). In this paper, we explore adversarial training on penultimate activations, i.e., the input features of the final linear classification layer. We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features, as used in previous works. Furthermore, with the activation normalization commonly used in domain adaptation to reduce the domain gap, we derive two variants and systematically analyze the effects of normalization on our adversarial training. This is illustrated both in theory and through empirical analysis on real adaptation tasks. Extensive experiments are conducted on popular UDA benchmarks under both the standard setting and the source-data-free setting. The results validate that our method achieves the best scores compared with previous methods.
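
A single-step version of adversarial training on penultimate activations can be sketched as: perturb the features feeding the final linear classifier in the direction that most reduces prediction confidence, then minimize the entropy of the predictions on the perturbed features. The perturbation size and loss form below are assumptions, not the paper's exact formulation.

```python
# Hedged sketch: one-step adversarial perturbation applied to penultimate
# activations (inputs of the final linear classifier), followed by an
# entropy-minimization term on the perturbed features. Epsilon is assumed.
import torch
import torch.nn.functional as F

def penultimate_adv_loss(classifier, feats, eps=1.0):
    """classifier: final nn.Linear layer; feats: (B, D) penultimate activations."""
    feats = feats.detach().requires_grad_(True)
    probs = F.softmax(classifier(feats), dim=1)
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    grad = torch.autograd.grad(ent, feats)[0]
    # Move activations in the direction that most increases entropy
    # (i.e., lowers confidence), then train the model to stay confident there.
    adv_feats = (feats + eps * F.normalize(grad, dim=1)).detach()
    adv_probs = F.softmax(classifier(adv_feats), dim=1)
    return -(adv_probs * adv_probs.clamp_min(1e-8).log()).sum(dim=1).mean()
```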

Attention Hijacking in Trojan Transformers

Aug 09, 2022
Weimin Lyu, Songzhu Zheng, Tengfei Ma, Haibin Ling, Chao Chen

Trojan attacks pose a severe threat to AI systems. Transformer models have recently gained explosive popularity, and self-attention is now their indisputable core. This raises a central question: can we reveal Trojans through the attention mechanisms in BERTs and ViTs? In this paper, we investigate the attention hijacking pattern in Trojan AIs, i.e., the trigger token "kidnaps" the attention weights when a specific trigger is present. We observe a consistent attention hijacking pattern in Trojan Transformers from both the Natural Language Processing (NLP) and Computer Vision (CV) domains. This intriguing property helps us to understand the Trojan mechanism in BERTs and ViTs. We also propose an Attention-Hijacking Trojan Detector (AHTD) to discriminate Trojan AIs from clean ones.
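
The hijacking pattern above suggests a simple probe: compare how much attention mass a candidate trigger token attracts with and without the trigger present. The probe below is only illustrative; the way attention maps are extracted and any flagging threshold are assumptions.

```python
# Illustrative probe of attention hijacking: per-layer fraction of attention
# that all query positions pay to a suspected trigger token position.
import torch

def trigger_attention_share(attn_maps, trigger_pos):
    """attn_maps: list of (num_heads, seq_len, seq_len) attention tensors."""
    shares = []
    for attn in attn_maps:
        per_query = attn.mean(dim=0)[:, trigger_pos]   # average over heads
        shares.append(per_query.mean())                # average over query positions
    return torch.stack(shares)                         # one share per layer

# A model whose shares spike when the trigger is inserted (relative to a
# clean input) would be flagged as suspicious by an AHTD-style detector.
```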

Expanding Language-Image Pretrained Models for General Video Recognition

Aug 04, 2022
Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling

Contrastive language-image pretraining has shown great success in learning visual-textual joint representations from web-scale data, demonstrating remarkable "zero-shot" generalization ability for various image tasks. However, how to effectively expand such new language-image pretraining methods to video domains is still an open problem. In this work, we present a simple yet effective approach that adapts pretrained language-image models to video recognition directly, instead of pretraining a new model from scratch. More concretely, to capture the long-range dependencies of frames along the temporal dimension, we propose a cross-frame attention mechanism that explicitly exchanges information across frames. Such a module is lightweight and can be plugged into pretrained language-image models seamlessly. Moreover, we propose a video-specific prompting scheme, which leverages video content information to generate discriminative textual prompts. Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios. In particular, under fully-supervised settings, our approach achieves a top-1 accuracy of 87.1% on Kinetics-400, while using 12 times fewer FLOPs than Swin-L and ViViT-H. In zero-shot experiments, our approach surpasses the current state-of-the-art methods by +7.6% and +14.9% in terms of top-1 accuracy under two popular protocols. In few-shot scenarios, our approach outperforms previous best methods by +32.1% and +23.1% when labeled data is extremely limited. Code and models are available at https://aka.ms/X-CLIP.

* Accepted by ECCV2022, Oral 
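
The cross-frame attention module described above can be sketched as a light self-attention layer over one token per frame, added residually on top of the per-frame language-image encoder. The single-layer design and dimensions below are assumptions, not the released X-CLIP code.

```python
# Compact sketch of cross-frame attention over per-frame CLIP tokens
# (dimensions and the single-layer design are assumptions).
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_tokens):
        """frame_tokens: (B, T, D), one [CLS]-style token per frame."""
        # Every frame attends to every other frame to exchange temporal
        # information; the result is added back residually.
        mixed, _ = self.attn(frame_tokens, frame_tokens, frame_tokens)
        return self.norm(frame_tokens + mixed)

# Example: mix 8 frame tokens of width 512 for a batch of 2 clips.
tokens = CrossFrameAttention(512)(torch.randn(2, 8, 512))
```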