Yen-Cheng Liu

Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

Apr 21, 2023
Harsh Maheshwari, Yen-Cheng Liu, Zsolt Kira

Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, several real-world challenges have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at test time. To address these challenges, we first propose a simple yet efficient multi-modal fusion mechanism, Linear Fusion, that performs better than state-of-the-art multi-modal models even with limited supervision. Second, we propose M3L: Multi-modal Teacher for Masked Modality Learning, a semi-supervised framework that not only improves multi-modal performance but also makes the model robust to the realistic missing-modality scenario using unlabeled data. We create the first benchmark for semi-supervised multi-modal semantic segmentation and also report robustness to missing modalities. Our proposal shows an absolute improvement of up to 10% in robust mIoU over the most competitive baselines. Our code is available at https://github.com/harshm121/M3L
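
The abstract does not spell out implementation details, but the two core ideas lend themselves to a short sketch. Below is a minimal, hypothetical PyTorch illustration of a linear fusion of two modality streams plus random modality masking during training; the module names, shapes, and masking probability are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class LinearFusion(nn.Module):
    """Illustrative linear fusion of two modality feature maps (not the official M3L code)."""
    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 convolution over the concatenated features acts as a learned
        # per-pixel linear combination of the two modality streams.
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb: torch.Tensor, feat_depth: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([feat_rgb, feat_depth], dim=1))

def mask_modality(feat_rgb, feat_depth, p_drop=0.3):
    """Randomly zero out one modality per batch to mimic missing-modality training."""
    if torch.rand(()) < p_drop:
        if torch.rand(()) < 0.5:
            feat_rgb = torch.zeros_like(feat_rgb)
        else:
            feat_depth = torch.zeros_like(feat_depth)
    return feat_rgb, feat_depth

# Toy usage: B x C x H x W features from two modality-specific encoders.
rgb, depth = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
fused = LinearFusion(64)(*mask_modality(rgb, depth))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```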


Trainable Projected Gradient Method for Robust Fine-tuning

Mar 28, 2023
Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization capability in the pre-trained models. However, most of these methods employ manually crafted heuristics or expensive hyper-parameter searches, which prevent them from scaling up to large datasets and neural networks. To solve this problem, we propose the Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed on each layer for fine-grained fine-tuning regularization. This is motivated by formulating fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM maintains a set of projection radii, i.e., distance constraints between the fine-tuned model and the pre-trained model, for each layer, and enforces them through weight projections. To learn the constraints, we propose a bi-level optimization to automatically learn the best set of projection radii in an end-to-end manner. Theoretically, we show that the bi-level optimization formulation could explain the regularization capability of TPGM. Empirically, with little hyper-parameter search cost, TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance. For example, when fine-tuned on DomainNet-Real and ImageNet, compared to vanilla fine-tuning, TPGM shows $22\%$ and $10\%$ relative OOD improvement respectively on their sketch counterparts. Code is available at \url{https://github.com/PotatoTian/TPGM}.
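
The projection step at the heart of this idea can be sketched in a few lines. The snippet below is a hedged illustration of projecting fine-tuned weights back into an L2 ball around the pre-trained weights; in TPGM the per-layer radii are trainable and learned via bi-level optimization, which is omitted here, and the variable names are invented for illustration.

```python
import torch

def project_to_radius(w_finetuned, w_pretrained, radius):
    """Project fine-tuned weights into an L2 ball of the given radius
    around the pre-trained weights (a sketch of the core TPGM constraint)."""
    delta = w_finetuned - w_pretrained
    norm = delta.norm()
    # Scale the update down only if it has drifted beyond the allowed radius.
    scale = torch.clamp(radius / (norm + 1e-12), max=1.0)
    return w_pretrained + scale * delta

# Per-layer radii would be trainable parameters in TPGM; here it is a fixed value.
w_pre = torch.randn(256, 128)
w_ft = w_pre + 0.5 * torch.randn(256, 128)
w_proj = project_to_radius(w_ft, w_pre, radius=torch.tensor(1.0))
print((w_proj - w_pre).norm())  # at most 1.0
```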

* Accepted to CVPR 2023 

Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks

Oct 07, 2022
Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira

Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard method in machine learning. Recently, parameter-efficient fine-tuning methods have shown promise in adapting a pretrained model to different tasks while training only a few parameters. Despite their success, most existing methods are proposed for Natural Language Processing tasks with language Transformers, and adaptation to Computer Vision tasks with Vision Transformers remains under-explored, especially for dense vision tasks. Further, in multi-task settings, individually fine-tuning and storing separate models for different tasks is inefficient. In this work, we provide an extensive multi-task parameter-efficient benchmark and examine existing parameter-efficient fine-tuning NLP methods for vision tasks. Our results on four different dense vision tasks show that existing methods cannot be efficiently integrated due to the hierarchical nature of Hierarchical Vision Transformers. To overcome this issue, we propose Polyhistor and Polyhistor-Lite, consisting of Decomposed HyperNetworks and Layer-wise Scaling Kernels, to share information across different tasks with few trainable parameters. This leads to favorable performance improvements over existing parameter-efficient methods while using fewer trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while using only ~10% of their trainable parameters. Furthermore, our methods show larger performance gains when larger networks and more pretraining data are used.
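
As a rough illustration of the hypernetwork idea, the sketch below generates low-rank adapter weights from a task embedding so that per-task parameters stay small. All names, dimensions, and the residual adapter form are assumptions made for illustration; this is not the Polyhistor architecture (see the project page for that).

```python
import torch
import torch.nn as nn

class DecomposedHyperAdapter(nn.Module):
    """Hedged sketch: a hypernetwork maps a task embedding to low-rank adapter
    weights so that per-task parameters stay small (names/shapes are illustrative)."""
    def __init__(self, dim=384, rank=8, task_emb_dim=16, num_tasks=4):
        super().__init__()
        self.task_embs = nn.Embedding(num_tasks, task_emb_dim)
        # Shared generators produce the two low-rank factors A (dim x rank) and B (rank x dim).
        self.gen_A = nn.Linear(task_emb_dim, dim * rank)
        self.gen_B = nn.Linear(task_emb_dim, rank * dim)
        self.dim, self.rank = dim, rank

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        emb = self.task_embs(torch.tensor(task_id))
        A = self.gen_A(emb).view(self.dim, self.rank)
        B = self.gen_B(emb).view(self.rank, self.dim)
        # Residual adapter: a low-rank, task-conditioned update to the tokens.
        return x + x @ (A @ B)

x = torch.randn(2, 196, 384)          # tokens from a (hypothetical) transformer block
adapter = DecomposedHyperAdapter()
print(adapter(x, task_id=1).shape)    # torch.Size([2, 196, 384])
```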

* Accepted to NeurIPS 2022; Project Page is at https://ycliu93.github.io/projects/polyhistor.html 

Open-Set Semi-Supervised Object Detection

Aug 29, 2022
Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

Recent developments for Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic with larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open-Set Semi-Supervised Object Detection (OSSOD). We first find that existing SSOD methods obtain lower performance gains in open-set conditions, caused by semantic expansion, where distracting OOD objects are mispredicted as in-distribution pseudo-labels for the semi-supervised training. To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods. Through extensive studies, we find that leveraging an offline OOD detector based on a self-supervised vision transformer performs favorably against online OOD detectors due to its robustness to the interference of pseudo-labeling. In our experiments, the proposed framework effectively addresses the semantic expansion issue and shows consistent improvements on many OSSOD benchmarks, including large-scale COCO-OpenImages. We also verify the effectiveness of our framework under different OSSOD conditions, including varying numbers of in-distribution classes, different degrees of supervision, and different combinations of unlabeled sets.
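
The offline filtering step can be approximated with a simple prototype-distance rule. The sketch below is an assumption-laden stand-in: it presumes per-box features (e.g., from a self-supervised ViT) and class prototypes computed from labeled data, and keeps a pseudo-label only if its feature is close to some in-distribution prototype. The thresholding rule and feature choice are illustrative, not the paper's exact procedure.

```python
import torch

def filter_pseudo_labels(box_feats, prototypes, threshold):
    """Hedged sketch of offline OOD filtering: keep a pseudo-label only if its
    feature lies close enough to some in-distribution class prototype."""
    # box_feats: (N, D) features of pseudo-labeled boxes (e.g., from a self-supervised ViT).
    # prototypes: (C, D) mean features of the labeled in-distribution classes.
    dists = torch.cdist(box_feats, prototypes)       # (N, C) pairwise distances
    keep = dists.min(dim=1).values < threshold       # OOD boxes are far from every prototype
    return keep

feats = torch.randn(5, 128)
protos = torch.randn(10, 128)
print(filter_pseudo_labels(feats, protos, threshold=15.0))
```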

* Project Page is at https://ycliu93.github.io/projects/ossod.html 

Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors

Jun 19, 2022
Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

With the recent development of Semi-Supervised Object Detection (SS-OD) techniques, object detectors can be improved by using a limited amount of labeled data and abundant unlabeled data. However, two challenges remain unaddressed: (1) there is no prior SS-OD work on anchor-free detectors, and (2) prior works are ineffective when pseudo-labeling bounding box regression. In this paper, we present Unbiased Teacher v2, which generalizes SS-OD methods to anchor-free detectors and also introduces the Listen2Student mechanism for the unsupervised regression loss. Specifically, we first present a study examining the effectiveness of existing SS-OD methods on anchor-free detectors and find that they achieve much lower performance improvements under the semi-supervised setting. We also observe that box selection with centerness and the localization-based labeling used in anchor-free detectors do not work well under the semi-supervised setting. Our Listen2Student mechanism explicitly prevents misleading pseudo-labels from entering the training of bounding box regression; we develop a novel pseudo-label selection mechanism based on the Teacher's and Student's relative uncertainties. This idea contributes to favorable improvements in the regression branch in the semi-supervised setting. Our method, which works for both anchor-free and anchor-based detectors, consistently performs favorably against state-of-the-art methods on VOC, COCO-standard, and COCO-additional.
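
The Listen2Student selection rule can be summarized as comparing teacher and student localization uncertainties and supervising regression only where the teacher is the more reliable of the two. The following is a hedged sketch with made-up tensor names; the actual uncertainty estimates and selection margin in the paper differ.

```python
import torch

def listen2student_mask(teacher_sigma, student_sigma, margin=0.0):
    """Hedged sketch of the Listen2Student idea: supervise the box-regression
    branch only where the teacher is more certain than the student."""
    # *_sigma: predicted localization uncertainties per box coordinate, shape (N, 4).
    return (student_sigma - teacher_sigma) > margin   # boolean mask, (N, 4)

t_sigma = torch.rand(3, 4)
s_sigma = torch.rand(3, 4)
mask = listen2student_mask(t_sigma, s_sigma)
# The unsupervised regression loss would then be averaged only over entries where mask is True.
print(mask)
```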

* Project Page is at http://ycliu93.github.io/projects/unbiasedteacher2.html 

Cross-Domain Object Detection via Adaptive Self-Training

Nov 25, 2021
Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

We tackle the problem of domain adaptation in object detection, where there is a significant domain shift between a source domain (a domain with supervision) and a target domain (a domain of interest without supervision). As a widely adopted domain adaptation method, the self-training teacher-student framework (a student model learns from pseudo labels generated by a teacher model) has yielded remarkable accuracy gains on the target domain. However, it still suffers from the large number of low-quality pseudo labels (e.g., false positives) generated by the teacher due to its bias toward the source domain. To address this issue, we propose a self-training framework called Adaptive Unbiased Teacher (AUT), which leverages adversarial learning and weak-strong data augmentation during mutual learning to address domain shift. Specifically, we employ feature-level adversarial training in the student model, ensuring features extracted from the source and target domains share similar statistics. This enables the student model to capture domain-invariant features. Furthermore, we apply weak-strong augmentation and mutual learning between the teacher model on the target domain and the student model on both domains. This enables the teacher model to gradually benefit from the student model without suffering from domain shift. We show that AUT outperforms all existing approaches and even Oracle (fully supervised) models by a large margin. For example, we achieve 50.9% (49.3%) mAP on Foggy Cityscapes (Clipart1K), which is 9.2% (5.2%) and 8.2% (11.0%) higher than the previous state-of-the-art and the Oracle, respectively.
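
Feature-level adversarial alignment of this kind is commonly implemented with a gradient reversal layer. The snippet below is a generic sketch of that one ingredient; the domain classifier head and its sizes are invented for illustration, and this is not the AUT training pipeline.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

# Hedged sketch: a domain classifier on student features; reversing its gradient
# pushes the feature extractor toward domain-invariant statistics.
domain_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
feats = torch.randn(8, 256, requires_grad=True)          # stand-in for pooled detector features
logits = domain_head(GradReverse.apply(feats, 1.0))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()                                           # feats.grad now carries the reversed signal
```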

* 15 pages. arXiv admin note: text overlap with arXiv:2003.00707, arXiv:1904.11245, arXiv:1910.11319, arXiv:2003.09152 by other authors 

Overcoming Obstructions via Bandwidth-Limited Multi-Agent Spatial Handshaking

Jul 01, 2021
Nathaniel Glaser, Yen-Cheng Liu, Junjiao Tian, Zsolt Kira

In this paper, we address bandwidth-limited and obstruction-prone collaborative perception, specifically in the context of multi-agent semantic segmentation. This setting presents several key challenges, including processing and exchanging unregistered robotic swarm imagery. To be successful, solutions must effectively leverage multiple non-static and intermittently-overlapping RGB perspectives, while heeding bandwidth constraints and overcoming unwanted foreground obstructions. As such, we propose an end-to-end learnable Multi-Agent Spatial Handshaking network (MASH) to process, compress, and propagate visual information across a robotic swarm. Our distributed communication module operates directly (and exclusively) on raw image data, without additional input requirements such as pose, depth, or warping data. We demonstrate superior performance of our model compared with several baselines in a photo-realistic multi-robot AirSim environment, especially in the presence of image occlusions. Our method achieves an absolute 11% IoU improvement over strong baselines.
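
A toy version of the compress-exchange-fuse pattern between two agents might look like the following; the channel sizes and 1x1-convolution choices are assumptions, and the learned spatial handshaking across viewpoints that gives MASH its name is omitted.

```python
import torch
import torch.nn as nn

class Handshake(nn.Module):
    """Hedged sketch of bandwidth-limited feature exchange between two agents
    (the real MASH module also learns where to attend across views; omitted here)."""
    def __init__(self, channels=64, msg_channels=8):
        super().__init__()
        self.compress = nn.Conv2d(channels, msg_channels, 1)    # shrink before transmission
        self.decompress = nn.Conv2d(msg_channels, channels, 1)  # restore on the receiver
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, own_feat, other_feat):
        msg = self.compress(other_feat)          # what the other agent would transmit
        received = self.decompress(msg)
        return self.fuse(torch.cat([own_feat, received], dim=1))

a, b = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(Handshake()(a, b).shape)  # torch.Size([1, 64, 32, 32])
```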

* Accepted to IROS 2021 

Enhancing Multi-Robot Perception via Learned Data Association

Jul 01, 2021
Nathaniel Glaser, Yen-Cheng Liu, Junjiao Tian, Zsolt Kira

In this paper, we address the multi-robot collaborative perception problem, specifically in the context of multi-view infilling for distributed semantic segmentation. This setting entails several real-world challenges, especially those relating to unregistered multi-agent image data. Solutions must effectively leverage multiple, non-static, and intermittently-overlapping RGB perspectives. To this end, we propose the Multi-Agent Infilling Network: an extensible neural architecture that can be deployed (in a distributed manner) to each agent in a robotic swarm. Specifically, each robot is in charge of locally encoding and decoding visual information, and an extensible neural mechanism allows for an uncertainty-aware and context-based exchange of intermediate features. We demonstrate improved performance on a realistic multi-robot AirSim dataset.
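
One way to picture an uncertainty-aware exchange is a confidence-gated blend of local and received features, as in the hypothetical sketch below; the confidence head and gating rule are illustrative, not the Multi-Agent Infilling Network itself.

```python
import torch
import torch.nn as nn

class UncertaintyInfill(nn.Module):
    """Hedged sketch of uncertainty-aware infilling: borrow a neighbor's features
    in regions where the local features are uncertain (all names illustrative)."""
    def __init__(self, channels=64):
        super().__init__()
        # Predict a per-pixel confidence in [0, 1] from the local features.
        self.confidence = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, own_feat, neighbor_feat):
        conf = self.confidence(own_feat)
        # Keep local features where confident, fill in from the neighbor elsewhere.
        return conf * own_feat + (1 - conf) * neighbor_feat

own, nbr = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(UncertaintyInfill()(own, nbr).shape)  # torch.Size([1, 64, 32, 32])
```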

* Accepted to ICRA 2020 Workshop on "Emerging Learning and Algorithmic Methods for Data Association in Robotics"; associated spotlight talk available at https://www.youtube.com/watch?v=-lEVvtsfz0I&t=16743s 

Unbiased Teacher for Semi-Supervised Object Detection

Feb 18, 2021
Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

Semi-supervised learning, i.e., training networks with both labeled and unlabeled data, has made significant progress recently. However, existing works have primarily focused on image classification tasks and neglected object detection, which requires more annotation effort. In this work, we revisit Semi-Supervised Object Detection (SS-OD) and identify the pseudo-labeling bias issue in SS-OD. To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner. Together with a class-balance loss to downweight overly confident pseudo-labels, Unbiased Teacher consistently improves upon state-of-the-art methods by significant margins on COCO-standard, COCO-additional, and VOC datasets. Specifically, Unbiased Teacher achieves a 6.8 absolute mAP improvement over the state-of-the-art method when using 1% of labeled data on MS-COCO, and around 10 mAP improvement over the supervised baseline when using only 0.5%, 1%, and 2% of labeled data on MS-COCO.
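
Two ingredients mentioned here, the gradually progressing (EMA-updated) teacher and confidence-thresholded pseudo-labels, are easy to sketch. The snippet below is a generic teacher-student sketch with assumed hyper-parameters (EMA keep rate, confidence threshold), not the released Unbiased Teacher code; in particular, the class-balance loss is omitted.

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, keep=0.999):
    """EMA teacher update used in teacher-student semi-supervised training (sketch)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(keep).add_(s_param, alpha=1 - keep)

def confident_pseudo_labels(scores, labels, threshold=0.7):
    """Keep only high-confidence teacher detections as pseudo-labels."""
    keep = scores > threshold
    return scores[keep], labels[keep]

student = torch.nn.Linear(16, 4)        # stand-in for a detector
teacher = copy.deepcopy(student)
ema_update(teacher, student)

scores = torch.tensor([0.9, 0.4, 0.8])
labels = torch.tensor([1, 2, 0])
print(confident_pseudo_labels(scores, labels))
```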

* Accepted to ICLR 2021; Code is available at https://github.com/facebookresearch/unbiased-teacher 