Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tieniu Tan

Model-free Test Time Adaptation for Out-Of-Distribution Detection

Nov 28, 2023

YiFan Zhang, Xue Wang, Tian Zhou, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Figure 1 for Model-free Test Time Adaptation for Out-Of-Distribution Detection

Figure 2 for Model-free Test Time Adaptation for Out-Of-Distribution Detection

Figure 3 for Model-free Test Time Adaptation for Out-Of-Distribution Detection

Figure 4 for Model-free Test Time Adaptation for Out-Of-Distribution Detection

Abstract:Out-of-distribution (OOD) detection is essential for the reliability of ML models. Most existing methods for OOD detection learn a fixed decision criterion from a given in-distribution dataset and apply it universally to decide if a data point is OOD. Recent work~\cite{fang2022is} shows that given only in-distribution data, it is impossible to reliably detect OOD data without extra assumptions. Motivated by the theoretical result and recent exploration of test-time adaptation methods, we propose a Non-Parametric Test Time \textbf{Ada}ptation framework for \textbf{O}ut-Of-\textbf{D}istribution \textbf{D}etection (\abbr). Unlike conventional methods, \abbr utilizes online test samples for model adaptation during testing, enhancing adaptability to changing data distributions. The framework incorporates detected OOD instances into decision-making, reducing false positive rates, particularly when ID and OOD distributions overlap significantly. We demonstrate the effectiveness of \abbr through comprehensive experiments on multiple OOD detection benchmarks, extensive empirical studies show that \abbr significantly improves the performance of OOD detection over state-of-the-art methods. Specifically, \abbr reduces the false positive rate (FPR95) by $23.23\%$ on the CIFAR-10 benchmarks and $38\%$ on the ImageNet-1k benchmarks compared to the advanced methods. Lastly, we theoretically verify the effectiveness of \abbr.

* 12 pages, 10 figures

Via

Access Paper or Ask Questions

OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

Sep 22, 2023

Yi-Fan Zhang, Qingsong Wen, Xue Wang, Weiqi Chen, Liang Sun, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Figure 1 for OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

Figure 2 for OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

Figure 3 for OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

Figure 4 for OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling

Abstract:Online updating of time series forecasting models aims to address the concept drifting problem by efficiently updating forecasting models based on streaming data. Many algorithms are designed for online time series forecasting, with some exploiting cross-variable dependency while others assume independence among variables. Given every data assumption has its own pros and cons in online time series modeling, we propose \textbf{On}line \textbf{e}nsembling \textbf{Net}work (OneNet). It dynamically updates and combines two models, with one focusing on modeling the dependency across the time dimension and the other on cross-variate dependency. Our method incorporates a reinforcement learning-based approach into the traditional online convex programming framework, allowing for the linear combination of the two models with dynamically adjusted weights. OneNet addresses the main shortcoming of classical online learning methods that tend to be slow in adapting to the concept drift. Empirical results show that OneNet reduces online forecasting error by more than $\mathbf{50\%}$ compared to the State-Of-The-Art (SOTA) method. The code is available at \url{https://github.com/yfzhang114/OneNet}.

* 32 pages, 11 figures, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

Via

Access Paper or Ask Questions

Towards Realistic Unsupervised Fine-tuning with CLIP

Aug 24, 2023

Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan

Figure 1 for Towards Realistic Unsupervised Fine-tuning with CLIP

Figure 2 for Towards Realistic Unsupervised Fine-tuning with CLIP

Figure 3 for Towards Realistic Unsupervised Fine-tuning with CLIP

Figure 4 for Towards Realistic Unsupervised Fine-tuning with CLIP

Abstract:The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsupervised fine-tuning scenario by assuming that the unlabeled data might contain out-of-distribution samples from unknown classes. Furthermore, we emphasize the importance of simultaneously enhancing out-of-distribution detection capabilities alongside the recognition of instances associated with predefined class labels. To tackle this problem, we present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompts, UEO also incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Through extensive experiments conducted across 15 domains and 4 different types of prior knowledge, we demonstrate that UEO surpasses baseline methods in terms of both generalization and out-of-distribution detection.

Via

Access Paper or Ask Questions

End-to-end Alternating Optimization for Real-World Blind Super Resolution

Aug 17, 2023

Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, Tieniu Tan

Abstract:Blind Super-Resolution (SR) usually involves two sub-problems: 1) estimating the degradation of the given low-resolution (LR) image; 2) super-resolving the LR image to its high-resolution (HR) counterpart. Both problems are ill-posed due to the information loss in the degrading process. Most previous methods try to solve the two problems independently, but often fall into a dilemma: a good super-resolved HR result requires an accurate degradation estimation, which however, is difficult to be obtained without the help of original HR information. To address this issue, instead of considering these two problems independently, we adopt an alternating optimization algorithm, which can estimate the degradation and restore the SR image in a single model. Specifically, we design two convolutional neural modules, namely \textit{Restorer} and \textit{Estimator}. \textit{Restorer} restores the SR image based on the estimated degradation, and \textit{Estimator} estimates the degradation with the help of the restored SR image. We alternate these two modules repeatedly and unfold this process to form an end-to-end trainable network. In this way, both \textit{Restorer} and \textit{Estimator} could get benefited from the intermediate results of each other, and make each sub-problem easier. Moreover, \textit{Restorer} and \textit{Estimator} are optimized in an end-to-end manner, thus they could get more tolerant of the estimation deviations of each other and cooperate better to achieve more robust and accurate final results. Extensive experiments on both synthetic datasets and real-world images show that the proposed method can largely outperform state-of-the-art methods and produce more visually favorable results. The codes are rleased at \url{https://github.com/greatlog/RealDAN.git}.

* International Journal of Computer Vision (IJCV) 2023
* Extension of our previous NeurIPS paper. Accepted to IJCV

Via

Access Paper or Ask Questions

GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images

Aug 07, 2023

Tianxiang Ma, Bingchuan Li, Qian He, Jing Dong, Tieniu Tan

Abstract:While current face animation methods can manipulate expressions individually, they suffer from several limitations. The expressions manipulated by some motion-based facial reenactment models are crude. Other ideas modeled with facial action units cannot generalize to arbitrary expressions not covered by annotations. In this paper, we introduce a novel Geometry-aware Facial Expression Translation (GaFET) framework, which is based on parametric 3D facial representations and can stably decoupled expression. Among them, a Multi-level Feature Aligned Transformer is proposed to complement non-geometric facial detail features while addressing the alignment challenge of spatial features. Further, we design a De-expression model based on StyleGAN, in order to reduce the learning difficulty of GaFET in unpaired "in-the-wild" images. Extensive qualitative and quantitative experiments demonstrate that we achieve higher-quality and more accurate facial expression transfer results compared to state-of-the-art methods, and demonstrate applicability of various poses and complex textures. Besides, videos or annotated training data are omitted, making our method easier to use and generalize.

* Accepted by ICCV2023

Via

Access Paper or Ask Questions

Improving Zero-Shot Generalization for CLIP with Synthesized Prompts

Jul 14, 2023

Zhengbo Wang, Jian Liang, Ran He, Nan Xu, Zilei Wang, Tieniu Tan

Abstract:With the growing interest in pretrained vision-language models like CLIP, recent research has focused on adapting these models to downstream tasks. Despite achieving promising results, most existing methods require labeled data for all classes, which may not hold in real-world applications due to the long tail and Zipf's law. For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}) to improve existing fine-tuning methods. Specifically, we follow variational autoencoders to introduce a generator that reconstructs the visual features by inputting the synthesized prompts and the corresponding class names to the textual encoder of CLIP. In this manner, we easily obtain the synthesized features for the remaining label-only classes. Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled and synthesized features. Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of our approach. The code is available at \url{https://github.com/mrflogs/SHIP}.

* Accepted by ICCV 2023

Via

Access Paper or Ask Questions

Free-style and Fast 3D Portrait Synthesis

Jun 28, 2023

Tianxiang Ma, Kang Zhao, Jianxin Sun, Jing Dong, Tieniu Tan

Abstract:Efficiently generating a free-style 3D portrait with high quality and consistency is a promising yet challenging task. The portrait styles generated by most existing methods are usually restricted by their 3D generators, which are learned in specific facial datasets, such as FFHQ. To get a free-style 3D portrait, one can build a large-scale multi-style database to retrain the 3D generator, or use a off-the-shelf tool to do the style translation. However, the former is time-consuming due to data collection and training process, the latter may destroy the multi-view consistency. To tackle this problem, we propose a fast 3D portrait synthesis framework in this paper, which enable one to use text prompts to specify styles. Specifically, for a given portrait style, we first leverage two generative priors, a 3D-aware GAN generator (EG3D) and a text-guided image editor (Ip2p), to quickly construct a few-shot training set, where the inference process of Ip2p is optimized to make editing more stable. Then we replace original triplane generator of EG3D with a Image-to-Triplane (I2T) module for two purposes: 1) getting rid of the style constraints of pre-trained EG3D by fine-tuning I2T on the few-shot dataset; 2) improving training efficiency by fixing all parts of EG3D except I2T. Furthermore, we construct a multi-style and multi-identity 3D portrait database to demonstrate the scalability and generalization of our method. Experimental results show that our method is capable of synthesizing high-quality 3D portraits with specified styles in a few minutes, outperforming the state-of-the-art.

* project website: https://tianxiangma.github.io/FF3D

Via

Access Paper or Ask Questions

AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Apr 25, 2023

Yi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Figure 1 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Figure 2 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Figure 3 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Figure 4 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Abstract:Many recent machine learning tasks focus to develop models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several literatures show that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptive (TTA) methods are proposed. Existing TTA methods require offline target data or extra sophisticated optimization procedures during the inference stage. In this work, we adopt Non-Parametric Classifier to perform the test-time Adaptation (AdaNPC). In particular, we construct a memory that contains the feature and label pairs from training domains. During inference, given a test instance, AdaNPC first recalls K closed samples from the memory to vote for the prediction, and then the test feature and predicted label are added to the memory. In this way, the sample distribution in the memory can be gradually changed from the training distribution towards the test distribution with very little extra computation cost. We theoretically justify the rationality behind the proposed method. Besides, we test our model on extensive numerical experiments. AdaNPC significantly outperforms competitive baselines on various DG benchmarks. In particular, when the adaptation target is a series of domains, the adaptation accuracy of AdaNPC is 50% higher than advanced TTA methods. The code is available at https://github.com/yfzhang114/AdaNPC.

* The Fortieth International Conference on Machine Learning, ICML, 2023
* 30 pages, 12 figures

Via

Access Paper or Ask Questions

Fully Sparse Fusion for 3D Object Detection

Apr 25, 2023

Yingyan Li, Lue Fan, Yang Liu, Zehao Huang, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, Tieniu Tan

Figure 1 for Fully Sparse Fusion for 3D Object Detection

Figure 2 for Fully Sparse Fusion for 3D Object Detection

Figure 3 for Fully Sparse Fusion for 3D Object Detection

Figure 4 for Fully Sparse Fusion for 3D Object Detection

Abstract:Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps is quadratic to the detection range, making it not suitable for long-range detection. Fully sparse architecture is gaining attention as they are highly efficient in long-range perception. In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture. Particularly, utilizing instance queries, our framework integrates the well-studied 2D instance segmentation into the LiDAR side, which is parallel to the 3D instance segmentation part in the fully sparse detector. This design achieves a uniform query-based fusion framework in both the 2D and 3D sides while maintaining the fully sparse characteristic. Extensive experiments showcase state-of-the-art results on the widely used nuScenes dataset and the long-range Argoverse 2 dataset. Notably, the inference speed of the proposed method under the long-range LiDAR perception setting is 2.7 $\times$ faster than that of other state-of-the-art multimodal 3D detection methods. Code will be released at \url{https://github.com/BraveGroup/FullySparseFusion}.

Via

Access Paper or Ask Questions

Collaborative Feature Learning for Fine-grained Facial Forgery Detection and Segmentation

Apr 17, 2023

Weinan Guan, Wei Wang, Jing Dong, Bo Peng, Tieniu Tan

Figure 1 for Collaborative Feature Learning for Fine-grained Facial Forgery Detection and Segmentation

Figure 2 for Collaborative Feature Learning for Fine-grained Facial Forgery Detection and Segmentation

Figure 3 for Collaborative Feature Learning for Fine-grained Facial Forgery Detection and Segmentation

Figure 4 for Collaborative Feature Learning for Fine-grained Facial Forgery Detection and Segmentation

Abstract:Detecting maliciously falsified facial images and videos has attracted extensive attention from digital-forensics and computer-vision communities. An important topic in manipulation detection is the localization of the fake regions. Previous work related to forgery detection mostly focuses on the entire faces. However, recent forgery methods have developed to edit important facial components while maintaining others unchanged. This drives us to not only focus on the forgery detection but also fine-grained falsified region segmentation. In this paper, we propose a collaborative feature learning approach to simultaneously detect manipulation and segment the falsified components. With the collaborative manner, detection and segmentation can boost each other efficiently. To enable our study of forgery detection and segmentation, we build a facial forgery dataset consisting of both entire and partial face forgeries with their pixel-level manipulation ground-truth. Experiment results have justified the mutual promotion between forgery detection and manipulated region segmentation. The overall performance of the proposed approach is better than the state-of-the-art detection or segmentation approaches. The visualization results have shown that our proposed model always captures the artifacts on facial regions, which is more reasonable.

Via

Access Paper or Ask Questions