Fei Pan

Kuaishou Technology

MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation

Sep 21, 2023

Fei Pan, Xu Yin, Seokju Lee, Sungeui Yoon, In So Kweon

Abstract:Unsupervised domain adaptation (UDA) is an effective approach to handle the lack of annotations in the target domain for the semantic segmentation task. In this work, we consider a more practical UDA setting where the target domain contains sequential frames of the unlabeled videos which are easy to collect in practice. A recent study suggests self-supervised learning of the object motion from unlabeled videos with geometric constraints. We design a motion-guided domain adaptive semantic segmentation framework (MoDA), that utilizes self-supervised object motion to learn effective representations in the target domain. MoDA differs from previous methods that use temporal consistency regularization for the target domain frames. Instead, MoDA deals separately with the domain alignment on the foreground and background categories using different strategies. Specifically, MoDA contains foreground object discovery and foreground semantic mining to align the foreground domain gaps by taking the instance-level guidance from the object motion. Additionally, MoDA includes background adversarial training which contains a background category-specific discriminator to handle the background domain gaps. Experimental results on multiple benchmarks highlight the effectiveness of MoDA against existing approaches in the domain adaptive image segmentation and domain adaptive video segmentation. Moreover, MoDA is versatile and can be used in conjunction with existing state-of-the-art approaches to further improve performance.

* Under Review in IEEE Transactions on Image Processing

Dual-View Selective Instance Segmentation Network for Unstained Live Adherent Cells in Differential Interference Contrast Images

Jan 27, 2023

Fei Pan, Yutong Wu, Kangning Cui, Shuxun Chen, Yanfang Li, Yaofang Liu, Adnan Shakoor, Han Zhao, Beijia Lu, Shaohua Zhi(+2 more)

Abstract:Despite recent advances in data-independent and deep-learning algorithms, unstained live adherent cell instance segmentation remains a long-standing challenge in cell image processing. The inherent visual characteristics of adherent cells, such as low-contrast structures, fading edges, and irregular morphology, make the cells difficult to distinguish from one another, even for human experts, let alone for computational methods. In this study, we developed a novel deep-learning algorithm called dual-view selective instance segmentation network (DVSISN) for segmenting unstained adherent cells in differential interference contrast (DIC) images. First, we used a dual-view segmentation (DVS) method with pairs of original and rotated images to predict the bounding box and its corresponding mask for each cell instance. Second, we used a mask selection (MS) method to filter the cell instances predicted by the DVS and keep only the masks closest to the ground truth. The developed algorithm was trained and validated on our dataset containing 520 images and 12198 cells. Experimental results demonstrate that our algorithm achieves an AP_segm of 0.555, which exceeds a benchmark by a margin of 23.6%. This study's success opens up a new possibility of using rotated images as input for better prediction in cell images.

* 13 pages, 5 figures, 3 tables

ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open Compound Domain Adaptation in Semantic Segmentation

Jul 19, 2022

Fei Pan, Sungsu Hur, Seokju Lee, Junsik Kim, In So Kweon

Abstract:Open compound domain adaptation (OCDA) considers the target domain as the compound of multiple unknown homogeneous subdomains. The goal of OCDA is to minimize the domain gap between the labeled source domain and the unlabeled compound target domain, which benefits the model generalization to the unseen domains. Current OCDA for semantic segmentation methods adopt manual domain separation and employ a single model to simultaneously adapt to all the target subdomains. However, adapting to a target subdomain might hinder the model from adapting to other dissimilar target subdomains, which leads to limited performance. In this work, we introduce a multi-teacher framework with bidirectional photometric mixing to separately adapt to every target subdomain. First, we present an automatic domain separation to find the optimal number of subdomains. On this basis, we propose a multi-teacher framework in which each teacher model uses bidirectional photometric mixing to adapt to one target subdomain. Furthermore, we conduct an adaptive distillation to learn a student model and apply consistency regularization to improve the student generalization. Experimental results on benchmark datasets show the efficacy of the proposed approach for both the compound domain and the open domains against existing state-of-the-art approaches.

* Accepted to ECCV 2022

Labeling Where Adapting Fails: Cross-Domain Semantic Segmentation with Point Supervision via Active Selection

Jun 04, 2022

Fei Pan, Francois Rameau, Junsik Kim, In So Kweon

Abstract:Training models dedicated to semantic segmentation requires a large amount of pixel-wise annotated data. Due to their costly nature, these annotations might not be available for the task at hand. To alleviate this problem, unsupervised domain adaptation approaches aim at aligning the feature distributions between the labeled source and the unlabeled target data. While these strategies lead to noticeable improvements, their effectiveness remains limited. To guide the domain adaptation task more efficiently, previous works attempted to include human interactions in this process in the form of sparse single-pixel annotations in the target data. In this work, we propose a new domain adaptation framework for semantic segmentation with annotated points via active selection. First, we conduct an unsupervised domain adaptation of the model; from this adaptation, we use an entropy-based uncertainty measurement for target point selection. Finally, to minimize the domain gap, we propose a domain adaptation framework utilizing these target points annotated by human annotators. Experimental results on benchmark datasets show the effectiveness of our methods against existing unsupervised domain adaptation approaches. The proposed pipeline is generic and can be included as an extra module to existing domain adaptation strategies.
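
As a rough illustration of the entropy-based point selection mentioned in this abstract, the sketch below scores each pixel's predicted class distribution by Shannon entropy and returns the most uncertain positions as candidates for human annotation. The function names, the annotation budget `k`, and the nested-list data layout are assumptions made for illustration; this is not the authors' implementation.

```python
import math

def pixel_entropy(class_probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in class_probs if p > 0.0)

def select_uncertain_points(prob_map, k):
    """Return the (row, col) positions of the k highest-entropy pixels.

    prob_map[r][c] is the list of class probabilities predicted at pixel (r, c).
    Both the name and the layout are hypothetical, chosen for this sketch.
    """
    scored = [
        (pixel_entropy(probs), (r, c))
        for r, row in enumerate(prob_map)
        for c, probs in enumerate(row)
    ]
    # Sort by entropy, most uncertain first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pos for _, pos in scored[:k]]

# Toy 2x2 prediction over two classes: pixel (0, 1) is maximally uncertain.
prob_map = [
    [[0.99, 0.01], [0.50, 0.50]],
    [[0.90, 0.10], [0.80, 0.20]],
]
points = select_uncertain_points(prob_map, k=2)
```

Here `points` holds the two pixels whose predictions are closest to uniform, which under this simplified scheme would be the ones sent to annotators.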

Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation

Oct 13, 2021

Seokju Lee, Francois Rameau, Fei Pan, In So Kweon

Abstract:Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task that often relies on the so-called scene rigidity assumption. When observing a dynamic environment, this assumption is violated which leads to an ambiguity between the ego-motion of the camera and the motion of the objects. To solve this problem, we present a self-supervised learning framework for 3D object motion field estimation from monocular videos. Our contributions are two-fold. First, we propose a two-stage projection pipeline to explicitly disentangle the camera ego-motion and the object motions with dynamics attention module, called DAM. Specifically, we design an integrated motion model that estimates the motion of the camera and object in the first and second warping stages, respectively, controlled by the attention module through a shared motion encoder. Second, we propose an object motion field estimation through contrastive sample consensus, called CSAC, taking advantage of weak semantic prior (bounding box from an object detector) and geometric constraints (each object respects the rigid body motion model). Experiments on KITTI, Cityscapes, and Waymo Open Dataset demonstrate the relevance of our approach and show that our method outperforms state-of-the-art algorithms for the tasks of self-supervised monocular depth estimation, object motion segmentation, monocular scene flow estimation, and visual odometry.

* ICCV 2021

Transductive Maximum Margin Classifier for Few-Shot Learning

Jul 26, 2021

Fei Pan, Chunlei Xu, Jie Guo, Yanwen Guo

Abstract:Few-shot learning aims to train a classifier that can generalize well when just a small number of labeled samples per class are given. We introduce Transductive Maximum Margin Classifier (TMMC) for few-shot learning. The basic idea of the classical maximum margin classifier is to solve an optimal prediction function that the corresponding separating hyperplane can correctly divide the training data and the resulting classifier has the largest geometric margin. In few-shot learning scenarios, the training samples are scarce, not enough to find a separating hyperplane with good generalization ability on unseen data. TMMC is constructed using a mixture of the labeled support set and the unlabeled query set in a given task. The unlabeled samples in the query set can adjust the separating hyperplane so that the prediction function is optimal on both the labeled and unlabeled samples. Furthermore, we leverage an efficient and effective quasi-Newton algorithm, the L-BFGS method to optimize TMMC. Experimental results on three standard few-shot learning benchmarks including miniImagenet, tieredImagenet and CUB suggest that our TMMC achieves state-of-the-art accuracies.

Temporal Alignment Prediction for Few-Shot Video Classification

Jul 26, 2021

Fei Pan, Chunlei Xu, Jie Guo, Yanwen Guo

Abstract:The goal of few-shot video classification is to learn a classification model with good generalization ability when trained with only a few labeled videos. However, it is difficult to learn discriminative feature representations for videos in such a setting. In this paper, we propose Temporal Alignment Prediction (TAP) based on sequence similarity learning for few-shot video classification. In order to obtain the similarity of a pair of videos, we predict the alignment scores between all pairs of temporal positions in the two videos with the temporal alignment prediction function. Besides, the inputs to this function are also equipped with the context information in the temporal domain. We evaluate TAP on two video classification benchmarks including Kinetics and Something-Something V2. The experimental results verify the effectiveness of TAP and show its superiority over state-of-the-art methods.
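
To make the idea of scoring all pairs of temporal positions concrete, the sketch below computes a similarity between two videos from per-frame feature vectors. It is only a stand-in: the abstract describes a learned temporal alignment prediction function, whereas this example substitutes a fixed softmax over cosine similarities. The function names, the `temperature` parameter, and the toy features are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def video_similarity(frames_a, frames_b, temperature=0.1):
    """Similarity of two videos given per-frame feature vectors.

    For each frame of A, alignment scores over all frames of B are a softmax
    of cosine similarities (a hand-written substitute for a learned function).
    The video-level score averages the alignment-weighted similarities.
    """
    total = 0.0
    for fa in frames_a:
        sims = [cosine(fa, fb) for fb in frames_b]
        align = softmax([s / temperature for s in sims])
        total += sum(w * s for w, s in zip(align, sims))
    return total / len(frames_a)

video_a = [[1.0, 0.0], [0.0, 1.0]]
video_b = [[0.0, 1.0], [1.0, 0.0]]    # same frames as A, different order
video_c = [[-1.0, 0.0], [0.0, -1.0]]  # opposite features
sim_ab = video_similarity(video_a, video_b)
sim_ac = video_similarity(video_a, video_c)
```

Because every frame of `video_a` has an exact match somewhere in `video_b`, `sim_ab` comes out much higher than `sim_ac`, even though the matching frames sit at different temporal positions.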

We Know What You Want: An Advertising Strategy Recommender System for Online Advertising

Jun 08, 2021

Liyi Guo, Junqi Jin, Haoqi Zhang, Zhenzhe Zheng, Zhiye Yang, Zhizhuang Xing, Fei Pan, Lvyin Niu, Fan Wu, Haiyang Xu(+3 more)

Abstract:Advertising expenditures have become the major source of revenue for e-commerce platforms. Providing good advertising experiences for advertisers by reducing their costs of trial and error in discovering the optimal advertising strategies is crucial for the long-term prosperity of online advertising. To achieve this goal, the advertising platform needs to identify the advertisers' marketing objectives, and then recommend the corresponding strategies to fulfill these objectives. In this work, we first deploy a prototype strategy recommender system on the Taobao display advertising platform, recommending bid prices and targeted users to advertisers. We further augment this prototype system by directly revealing the advertising performance, and then infer the advertisers' marketing objectives through their adoption of different recommended advertising performance. We use techniques from contextual bandits to jointly learn the advertisers' marketing objectives and the recommended strategies. Online evaluations show that the designed advertising strategy recommender system can optimize the advertisers' advertising performance and increase the platform's revenue. Simulation experiments based on Taobao online bidding data show that the designed contextual bandit algorithm can effectively optimize the strategy adoption rate of advertisers.

* KDD 2021, Virtual Event, Singapore
* Accepted by KDD 2021

Learning Structures for Deep Neural Networks

May 27, 2021

Jinhui Yuan, Fei Pan, Chunting Zhou, Tao Qin, Tie-Yan Liu

Abstract:In this paper, we focus on the unsupervised setting for structure learning of deep neural networks and propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience, to guide the procedure of structure learning without label information. This principle suggests that a good network structure should maximize the mutual information between inputs and outputs, or equivalently maximize the entropy of outputs under mild assumptions. We further establish connections between this principle and the theory of Bayesian optimal classification, and empirically verify that larger entropy of the outputs of a deep neural network indeed corresponds to a better classification accuracy. Then, as an implementation of the principle, we show that sparse coding can effectively maximize the entropy of the output signals, and accordingly design an algorithm based on global group sparse coding to automatically learn the inter-layer connection and determine the depth of a neural network. Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure (i.e., convolutional neural networks (CNN)). In addition, our proposed algorithm successfully discovers the local connectivity (corresponding to local receptive fields in CNN) and invariance structure (corresponding to pooling in CNN), and achieves a good tradeoff between marginal performance gain and network depth.

Two-phase Pseudo Label Densification for Self-training based Domain Adaptation

Dec 09, 2020

Inkyu Shin, Sanghyun Woo, Fei Pan, InSo Kweon

Abstract:Recently, deep self-training approaches emerged as a powerful solution to the unsupervised domain adaptation. The self-training scheme involves iterative processing of target data; it generates target pseudo labels and retrains the network. However, since only the confident predictions are taken as pseudo labels, existing self-training approaches inevitably produce sparse pseudo labels in practice. We see this is critical because the resulting insufficient training-signals lead to a suboptimal, error-prone model. In order to tackle this problem, we propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD. In the first phase, we use sliding window voting to propagate the confident predictions, utilizing intrinsic spatial-correlations in the images. In the second phase, we perform a confidence-based easy-hard classification. For the easy samples, we now employ their full pseudo labels. For the hard ones, we instead adopt adversarial learning to enforce hard-to-easy feature alignment. To ease the training process and avoid noisy predictions, we introduce the bootstrapping mechanism to the original self-training loss. We show the proposed TPLD can be easily integrated into existing self-training based approaches and improves the performance significantly. Combined with the recently proposed CRST self-training framework, we achieve new state-of-the-art results on two standard UDA benchmarks.

* Accepted to ECCV 2020
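
The first phase described in the abstract above, propagating confident predictions with sliding window voting, can be sketched in a simplified form as follows. Each pixel without a confident pseudo label takes the majority label among confident pixels in a small square window around it. The `IGNORE` marker, the `radius` and `min_votes` parameters, and the majority rule itself are assumptions for illustration and may differ from the voting scheme used in TPLD.

```python
from collections import Counter

IGNORE = -1  # hypothetical marker for pixels without a confident pseudo label

def densify_by_voting(labels, radius=1, min_votes=2):
    """Fill IGNORE pixels with the majority label inside a square window.

    Votes are read from the original map only, so the result does not depend
    on scan order. Ties are broken by whichever label was seen first.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(h):
        for c in range(w):
            if labels[r][c] != IGNORE:
                continue
            votes = Counter(
                labels[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
                if labels[rr][cc] != IGNORE
            )
            if votes:
                label, count = votes.most_common(1)[0]
                # Only propagate when enough confident neighbours agree.
                if count >= min_votes:
                    out[r][c] = label
    return out

sparse = [
    [0, 0, IGNORE],
    [0, IGNORE, 1],
    [0, 1, 1],
]
dense = densify_by_voting(sparse)
```

In this toy map the centre pixel receives label 0 (four votes against three), while the top-right pixel stays unlabeled because no label reaches `min_votes` in its window.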
