Lei Zhou

Self-Guided Curriculum Learning for Neural Machine Translation

May 15, 2021

Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda

Abstract: In machine learning, a well-trained model is assumed to be able to recover its training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible. Inspired by this, we propose a self-guided curriculum strategy that encourages the learning of neural machine translation (NMT) models to follow this recovery criterion, where we cast the recovery degree of each training example as its learning difficulty. Specifically, we adopt the sentence-level BLEU score as a proxy for the recovery degree. Unlike existing curricula that rely on linguistic prior knowledge or third-party language models, our chosen learning difficulty is better suited to measuring the degree of knowledge mastery of the NMT models. Experiments on translation benchmarks, including WMT14 English⇒German and WMT17 Chinese⇒English, demonstrate that our approach can consistently improve translation performance over a strong Transformer baseline.

* Work in progress
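
The difficulty measure described in this abstract can be illustrated with a minimal sketch. It assumes that a model's own translations of the training sources are already available, scores each against its reference with a simplified, add-one-smoothed sentence-level BLEU, and orders examples from best recovered to worst recovered. The function names, the smoothing scheme, the easy-to-hard ordering, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothing (an assumption)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = sum(hyp_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return brevity * math.exp(log_precision)


def order_by_recovery(pairs, model_outputs):
    """Return (score, source, reference) triples, best recovered first."""
    scored = [
        (sentence_bleu(output, reference), source, reference)
        for (source, reference), output in zip(pairs, model_outputs)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical toy data: (source, reference) pairs and the model's outputs.
training_pairs = [
    ("quelle 1", "the cat sat on the mat"),
    ("quelle 2", "we will meet again tomorrow"),
]
model_outputs = ["a dog ran off", "we will meet again tomorrow"]

curriculum = order_by_recovery(training_pairs, model_outputs)
for score, source, reference in curriculum:
    print(f"{score:.3f}  {source}  {reference}")
```

In this sketch a higher score means the example is already well recovered and is therefore treated as easier. A real setup would more likely use an established BLEU implementation than this simplified one.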

Cascaded Feature Warping Network for Unsupervised Medical Image Registration

Mar 15, 2021

Liutong Zhang, Lei Zhou, Ruiyang Li, Xianyu Wang, Boxuan Han, Hongen Liao

Abstract: Deformable image registration is widely used in medical image analysis, but most proposed methods fail under complex deformations. In this paper, we present a cascaded feature warping network to perform coarse-to-fine registration. To achieve this, a shared-weights encoder network is adopted to generate feature pyramids for the unaligned images. The feature warping registration module is then used to estimate the deformation field at each level. The coarse-to-fine scheme is implemented by cascading the module from the bottom level to the top level. Furthermore, a multi-scale loss is introduced to boost registration performance. We employ two public benchmark datasets and conduct various experiments to evaluate our method. The results show that our method outperforms state-of-the-art methods, which also demonstrates that the cascaded feature warping network can perform coarse-to-fine registration effectively and efficiently.

* Accepted by ISBI2021

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Mar 09, 2021

Xuyang Bai, Zixin Luo, Lei Zhou, Hongkai Chen, Lei Li, Zeyu Hu, Hongbo Fu, Chiew-Lan Tai

Abstract: Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning methods in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors.

* Accepted to CVPR 2021, supplementary materials included
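
The spatial consistency idea in this abstract rests on the fact that a Euclidean transformation preserves pairwise distances, so two correct correspondences should agree on the distance between their points. The sketch below shows a classical, non-learned version of that idea: a length-based compatibility matrix followed by power iteration for its leading eigenvector, which serves as a per-correspondence confidence. It is not the PointDSC network. The function names, the `sigma` threshold, and the toy points are assumptions made for illustration.

```python
import math


def spatial_compatibility(src, dst, sigma=0.1):
    """Pairwise compatibility: near 1 when two correspondences preserve length."""
    n = len(src)
    compat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diff = abs(math.dist(src[i], src[j]) - math.dist(dst[i], dst[j]))
            compat[i][j] = max(0.0, 1.0 - (diff / sigma) ** 2)
    return compat


def inlier_confidence(compat, iterations=50):
    """Leading eigenvector of the compatibility matrix via power iteration."""
    n = len(compat)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(compat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: dst is src rotated 90 degrees about z and shifted by (2, 1, 0).
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1)]
dst = [(-y + 2, x + 1, z) for (x, y, z) in src]
dst[5] = (9, 9, 9)  # corrupt the last correspondence to simulate an outlier

confidence = inlier_confidence(spatial_compatibility(src, dst))
print([round(c, 3) for c in confidence])
```

The five consistent correspondences reinforce each other and receive a high confidence, while the corrupted one is compatible only with itself and decays toward zero.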

Goal-Oriented Gaze Estimation for Zero-Shot Learning

Mar 05, 2021

Yang Liu, Lei Zhou, Xiao Bai, Yifei Huang, Lin Gu, Jun Zhou, Tatsuya Harada

Abstract: Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Since semantic knowledge is built on attributes shared between different classes, which are highly local, a strong prior for localizing object attributes is beneficial for visual-semantic embedding. Interestingly, when recognizing unseen images, humans also automatically gaze at regions with certain semantic clues. Therefore, we introduce a novel goal-oriented gaze estimation module (GEM) to improve discriminative attribute localization based on the class-level attributes for ZSL. We aim to predict the actual human gaze location to obtain the visual attention regions for recognizing a novel object guided by its attribute description. Specifically, the task-dependent attention is learned with the goal-oriented GEM, and the global image features are simultaneously optimized with the regression of local attribute features. Experiments on three ZSL benchmarks, i.e., CUB, SUN and AWA2, show the superiority or competitiveness of our proposed method against state-of-the-art ZSL methods. The ablation analysis on the real gaze data CUB-VWSW also validates the benefits and accuracy of our gaze estimation module. This work suggests the promising benefits of collecting human gaze datasets and of automatic gaze estimation algorithms for high-level computer vision tasks. The code is available at https://github.com/osierboy/GEM-ZSL.

* Accepted by CVPR2021

Zero-Shot Translation Quality Estimation with Explicit Cross-Lingual Patterns

Oct 10, 2020

Lei Zhou, Liang Ding, Koichi Takeda

Abstract: This paper describes our submission to the WMT 2020 Shared Task on Sentence-Level Direct Assessment, Quality Estimation (QE). In this study, we empirically reveal the mismatching issue that arises when directly adopting BERTScore for QE. Specifically, there exist many mismatching errors between the source sentence and the translated candidate sentence under token pairwise similarity. In response to this issue, we propose to expose explicit cross-lingual patterns, e.g., word alignments and generation scores, to our proposed zero-shot models. Experiments show that our proposed QE model with explicit cross-lingual patterns could alleviate the mismatching issue, thereby improving the performance. Encouragingly, our zero-shot QE method could achieve performance comparable to the supervised QE method, and even outperforms the supervised counterpart on 2 out of 6 directions. We expect our work to shed light on improving zero-shot QE models.

* To appear in WMT2020
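
The token pairwise similarity that this abstract refers to can be sketched as BERTScore-style greedy matching: every token is paired with its most similar token on the other side, and the best similarities are averaged into precision, recall, and F1. The sketch below uses tiny hand-written vectors in place of real multilingual encoder embeddings, and treats the source sentence as the reference side. The function names and the vectors are assumptions made for illustration; the explicit cross-lingual patterns proposed in the paper are not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def greedy_match_score(source_vecs, candidate_vecs):
    """BERTScore-style precision, recall and F1 from greedy token matching."""
    sim = [[cosine(s, c) for c in candidate_vecs] for s in source_vecs]
    # Recall: each source token is matched to its most similar candidate token.
    recall = sum(max(row) for row in sim) / len(source_vecs)
    # Precision: each candidate token is matched to its most similar source token.
    precision = sum(
        max(sim[i][j] for i in range(len(source_vecs)))
        for j in range(len(candidate_vecs))
    ) / len(candidate_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical 2-d "embeddings": two source tokens, one candidate token.
source_vecs = [(1.0, 0.0), (0.0, 1.0)]
candidate_vecs = [(1.0, 0.0)]

precision, recall, f1 = greedy_match_score(source_vecs, candidate_vecs)
print(precision, recall, round(f1, 4))
```

Because each token simply takes its nearest neighbor, a token can be paired with a counterpart that is similar in embedding space but is not its actual translation, which is the kind of mismatch the abstract describes.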

Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning

Sep 16, 2020

Yang Liu, Lei Zhou, Xiao Bai, Lin Gu, Tatsuya Harada, Jun Zhou

Abstract: Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Though many ZSL methods rely on a direct mapping between the visual and the semantic space, the calibration deviation and hubness problem limit the generalization capability to unseen classes. Recently emerged generative ZSL methods generate unseen image features to transform ZSL into a supervised classification problem. However, most generative models still suffer from the seen-unseen bias problem, as only seen data is used for training. To address these issues, we propose a novel bidirectional embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Since the embedding of high-dimensional visual features comprises much non-semantic information, the alignment of the visual and semantic spaces in the latent space would inevitably deviate. Therefore, we introduce an information bottleneck (IB) constraint to ZSL for the first time to preserve essential attribute information during the mapping. Specifically, we utilize uncertainty estimation and the wake-sleep procedure to alleviate noise and improve model abstraction capability. We evaluate the learned latent features on four benchmark datasets. Extensive experimental results show that our method outperforms the state-of-the-art methods in different ZSL settings on most benchmark datasets. The code will be available at https://github.com/osierboy/IBZSL.

* 10 pages, 7 figures, 4 tables

Learning Discriminative Feature with CRF for Unsupervised Video Object Segmentation

Aug 04, 2020

Mingmin Zhen, Shiwei Li, Lei Zhou, Jiaxiang Shang, Haoan Feng, Tian Fang, Long Quan

Abstract: In this paper, we introduce a novel network, called discriminative feature network (DFNet), to address the unsupervised video object segmentation task. To capture the inherent correlation among video frames, we learn discriminative features (D-features) from the input images that reveal the feature distribution from a global perspective. The D-features are then used to establish correspondence with all features of the test image under a conditional random field (CRF) formulation, which is leveraged to enforce consistency between pixels. The experiments verify that DFNet outperforms state-of-the-art methods by a large margin with a mean IoU score of 83.4% and ranks first on the DAVIS-2016 leaderboard while using far fewer parameters and running much more efficiently in the inference phase. We further evaluate DFNet on the FBMS dataset and the video saliency dataset ViSal, reaching a new state-of-the-art. To further demonstrate the generalizability of our framework, DFNet is also applied to the image object co-segmentation task. We perform experiments on the challenging PASCAL-VOC dataset and observe the superiority of DFNet. The thorough experiments verify that DFNet is able to capture and mine the underlying relations of images and discover the common foreground objects.

* European Conference on Computer Vision 2020

Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction

Aug 02, 2020

Lei Zhou, Zixin Luo, Mingmin Zhen, Tianwei Shen, Shiwei Li, Zhuofei Huang, Tian Fang, Long Quan

Abstract: Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck of solving the Reduced Camera System (RCS), whose dimension is proportional to the number of cameras. When the problem is scaled up, this step is neither efficient in computation nor manageable for a single compute node. In this work, we propose a stochastic bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM iterations to improve efficiency and scalability. It first reformulates the quadratic programming problem of an LM iteration based on the clustering of the visibility graph by introducing equality constraints across clusters. Then, we propose to relax it into a chance constrained problem and solve it through a sampled convex program. The relaxation is intended to eliminate the interdependence between clusters embodied by the constraints, so that a large RCS can be decomposed into independent linear sub-problems. Numerical experiments on unordered Internet image sets and sequential SLAM image sets, as well as distributed experiments on large-scale datasets, have demonstrated the high efficiency and scalability of the proposed approach. Codes are released at https://github.com/zlthinker/STBA.

* Accepted by ECCV 2020
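
For context, the Reduced Camera System mentioned in this abstract is conventionally obtained by taking a Schur complement of the damped normal equations of one LM iteration. The block notation below is a common textbook convention and an assumption, not notation taken from the paper: $B$ is the camera block, $C$ the point block, $E$ the camera-point coupling, $v$ and $w$ the right-hand sides, and $\Delta c$, $\Delta p$ the camera and point updates.

```latex
\begin{aligned}
\begin{bmatrix} B & E \\ E^{\top} & C \end{bmatrix}
\begin{bmatrix} \Delta c \\ \Delta p \end{bmatrix}
&= \begin{bmatrix} v \\ w \end{bmatrix}, \\
\left( B - E C^{-1} E^{\top} \right) \Delta c &= v - E C^{-1} w, \\
\Delta p &= C^{-1} \left( w - E^{\top} \Delta c \right).
\end{aligned}
```

The second line is the reduced system: substituting the third line into the first block row eliminates $\Delta p$. Since $C$ is block diagonal with one small block per 3D point, it is cheap to invert, and the remaining matrix $B - E C^{-1} E^{\top}$ has a dimension proportional to the number of cameras, which is the bottleneck the abstract describes.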

Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency

Jul 24, 2020

Jiaxiang Shang, Tianwei Shen, Shiwei Li, Lei Zhou, Mingmin Zhen, Tian Fang, Long Quan

Abstract: Recent learning-based approaches, in which models are trained on single-view images, have shown promising results for monocular 3D face reconstruction, but they suffer from ill-posed face pose and depth ambiguity. In contrast to previous works that only enforce 2D feature constraints, we propose a self-supervised training architecture that leverages multi-view geometry consistency, which provides reliable constraints on face pose and depth estimation. We first propose an occlusion-aware view synthesis method to apply multi-view geometry consistency to self-supervised learning. Then we design three novel loss functions for multi-view consistency, including the pixel consistency loss, the depth consistency loss, and the facial landmark-based epipolar loss. Our method is accurate and robust, especially under large variations of expressions, poses, and illumination conditions. Comprehensive experiments on the face alignment and 3D face reconstruction benchmarks have demonstrated superiority over state-of-the-art methods. Our code and model are released at https://github.com/jiaxiangshang/MGCNet.

* Accepted to ECCV 2020, supplementary materials included

End-to-end Optimized Video Compression with MV-Residual Prediction

May 26, 2020

XiangJi Wu, Ziwen Zhang, Jie Feng, Lei Zhou, Junmin Wu

Abstract: We present an end-to-end trainable framework for P-frame compression in this paper. A joint motion vector (MV) and residual prediction network, MV-Residual, is designed to extract the ensembled features of motion representations and residual information by taking two successive frames as inputs. The prior probability of the latent representations is modeled by a hyperprior autoencoder and trained jointly with the MV-Residual network. Specifically, spatially-displaced convolution is applied for video frame prediction, in which a motion kernel for each pixel is learned to generate a predicted pixel by applying the kernel at a displaced location in the source image. Finally, novel rate allocation and post-processing strategies are used to produce the final compressed bits, considering the bit constraint of the challenge. The experimental results on the validation set show that the proposed optimized framework can generate the highest MS-SSIM in the P-frame compression competition.
