
Yinjie Lei


A Generalized Framework for Edge-preserving and Structure-preserving Image Smoothing

Jul 28, 2021
Wei Liu, Pingping Zhang, Yinjie Lei, Xiaolin Huang, Jie Yang, Michael Ng


Image smoothing is a fundamental procedure in both computer vision and graphics applications. The required smoothing properties can differ, or even contradict each other, across tasks, yet the inherent smoothing nature of a given operator is usually fixed and thus cannot meet the varied requirements of different applications. In this paper, we first introduce the truncated Huber penalty function, which is highly flexible under different parameter settings. We then propose a generalized framework built on this penalty. Owing to this flexibility, our framework can realize diverse, even contradictory, smoothing behaviors, including behaviors that previous methods can seldom achieve, and it therefore performs well in challenging cases. Together, these properties make our framework applicable to a range of tasks and allow it to outperform state-of-the-art approaches in several of them, such as image detail enhancement, clip-art compression artifact removal, guided depth map restoration, and image texture removal. In addition, we provide an efficient numerical solution whose convergence is theoretically guaranteed even though the optimization problem is non-convex and non-smooth. We further propose a simple yet effective approach that reduces the computational cost of our method while maintaining its performance. The effectiveness and superior performance of our approach are validated through comprehensive experiments across a range of applications. Our code is available at https://github.com/wliusjtu/Generalized-Smoothing-Framework.
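For intuition, truncating a penalty can be sketched as capping a classic Huber penalty at a constant beyond a threshold, so very large differences (e.g. across strong edges) incur no extra cost. This is a minimal illustrative sketch, not the paper's exact parameterization; the names `huber`, `truncated_huber`, `eps`, and `T` are our own.

```python
import numpy as np

def huber(x, eps):
    # classic Huber penalty: quadratic near zero, linear beyond eps
    ax = np.abs(x)
    return np.where(ax <= eps, x ** 2 / (2 * eps), ax - eps / 2)

def truncated_huber(x, eps, T):
    # cap the penalty at its value at T, so differences larger than T
    # (e.g. across strong edges) all receive the same constant cost
    return np.minimum(huber(x, eps), huber(T, eps))
```

Varying `eps` and `T` trades off quadratic, linear, and constant regimes, which hints at how one penalty family can express different smoothing behaviors.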

* This work is accepted by TPAMI. The code is available at https://github.com/wliusjtu/Generalized-Smoothing-Framework. arXiv admin note: substantial text overlap with arXiv:1907.09642 

Contextualize Knowledge Bases with Transformer for End-to-end Task-Oriented Dialogue Systems

Oct 22, 2020
Yanjie Gou, Yinjie Lei, Lingqiao Liu


Recent studies build task-oriented dialogue systems in an end-to-end manner, and existing works have made great progress on this task. However, one issue still needs further consideration: how to effectively represent knowledge bases and incorporate them into dialogue systems. To address this issue, we design a novel Transformer-based Context-aware Memory Generator to model the entities in knowledge bases, which produces entity representations that perceive all relevant entities and the dialogue history. Furthermore, we propose the Context-aware Memory Enhanced Transformer (CMET), which effectively aggregates information from the dialogue history and knowledge bases to generate more accurate responses. Extensive experiments show that our method achieves superior performance over state-of-the-art methods.

* Third version of this work; corrects some typos 

Context-aware Memory Enhanced Transformer for End-to-end Task-Oriented Dialogue Systems

Oct 20, 2020
Yanjie Gou, Yinjie Lei, Lingqiao Liu


Recent studies build task-oriented dialogue systems in an end-to-end manner, and existing works have made great progress on this task. However, one issue still needs further consideration: how to effectively represent knowledge bases and incorporate them into dialogue systems. To address this issue, we design a novel Context-aware Memory Generation module to model the knowledge bases, which generates context-aware entity representations that perceive the relevant entities. Furthermore, we incorporate this module into the Transformer and propose the Context-aware Memory Enhanced Transformer (CMET), which aggregates information from the dialogue history and knowledge bases to generate better responses. Extensive experiments show that our method achieves superior performance over state-of-the-art methods.

* Second version of this work; slightly modifies the method, adds an ablation study and error analysis, and corrects some typos 

Hierarchical Paired Channel Fusion Network for Street Scene Change Detection

Oct 19, 2020
Yinjie Lei, Duo Peng, Pingping Zhang, Qiuhong Ke, Haifeng Li


Street Scene Change Detection (SSCD) aims to locate the changed regions between a given street-view image pair captured at different times, which is an important yet challenging task in the computer vision community. An intuitive way to solve the SSCD task is to fuse the extracted image feature pairs and then directly measure the dissimilar parts to produce a change map. The key to the SSCD task is therefore to design an effective feature fusion method that improves the accuracy of the resulting change maps. To this end, we present a novel Hierarchical Paired Channel Fusion Network (HPCFNet), which adaptively fuses paired feature channels. Specifically, the features of a given image pair are jointly extracted by a Siamese Convolutional Neural Network (SCNN) and hierarchically combined by exploring the fusion of channel pairs at multiple feature levels. In addition, based on the observation that the distribution of scene changes is diverse, we further propose a Multi-Part Feature Learning (MPFL) strategy to detect diverse changes, which allows our framework to adapt to the scale and location diversity of scene change regions. Extensive experiments on three public datasets (PCD, VL-CMU-CD and CDnet2014) demonstrate that the proposed framework outperforms other state-of-the-art methods by a considerable margin.
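The "paired channel" arrangement can be illustrated as interleaving corresponding channels of the two feature maps so that each channel pair sits adjacently before a fusion layer. This is only a hypothetical sketch of such an arrangement; `paired_channel_fusion` and its layout are our own assumptions, not HPCFNet's actual implementation.

```python
import numpy as np

def paired_channel_fusion(feat_a, feat_b):
    # interleave channels of a (C, H, W) feature pair so corresponding
    # channels are adjacent: [a0, b0, a1, b1, ...] along the channel axis
    c = feat_a.shape[0]
    out = np.empty((2 * c,) + feat_a.shape[1:], dtype=feat_a.dtype)
    out[0::2] = feat_a
    out[1::2] = feat_b
    return out
```

A subsequent convolution over the interleaved tensor can then learn per-pair fusion weights, rather than treating the two feature maps as independent blocks.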

* To appear in IEEE Transactions on Image Processing; 13 pages, 13 figures, 9 tables 

Dynamic Memory Enhanced Transformer for End-to-end Task-Oriented Dialogue System

Oct 12, 2020
Yanjie Gou, Yinjie Lei, Lingqiao Liu


Recent studies build task-oriented dialogue systems in an end-to-end manner, and existing works have made great progress on this task. However, two issues still need to be considered: (1) how to effectively represent knowledge bases and incorporate them into the dialogue system, and (2) how to efficiently reason over the knowledge bases given queries. To address these issues, we design a novel Transformer-based Dynamic Memory Network (DMN) with a novel Memory Mask scheme, which dynamically generates context-aware knowledge base representations and reasons over the knowledge bases simultaneously. Furthermore, we incorporate the dynamic memory network into the Transformer and propose the Dynamic Memory Enhanced Transformer (DMET), which aggregates information from the dialogue history and knowledge bases to generate better responses. Extensive experiments show that our method achieves superior performance over state-of-the-art methods.
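A memory-mask scheme of this general kind is typically realized by excluding disallowed entries from the attention softmax. The sketch below shows that generic mechanism only, not the paper's specific Memory Mask; `masked_attention` is a hypothetical name.

```python
import numpy as np

def masked_attention(scores, mask):
    # entries where mask is False are set to -inf before the softmax,
    # so they receive exactly zero attention weight
    masked = np.where(mask, scores, -np.inf)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

By changing the mask per query, the same memory can be restricted to different subsets of knowledge-base entries at each step.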

* First version of this work 

Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks

Jul 07, 2020
Yan Liu, Lingqiao Liu, Peng Wang, Pingping Zhang, Yinjie Lei


Most existing crowd counting systems rely on the availability of object location annotations, which can be expensive to obtain. To reduce the annotation cost, one attractive solution is to leverage a large number of unlabeled images to build a crowd counting model in a semi-supervised fashion. This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning. Our key idea is to leverage the unlabeled images to train a generic feature extractor rather than the entire network of a crowd counter. The rationale of this design is that learning the feature extractor can be more reliable and robust against the inevitably noisy supervision generated from the unlabeled data. Also, on top of a good feature extractor, it is possible to build a density map regressor with far fewer density map annotations. Specifically, we propose a novel semi-supervised crowd counting method built upon two innovative components: (1) a set of inter-related binary segmentation tasks is derived from the original density map regression task as surrogate prediction targets; (2) the surrogate target predictors are learned from both labeled and unlabeled data via a proposed self-training scheme that fully exploits the underlying constraints of these binary segmentation tasks. Experiments show that the proposed method is superior to the existing semi-supervised crowd counting method and other representative baselines.
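One simple way to derive inter-related binary segmentation targets from a density map is to threshold it at several levels, which yields nested masks whose consistency can be exploited during self-training. This is an illustrative guess at such a construction, not necessarily the paper's; `surrogate_masks` and the threshold values are assumptions.

```python
import numpy as np

def surrogate_masks(density, thresholds=(0.05, 0.1, 0.2)):
    # threshold the density map at increasing levels; each mask is a
    # binary segmentation target, and masks at higher thresholds are
    # subsets of those at lower thresholds (a built-in constraint)
    return [(density > t).astype(np.uint8) for t in thresholds]
```

The nesting property (a pixel active at a high threshold must be active at all lower ones) is the kind of underlying constraint a self-training scheme can check on unlabeled images.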

* To appear at ECCV 2020 

Towards Using Count-level Weak Supervision for Crowd Counting

Feb 29, 2020
Yinjie Lei, Yan Liu, Pingping Zhang, Lingqiao Liu


Most existing crowd counting methods require object location-level annotation, i.e., placing a dot at the center of each object. While simpler than bounding-box or pixel-level annotation, obtaining this annotation is still labor-intensive and time-consuming, especially for images of highly crowded scenes. On the other hand, weaker annotations that only give the total count of objects can be almost effortless to obtain in many practical scenarios. It is therefore desirable to develop a learning method that can effectively train models from count-level annotations. To this end, this paper studies weakly-supervised crowd counting, which learns a model from only a small amount of location-level annotations (fully-supervised) but a large amount of count-level annotations (weakly-supervised). To perform effective training in this scenario, we observe that directly regressing the integral of the density map to the object count is not sufficient, and that it is beneficial to introduce stronger regularization on the predicted density maps of weakly-annotated images. We devise a simple yet effective training strategy, namely Multiple Auxiliary Tasks Training (MATT), to construct regularizers that restrict the freedom of the generated density maps. Through extensive experiments on existing datasets and a newly proposed dataset, we validate the effectiveness of the proposed weakly-supervised method and demonstrate its superior performance over existing solutions.
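The "direct solution" mentioned above, regressing the integral of the density map to the object count, amounts to a loss on the summed density. A minimal sketch, with our own naming and an L1 form assumed:

```python
import numpy as np

def count_loss(pred_density, gt_count):
    # count-level weak supervision: the integral (sum) of the predicted
    # density map should match the annotated total count (L1 loss here)
    return abs(float(pred_density.sum()) - gt_count)
```

Because many different density maps share the same integral, this loss alone under-constrains the map, which is the motivation for the additional regularization the abstract describes.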


Overcoming Long-term Catastrophic Forgetting through Adversarial Neural Pruning and Synaptic Consolidation

Dec 19, 2019
Jian Peng, Bo Tang, Hao Jiang, Zhuo Li, Yinjie Lei, Tao Lin, Haifeng Li


Enabling a neural network to sequentially learn multiple tasks is of great significance for expanding its applicability in realistic human application scenarios. However, as the task sequence grows, the model quickly forgets previously learned skills; we refer to this loss of memory over long sequences as long-term catastrophic forgetting. There are two main reasons for long-term forgetting: first, as tasks accumulate, the intersection of the low-error parameter subspaces satisfying these tasks becomes smaller and smaller, or even non-existent; second, errors accumulate in the process of protecting the knowledge of previous tasks. In this paper, we propose an adversarial mechanism in which neural pruning and synaptic consolidation are used to overcome long-term catastrophic forgetting. This mechanism distills task-related knowledge into a small number of parameters and retains old knowledge by consolidating those parameters, while sparing most parameters for follow-up tasks; it thus not only avoids forgetting but can also learn a large number of tasks. Specifically, neural pruning iteratively relaxes the parameter conditions of the current task to expand the common parameter subspace of the tasks. The modified synaptic consolidation strategy comprises two components: a novel measurement that takes network structure information into account is proposed to calculate parameter importance, and an element-wise parameter updating strategy is designed to prevent significant parameters from being overwritten in subsequent learning. We verified the method on image classification, and the results show that our proposed ANPSC approach outperforms state-of-the-art methods. A hyperparameter sensitivity test further demonstrates the robustness of our approach.
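The consolidation side of such a scheme is commonly realized as a quadratic penalty that anchors important parameters near their values after previous tasks (EWC-style). The sketch below shows only that generic form; the paper's structure-aware importance measurement is more elaborate, and `consolidation_penalty` is a hypothetical name.

```python
import numpy as np

def consolidation_penalty(params, anchor, importance, lam=1.0):
    # penalize drift of each parameter from its post-task value,
    # weighted by a per-parameter importance score: parameters deemed
    # critical for old tasks become expensive to change
    return lam * float(np.sum(importance * (params - anchor) ** 2))
```

Adding this term to the new task's loss lets unimportant parameters move freely (supporting new learning) while important ones stay consolidated.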

* 12 pages, 12 figures 

Improving Distant Supervised Relation Extraction by Dynamic Neural Network

Dec 13, 2019
Yanjie Gou, Yinjie Lei, Lingqiao Liu, Pingping Zhang, Xi Peng


Distant Supervised Relation Extraction (DSRE) is usually formulated as the problem of classifying a bag of sentences containing two query entities into predefined relation classes. Most existing methods treat those relation classes as distinct semantic categories while ignoring their potential connection to the query entities. In this paper, we propose to leverage this connection to improve relation extraction accuracy. Our key ideas are twofold: (1) for sentences belonging to the same relation class, the expression style, i.e., word choice, can vary with the query entities; to account for this style shift, the model should adjust its parameters according to the entity types. (2) Some relation classes are semantically similar, and entity types that appear in one relation may also appear in others; the model can therefore be trained across different relation classes, which further strengthens classes with few samples, i.e., long-tail classes. To unify these two arguments, we develop a novel Dynamic Neural Network for Relation Extraction (DNNRE). The network adopts a novel dynamic parameter generator that generates the network parameters according to the query entity types and relation classes. Through this mechanism, the network simultaneously handles the style-shift problem and improves prediction accuracy for long-tail classes. Our experimental study demonstrates the effectiveness of the proposed method and shows that it achieves superior performance over state-of-the-art methods.
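A dynamic parameter generator of the kind described can be sketched as a small network that maps a conditioning vector (e.g. an entity-type embedding) to the weights of a layer, so the effective parameters change with the query entities. This is a generic sketch under our own names (`dynamic_linear`, `w_gen`), not DNNRE's actual generator.

```python
import numpy as np

def dynamic_linear(x, cond, w_gen, d_out):
    # the generator maps a conditioning vector (e.g. an entity-type
    # embedding) to the flattened weights of a linear layer, which is
    # then applied to the input features x
    W = (cond @ w_gen).reshape(x.shape[-1], d_out)
    return x @ W
```

Because `w_gen` is shared while `cond` varies per query, the model can specialize its behavior per entity type without maintaining separate weights for every class.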

* 29 pages, 8 figures 