Jianfeng Li

An End-to-End Network for Upright Adjustment of Panoramic Images

Apr 12, 2023
Heyu Chen, Jianfeng Li, Shigang Li

Nowadays, panoramic images can be easily obtained with panoramic cameras. However, when the camera orientation is tilted, the captured panoramic image is not upright. Existing upright adjustment models focus on estimating a more accurate camera orientation and leave image reconstruction to offline post-processing. To address this, we propose an online end-to-end network for upright adjustment that reconstructs the image while estimating the angle. Our network consists of three modules: orientation estimation, online LUT generation, and upright reconstruction. The orientation estimation module estimates the tilt angle of the panoramic image. A converter block with an upsampling function then converts the estimated angle into a lookup table (LUT), producing a corresponding LUT online for each input angle. Finally, a lightweight generative adversarial network (GAN) generates upright images from shallow features. The experimental results show that, in terms of angle estimation, we improve accuracy in the small-error range; in terms of image reconstruction, we achieve the first real-time online upright reconstruction of panoramic images with a deep learning network.
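The abstract does not spell out how the LUT is built; for reference, the classical offline remap that the converter block replaces online can be constructed directly from the estimated rotation. The sketch below is our own illustration (the function name and the use of NumPy and OpenCV's remap are assumptions, not the paper's implementation):

```python
import numpy as np
import cv2

def upright_lut(h, w, R):
    """Build remap LUTs that rotate an equirectangular panorama by R (3x3 rotation).

    Classical offline construction, shown only to illustrate what the
    paper's angle-to-LUT converter block produces online.
    """
    # Output pixel grid -> spherical coordinates (longitude, latitude)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    lon = (u / w) * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v / h) * np.pi
    # Spherical coordinates -> unit viewing rays
    rays = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    rays = rays @ R.T  # rotate every ray by R
    # Rotated rays -> source pixel coordinates
    lon_s = np.arctan2(rays[..., 0], rays[..., 2])
    lat_s = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
    map_x = ((lon_s + np.pi) / (2 * np.pi) * w).astype(np.float32)
    map_y = ((np.pi / 2 - lat_s) / np.pi * h).astype(np.float32)
    return map_x, map_y

# Usage: given the estimated tilt as a rotation matrix R,
# map_x, map_y = upright_lut(img.shape[0], img.shape[1], R)
# upright = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
#                     borderMode=cv2.BORDER_WRAP)
```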

Unsupervised Joint Learning of Depth, Optical Flow, Ego-motion from Video

May 30, 2021
Jianfeng Li, Junqiao Zhao, Shuangfu Song, Tiantian Feng

Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of a robot's visual perception. We use a joint self-supervised method to estimate these three geometric elements. The depth, optical flow, and camera motion networks are independent of one another but are jointly optimized during training. Compared with independent training, joint training can fully exploit the geometric relationships among the elements and provide both dynamic and static information about the scene. In this paper, we improve the joint self-supervised method in three respects: network structure, dynamic object segmentation, and geometric constraints. For the network structure, we apply an attention mechanism to the camera motion network, which helps exploit the similarity of camera movement between frames; following the attention mechanism of the Transformer, we propose a plug-and-play convolutional attention module. For dynamic objects, based on the different influences that dynamic objects have in the optical flow self-supervised framework and the depth-pose self-supervised framework, we propose a threshold algorithm to detect dynamic regions and mask them in the respective loss functions. For geometric constraints, we use traditional methods to estimate the fundamental matrix from corresponding points and use it to constrain the camera motion network. We demonstrate the effectiveness of our method on the KITTI dataset. Compared with other joint self-supervised methods, ours achieves state-of-the-art performance in pose and optical flow estimation, and its depth estimation results are also competitive. Code will be available at https://github.com/jianfenglihg/Unsupervised_geometry.

* 9 pages, 4 figures 
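As a rough illustration of the geometric constraint, the sketch below penalizes predicted poses that violate the epipolar constraint defined by traditionally matched points. The loss form, weighting, and shapes are our assumptions, not the paper's exact formulation:

```python
import torch

def skew(t):
    # (B, 3) translations -> (B, 3, 3) skew-symmetric matrices [t]_x
    zero = torch.zeros_like(t[:, 0])
    return torch.stack([
        torch.stack([zero, -t[:, 2], t[:, 1]], dim=-1),
        torch.stack([t[:, 2], zero, -t[:, 0]], dim=-1),
        torch.stack([-t[:, 1], t[:, 0], zero], dim=-1),
    ], dim=1)

def epipolar_loss(R, t, pts1, pts2, K):
    """Mean epipolar residual |x2^T F x1| with F = K^-T [t]_x R K^-1.

    R, t: pose predicted by the camera motion network, (B,3,3) and (B,3).
    pts1, pts2: matched pixel coordinates (B, N, 2) from a traditional
    matcher. Hypothetical sketch of the fundamental-matrix constraint.
    """
    K_inv = torch.inverse(K)  # (B, 3, 3)
    F = K_inv.transpose(1, 2) @ skew(t) @ R @ K_inv
    ones = torch.ones_like(pts1[..., :1])
    x1 = torch.cat([pts1, ones], dim=-1)  # homogeneous points, (B, N, 3)
    x2 = torch.cat([pts2, ones], dim=-1)
    residual = torch.einsum('bni,bij,bnj->bn', x2, F, x1)
    return residual.abs().mean()
```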

Single upper limb pose estimation method based on improved stacked hourglass network

Apr 16, 2020
Gang Peng, Yuezhi Zheng, Jianfeng Li, Jin Yang, Zhonghua Deng

At present, most high-accuracy single-person pose estimation methods have high computational complexity and insufficient real-time performance because their network models are complex, while methods with high real-time performance sacrifice accuracy because their network models are simple; it remains difficult to achieve both high accuracy and real-time performance in single-person pose estimation. For use in human-machine cooperative operations, this paper proposes an end-to-end single-person upper limb pose estimation method that is both accurate and real-time. Based on the stacked hourglass network model, a key point detection model for the single-person upper limb skeleton was designed. Deconvolution replaces the up-sampling operation of the hourglass module in the original model, solving the problem of rough feature maps, and integral regression is used to compute the position coordinates of the skeleton key points, reducing quantization error and computation. Experiments showed that the developed key point detection model achieves high accuracy and that the end-to-end pose estimation method provides both high accuracy and real-time performance.
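Integral regression admits a compact sketch: treat each heatmap as a distribution and take the expected pixel location, which is differentiable and avoids argmax quantization. Shapes and the softmax normalization below are our assumptions:

```python
import torch
import torch.nn.functional as F

def integral_regression(heatmaps):
    """Soft-argmax over keypoint heatmaps (B, K, H, W) -> coordinates (B, K, 2).

    Normalizes each heatmap with a softmax and returns the expectation of
    the pixel grid, a minimal sketch of the integral regression idea.
    """
    b, k, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, k, -1), dim=-1).view(b, k, h, w)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # marginalize rows, expect x
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # marginalize cols, expect y
    return torch.stack([x, y], dim=-1)
```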

Occlusion Aware Unsupervised Learning of Optical Flow From Video

Mar 04, 2020
Jianfeng Li, Junqiao Zhao, Tiantian Feng, Chen Ye, Lu Xiong

In this paper, we propose an unsupervised learning method for estimating the optical flow between video frames, designed especially to handle occlusion. Occlusion, caused by the movement of an object or of the camera, occurs when certain pixels are visible in one video frame but not in an adjacent frame. Because pixels in occluded areas have no correspondence between frames, the photometric loss computed there is incorrect and can mislead optical flow training. In video sequences, we found that the occlusions in the forward ($t\rightarrow t+1$) and backward ($t\rightarrow t-1$) frame pairs are usually complementary: pixels occluded in the subsequent frame are often not occluded in the previous frame, and vice versa. Exploiting this complementarity, we propose a new weighted loss to handle occlusion. In addition, we compute gradients in multiple directions to provide richer supervision. Our method achieves competitive optical flow accuracy compared with the baseline and some supervised methods on the KITTI 2012 and 2015 benchmarks. The source code has been released at https://github.com/jianfenglihg/UnOpticalFlow.git.

* 6 pages, 5 figures 
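The abstract does not give the exact form of the weighted loss; one hypothetical reading of the complementarity idea is a per-pixel softmin weighting that lets the less-occluded direction dominate:

```python
import torch

def complementary_photometric_loss(err_fw, err_bw):
    """Combine forward (t->t+1) and backward (t->t-1) photometric errors.

    err_fw, err_bw: per-pixel photometric errors, shape (B, 1, H, W).
    Per pixel, the direction with lower error (less likely occluded) gets
    the higher weight. The softmin weighting is our assumption; the paper's
    exact weighting function is not given in the abstract.
    """
    errs = torch.stack([err_fw, err_bw], dim=0)  # (2, B, 1, H, W)
    weights = torch.softmax(-errs, dim=0)        # lower error -> higher weight
    # detach() stops gradients from flowing through the weights themselves
    return (weights.detach() * errs).sum(dim=0).mean()
```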

Vehicle Traffic Driven Camera Placement for Better Metropolis Security Surveillance

Aug 20, 2018
Yihui He, Xiaobo Ma, Xiapu Luo, Jianfeng Li, Mengchen Zhao, Bo An, Xiaohong Guan

Security surveillance is one of the most important issues in smart cities, especially in an era of terrorism, and deploying (video) cameras is a common surveillance approach. Given the ever-present flow of vehicles through metropolises, exploiting vehicle traffic to design camera placement strategies could substantially facilitate security surveillance. This article constitutes the first effort toward building the linkage between vehicle traffic and security surveillance, a critical problem for smart cities. We expect our study to influence decision making in surveillance camera placement and to foster more research on principled approaches to security surveillance that benefit our physical-world life. Code has been made publicly available.

* IEEE Intelligent Systems 
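The abstract does not describe the placement algorithm itself; shown below, purely as a hypothetical illustration, is a common baseline for this kind of problem: greedy budgeted maximum coverage over traffic-weighted road segments (all names and the data layout are our assumptions):

```python
def greedy_camera_placement(candidates, traffic, budget):
    """Greedy budgeted max-coverage camera placement (hypothetical sketch).

    candidates: dict mapping a camera site to the set of road segment ids
                it would cover
    traffic:    dict mapping a road segment id to its vehicle flow weight
    budget:     number of cameras to place
    """
    candidates = dict(candidates)  # work on a copy
    chosen, covered = [], set()
    for _ in range(budget):
        # pick the site with the largest traffic-weighted marginal coverage
        best = max(
            candidates,
            key=lambda s: sum(traffic[e] for e in candidates[s] - covered),
            default=None,
        )
        if best is None:
            break
        covered |= candidates[best]
        chosen.append(best)
        del candidates[best]  # never pick the same site twice
    return chosen
```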

Multilevel Wavelet Decomposition Network for Interpretable Time Series Analysis

Jun 23, 2018
Jingyuan Wang, Ze Wang, Jianfeng Li, Junjie Wu

Recent years have witnessed an unprecedented rise of time series data from almost all kinds of academic and industrial fields. Various deep neural network models have been introduced for time series analysis, but the important frequency information still lacks effective modeling. In light of this, in this paper we propose a wavelet-based neural network structure called the multilevel Wavelet Decomposition Network (mWDN) for building frequency-aware deep learning models for time series analysis. mWDN preserves the advantage of multilevel discrete wavelet decomposition in frequency learning while enabling the fine-tuning of all parameters within a deep neural network framework. Based on mWDN, we further propose two deep learning models, Residual Classification Flow (RCF) and multi-frequency Long Short-Term Memory (mLSTM), for time series classification and forecasting, respectively. The two models take all or some of the mWDN-decomposed sub-series at different frequencies as input and use the back-propagation algorithm to learn all parameters globally, which enables the seamless embedding of wavelet-based frequency analysis into deep learning frameworks. Extensive experiments on 40 UCR datasets and a real-world user volume dataset demonstrate the excellent performance of our mWDN-based time series models. In particular, we propose an importance analysis method for mWDN-based models that successfully identifies the time-series elements and mWDN layers crucially important to time series analysis. This indicates the interpretability advantage of mWDN and can be viewed as an in-depth exploration of interpretable deep learning.
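A minimal sketch of the core idea, assuming Haar-initialized trainable convolution filters (the paper's own filters and layer implementation may differ):

```python
import torch
import torch.nn as nn

class WaveletLevel(nn.Module):
    """One mWDN-style decomposition level (hypothetical sketch).

    Low- and high-pass filters are initialized from the Haar wavelet but
    left trainable, and stride-2 convolution performs the downsampling.
    """
    def __init__(self):
        super().__init__()
        self.low = nn.Conv1d(1, 1, kernel_size=2, stride=2, bias=False)
        self.high = nn.Conv1d(1, 1, kernel_size=2, stride=2, bias=False)
        with torch.no_grad():
            self.low.weight.copy_(torch.tensor([[[0.5, 0.5]]]))    # approximation
            self.high.weight.copy_(torch.tensor([[[0.5, -0.5]]]))  # detail

    def forward(self, x):  # x: (B, 1, T), T even
        return self.low(x), self.high(x)

# Stacking levels on the low-frequency branch yields the multilevel
# decomposition whose sub-series feed RCF / mLSTM.
```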

LSTM Neural Reordering Feature for Statistical Machine Translation

Jun 16, 2016
Yiming Cui, Shijin Wang, Jianfeng Li

Artificial neural networks are powerful models that have been widely applied to many aspects of machine translation, such as language modeling and translation modeling. Although notable improvements have been made in these areas, the reordering problem remains a challenge in statistical machine translation. In this paper, we present a novel neural reordering model that directly models word pairs and their alignment. By using LSTM recurrent neural networks, much longer context can be exploited for reordering prediction. Experimental results on the NIST OpenMT12 Arabic-English and Chinese-English 1000-best rescoring tasks show that our LSTM neural reordering feature is robust and achieves significant improvements over various baseline systems.

* 6 pages, accepted as a NAACL 2016 short paper 
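A minimal sketch of such a feature, with dimensions and the orientation inventory as assumptions since the abstract fixes neither:

```python
import torch
import torch.nn as nn

class LSTMReorderingFeature(nn.Module):
    """Score reordering orientations (e.g. monotone / swap / discontinuous)
    for aligned source-target word pairs with an LSTM, so that long-range
    context informs each prediction. Hypothetical sketch only.
    """
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, n_orient=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # each aligned pair = concatenation of source and target embeddings
        self.lstm = nn.LSTM(2 * embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_orient)

    def forward(self, src_ids, tgt_ids):  # both (B, T) token id tensors
        pairs = torch.cat([self.embed(src_ids), self.embed(tgt_ids)], dim=-1)
        h, _ = self.lstm(pairs)
        # per-position log-probabilities over reordering orientations
        return self.out(h).log_softmax(dim=-1)
```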