Depth-from-focus (DFF) is a technique that infers depth from the focus change of a camera. In this work, we propose a convolutional neural network (CNN) to find the best-focused pixels in a focal stack and infer depth from the focus estimation. The key innovation of the network is the novel deep differential focus volume (DFV). By computing the first-order derivative of the stacked features over different focal distances, the DFV captures both focus and context information for focus analysis. In addition, we introduce a probability regression mechanism for focus estimation that handles sparsely sampled focal stacks and provides uncertainty estimates for the final prediction. Comprehensive experiments demonstrate that the proposed model achieves state-of-the-art performance on multiple datasets with good generalizability and fast speed.
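The DFV and the probability-regression step can be sketched in a few lines of NumPy (the shapes, function names, and random inputs here are illustrative, not the paper's implementation, which computes these over learned CNN features):

```python
import numpy as np

def differential_focus_volume(features, focal_dists):
    """First-order derivative of stacked features along the focal dimension.

    features: (S, C, H, W) feature maps for S focal slices.
    focal_dists: (S,) focal distances of the slices.
    """
    df = np.diff(features, axis=0)                   # (S-1, C, H, W)
    dd = np.diff(focal_dists).reshape(-1, 1, 1, 1)   # (S-1, 1, 1, 1)
    return df / dd                                   # differential volume

def expected_depth(scores, focal_dists):
    """Probability regression: softmax over slices, then expected focal distance."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    p = e / e.sum(axis=0)                             # (S, H, W) per-pixel probabilities
    depth = (p * focal_dists.reshape(-1, 1, 1)).sum(axis=0)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0)    # per-pixel uncertainty proxy
    return depth, entropy

feats = np.random.rand(5, 8, 4, 4)          # toy focal stack features
dists = np.linspace(0.1, 1.0, 5)
dfv = differential_focus_volume(feats, dists)
depth, unc = expected_depth(np.random.rand(5, 4, 4), dists)
```

Because the regressed depth is a convex combination of the sampled focal distances, it can fall between slices, which is what makes sparsely sampled stacks tractable.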
One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement learning-based framework to find strategies with decent performance from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm, TISP-PG. We prove that TISP-based algorithms can find an approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical programming-based methods and significantly outperforms other learning-based methods.
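Backward induction over a finite horizon can be illustrated with a toy sketch (pure-strategy maximin stands in for the per-stage equilibrium solve; TISP itself works in belief space with learned, non-parametric policies):

```python
import numpy as np

def backward_induction_values(stage_payoffs):
    """Backward induction over a finite horizon (payoffs to player 1).

    stage_payoffs: list of payoff matrices, one per time step. Pure-strategy
    maximin is a toy stand-in for the per-stage equilibrium computation;
    returns the value-to-go at each decision point.
    """
    to_go = 0.0
    values = []
    for payoff in reversed(stage_payoffs):       # induct from the last stage
        continuation = payoff + to_go            # stage payoff plus value-to-go
        to_go = continuation.min(axis=1).max()   # player 1's maximin value
        values.append(to_go)
    return values[::-1]                          # values indexed forward in time

game = np.array([[2.0, 1.0], [0.0, 3.0]])
vals = backward_induction_values([game, game])   # two-stage repeated game
```

The key property, mirrored in TISP, is that the value at each decision point is computed assuming good play from that point onward, not just from the start of the game.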
Human trajectory prediction has received increased attention lately due to its importance in applications such as autonomous vehicles and indoor robots. However, most existing methods make predictions based on human-labeled trajectories and ignore the errors and noises in detection and tracking. In this paper, we study the problem of human trajectory forecasting in raw videos, and show that the prediction accuracy can be severely affected by various types of tracking errors. Accordingly, we propose a simple yet effective strategy to correct the tracking failures by enforcing prediction consistency over time. The proposed "re-tracking" algorithm can be applied to any existing tracking and prediction pipelines. Experiments on public benchmark datasets demonstrate that the proposed method can improve both tracking and prediction performance in challenging real-world scenarios. The code and data are available at https://git.io/retracking-prediction.
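A minimal version of such a prediction-consistency check might look like this (the constant-velocity predictor and the threshold are illustrative stand-ins for the learned prediction model in the actual pipeline):

```python
import numpy as np

def retrack(track, threshold=1.0):
    """Correct tracking failures by enforcing prediction consistency over time.

    track: (T, 2) observed positions from a tracker. A constant-velocity
    extrapolation serves as the predictor; any observation that jumps farther
    than `threshold` from the prediction is treated as a tracking error and
    replaced by the prediction.
    """
    corrected = track.astype(float).copy()
    for t in range(2, len(corrected)):
        pred = 2 * corrected[t - 1] - corrected[t - 2]   # constant-velocity forecast
        if np.linalg.norm(corrected[t] - pred) > threshold:
            corrected[t] = pred                           # inconsistent: override
    return corrected

# A track with one identity-switch-like jump at t = 3.
obs = np.array([[0, 0], [1, 0], [2, 0], [30, 5], [4, 0]], dtype=float)
fixed = retrack(obs)
```

Because corrections feed back into the history, a single repaired frame also restores consistency for the frames that follow it.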
Purpose: To improve image quality and accelerate the acquisition of 3D MRF. Methods: Building on the multi-axis spiral-projection MRF technique, a subspace reconstruction with locally low rank (LLR) constraint and a modified spiral-projection spatiotemporal encoding scheme termed tiny-golden-angle-shuffling (TGAS) were implemented for rapid whole-brain high-resolution quantitative mapping. The LLR regularization parameter and the number of subspace bases were tuned using retrospective in-vivo data and simulated examinations, respectively. B0 inhomogeneity correction using multi-frequency interpolation was incorporated into the subspace reconstruction to further improve the image quality by mitigating blurring caused by off-resonance effects. Results: The proposed MRF acquisition and reconstruction framework can provide high-quality 1-mm isotropic whole-brain quantitative maps in a total acquisition time of 1 minute 55 seconds, with higher-quality results than those obtained from the previous approach in 6 minutes. The comparison of quantitative results indicates that neither the subspace reconstruction nor the TGAS trajectory induces bias for T1 and T2 mapping. High-quality whole-brain MRF data were also obtained at 0.66-mm isotropic resolution in 4 minutes using the proposed technique, where the increased resolution was shown to improve visualization of subtle brain structures. Conclusion: The proposed TGAS-SPI-MRF with optimized spiral-projection trajectory and subspace reconstruction can enable high-resolution quantitative mapping with faster acquisition speed.
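The subspace idea behind such reconstructions can be illustrated with a toy sketch: a temporal basis is extracted from a simulated signal dictionary via SVD, and measured time series are constrained to that basis. The actual reconstruction solves a regularized inverse problem with the LLR constraint; the signals and names here are illustrative only:

```python
import numpy as np

def subspace_basis(dictionary, k):
    """Temporal subspace basis from a simulated signal dictionary.

    dictionary: (T, N) simulated signal evolutions; k: number of bases kept.
    """
    u, s, vt = np.linalg.svd(dictionary, full_matrices=False)
    return u[:, :k]                      # (T, k) orthonormal temporal basis

def project(signals, basis):
    """Constrain time series to the subspace: x_hat = Phi Phi^T x."""
    coeff = basis.T @ signals            # low-dimensional subspace coefficients
    return basis @ coeff                 # back to the time domain

T, N, k = 50, 200, 3
t = np.arange(T)[:, None]
dic = np.exp(-t / np.random.uniform(5, 40, N))   # toy exponential-decay dictionary
basis = subspace_basis(dic, k)
noisy = dic[:, :1] + 0.01 * np.random.randn(T, 1)
recon = project(noisy, basis)
```

Representing each voxel's time course by only k subspace coefficients is what makes the heavily undersampled acquisition tractable; the LLR constraint additionally regularizes those coefficient maps over local spatial patches.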
Nowadays, the prevalence of sensor networks has enabled tracking of the states of dynamic objects for a wide spectrum of applications from autonomous driving to environmental monitoring and urban planning. However, tracking real-world objects often faces two key challenges: First, due to the limitation of individual sensors, state estimation needs to be solved in a collaborative and distributed manner. Second, the objects' movement behavior is unknown, and needs to be learned using sensor observations. In this work, for the first time, we formally formulate the problem of simultaneous state estimation and behavior learning in a sensor network. We then propose a simple yet effective solution to this new problem by extending the Gaussian process-based Bayes filters (GP-BayesFilters) to an online, distributed setting. The effectiveness of the proposed method is evaluated on tracking objects with unknown movement behaviors using both synthetic data and data collected from a multi-robot platform.
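The learned motion model at the core of a GP-BayesFilter can be sketched as plain GP regression (a minimal, centralized illustration with illustrative names and hyperparameters; the paper's contribution is extending this to an online, distributed setting):

```python
import numpy as np

def gp_predict(X_train, y_train, x_query, length=1.0, noise=1e-2):
    """GP regression (RBF kernel) used as a learned motion model.

    Returns the posterior mean and variance at the query states; inside a
    Bayes filter, these supply the prediction step's mean and process noise.
    """
    def k(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.exp(-0.5 * (d ** 2).sum(-1) / length ** 2)

    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = k(x_query, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var

# Toy behavior: learn a smooth 1-D state-transition function from samples.
X = np.linspace(0, 6, 20)[:, None]
y = np.sin(X[:, 0])
mean, var = gp_predict(X, y, X[5:6])
```

The posterior variance is the useful byproduct here: in regions far from training data it grows toward the prior, letting the filter widen its predicted state distribution when the learned behavior is uncertain.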
In this paper, we propose a learning-based approach to the task of automatically extracting a "wireframe" representation for images of cluttered man-made environments. The wireframe (see Fig. 1) contains all salient straight lines of the scene and their junctions, which efficiently and accurately encode large-scale geometry and object shapes. To this end, we have built a very large new dataset of over 5,000 images with wireframes thoroughly labelled by humans. We have proposed two convolutional neural networks that are suitable for extracting junctions and lines with large spatial support, respectively. The networks trained on our dataset have achieved significantly better performance than state-of-the-art methods for junction detection and line segment detection, respectively. We have conducted extensive experiments to evaluate quantitatively and qualitatively the wireframes obtained by our method, and have convincingly shown that effectively and efficiently parsing wireframes for images of man-made environments is an attainable goal. Such wireframes could benefit many important visual tasks such as feature correspondence, 3D reconstruction, vision-based mapping, localization, and navigation. The data and source code are available at https://github.com/huangkuns/wireframe.
In computer vision, superpixels have been widely used as an effective way to reduce the number of image primitives for subsequent processing. However, only a few attempts have been made to incorporate them into deep neural networks. One main reason is that the standard convolution operation is defined on regular grids and becomes inefficient when applied to superpixels. Inspired by an initialization strategy commonly adopted by traditional superpixel algorithms, we present a novel method that employs a simple fully convolutional network to predict superpixels on a regular image grid. Experimental results on benchmark datasets show that our method achieves state-of-the-art superpixel segmentation performance while running at about 50 fps. Based on the predicted superpixels, we further develop a downsampling/upsampling scheme for deep networks with the goal of generating high-resolution outputs for dense prediction tasks. Specifically, we modify a popular network architecture for stereo matching to simultaneously predict superpixels and disparities. We show that improved disparity estimation accuracy can be obtained on public datasets.
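The grid-based assignment can be illustrated with a small sketch: each pixel carries scores for its 3x3 neighboring grid cells (a hypothetical stand-in for the FCN output), and the hard superpixel label is the argmax cell:

```python
import numpy as np

def superpixel_labels(assoc, grid_h, grid_w, cell):
    """Hard superpixel assignment from soft pixel-to-grid associations.

    assoc: (H, W, 9) scores of each pixel for the 3x3 grid cells around its
    home cell (illustrative network output). Returns an (H, W) index map.
    """
    H, W, _ = assoc.shape
    labels = np.zeros((H, W), dtype=int)
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    for y in range(H):
        for x in range(W):
            gy, gx = y // cell, x // cell             # home grid cell
            dy, dx = offsets[assoc[y, x].argmax()]    # best of the 9 neighbors
            ny = min(max(gy + dy, 0), grid_h - 1)     # clamp at image border
            nx = min(max(gx + dx, 0), grid_w - 1)
            labels[y, x] = ny * grid_w + nx
    return labels

# Degenerate case: every pixel prefers its home cell (offset index 4 = (0, 0)),
# so the result is a regular 2x2 partition of an 8x8 image.
assoc = np.zeros((8, 8, 9))
assoc[..., 4] = 1.0
lab = superpixel_labels(assoc, 2, 2, 4)
```

Restricting each pixel to the 3x3 cells around its home cell is what keeps the prediction a fixed-size, fully convolutional output rather than an unbounded assignment problem.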
In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially.
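The stage-transition logic of EPC might be sketched as follows (the `fitness` callback stands in for evaluation after fine-tuning in the scaled environment; all names here are illustrative, not the paper's implementation):

```python
import random

def epc_stage(parent_sets, fitness, n_keep, n_mix):
    """One EPC stage transition.

    Mix-and-match pairs of parent agent sets to double the population,
    score each mixed set with the user-supplied `fitness` (evaluation after
    fine-tuning at the new scale), and keep the best n_keep sets.
    """
    candidates = []
    for _ in range(n_mix):
        a, b = random.sample(parent_sets, 2)   # pick two distinct parent sets
        candidates.append(a + b)               # combined, larger population
    candidates.sort(key=fitness, reverse=True) # most adaptable sets first
    return candidates[:n_keep]                 # survivors seed the next stage

# Toy run: three singleton "agent sets", fitness = sum of agent ids.
result = epc_stage([[1], [2], [3]], fitness=sum, n_keep=2, n_mix=5)
```

The selection step is what addresses the objective-misalignment issue: sets are promoted by how well they adapt at the new population size, not by how well they performed at the old one.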
Advanced image synthesis methods can generate photo-realistic images for faces, birds, bedrooms, and more. However, these methods do not explicitly model and preserve essential structural constraints such as junctions, parallel lines, and planar surfaces. In this paper, we study the problem of structured indoor image generation for design applications. We utilize a small-scale dataset that contains both images of various indoor scenes and their corresponding ground-truth wireframe annotations. Because existing image synthesis models trained on this dataset fail to preserve structural integrity, we propose a novel model based on a structure-appearance joint embedding learned from both images and wireframes. In our model, structural constraints are explicitly enforced by learning a joint embedding in a shared encoder network that must support the generation of both images and wireframes. We demonstrate the effectiveness of the joint embedding learning scheme on the indoor scene wireframe-to-image translation task. Although wireframes contain less semantic information than the inputs of traditional image translation tasks, our model can generate high-fidelity indoor scene renderings that match the input wireframes well. Experiments on a wireframe-scene dataset show that our proposed translation model significantly outperforms existing state-of-the-art methods in both visual quality and structural integrity of generated images.
Recently, there has been growing interest in developing learning-based methods to detect and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids, smooth surfaces, and all types of symmetries, for 3D scene modeling and understanding. However, the ground truth annotations are often obtained via human labor, which is particularly challenging and inefficient for such tasks due to the large number of 3D structure instances (e.g., line segments) and other factors such as viewpoints and occlusions. In this paper, we present a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks. We take advantage of the availability of millions of professional interior designs and automatically extract 3D structures from them. We generate high-quality images with an industry-leading rendering engine. We use our synthetic dataset in combination with real images to train deep neural networks for room layout estimation and demonstrate improved performance on benchmark datasets.