Yifan Zhu

Stanford University Department of Electrical Engineering

Human-Inspired Facial Sketch Synthesis with Dynamic Adaptation

Sep 01, 2023
Fei Gao, Yifan Zhu, Chang Jiang, Nannan Wang

Facial sketch synthesis (FSS) aims to generate a vivid sketch portrait from a given facial photo. Existing FSS methods rely solely on 2D representations of facial semantics or appearance. However, professional human artists usually use outlines or shadings to convey 3D geometry, so facial 3D geometry (e.g., a depth map) is extremely important for FSS. In addition, different artists may use diverse drawing techniques and create multiple styles of sketches, but the style is globally consistent within a sketch. Inspired by these observations, we propose a novel Human-Inspired Dynamic Adaptation (HIDA) method. Specifically, we dynamically modulate neuron activations based on a joint consideration of facial 3D geometry and 2D appearance, together with globally consistent style control. We also use deformable convolutions at coarse scales to align deep features for generating abstract and distinct outlines. Experiments show that HIDA can generate high-quality sketches in multiple styles and significantly outperforms previous methods across a wide range of challenging faces. Moreover, HIDA allows precise style control of the synthesized sketch and generalizes well to natural scenes and other artistic styles. Our code and results have been released at: https://github.com/AiArt-HDU/HIDA.

* To appear at ICCV'23
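
As a rough illustration of the dynamic adaptation described above, the PyTorch sketch below predicts per-pixel scale/shift terms from a depth map and an RGB appearance map and combines them with a globally shared style code. This is a minimal sketch of the general technique, not the paper's actual module; all names and layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class DynamicModulation(nn.Module):
    """Modulate activations from local geometry/appearance plus a global
    style code (illustrative sketch; not the HIDA implementation)."""
    def __init__(self, channels, style_dim=64):
        super().__init__()
        # 4 input channels = 1 (depth map) + 3 (RGB appearance)
        self.local = nn.Conv2d(4, 2 * channels, kernel_size=3, padding=1)
        self.style = nn.Linear(style_dim, 2 * channels)

    def forward(self, feat, depth, appearance, style_code):
        b, c, _, _ = feat.shape
        cond = torch.cat([depth, appearance], dim=1)        # B x 4 x H x W
        gamma_l, beta_l = self.local(cond).chunk(2, dim=1)  # per-pixel terms
        gamma_g, beta_g = self.style(style_code).view(b, 2 * c, 1, 1).chunk(2, dim=1)
        # Local terms adapt to geometry/appearance; the global terms keep
        # the style consistent across the whole sketch.
        return feat * (1 + gamma_l + gamma_g) + beta_l + beta_g
```

Here `feat`, `depth`, and `appearance` are assumed to share the same spatial resolution, which in practice would require resizing the conditioning maps at each scale.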

Continual Learning as Computationally Constrained Reinforcement Learning

Jul 10, 2023
Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.


Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections

Apr 18, 2023
Jiaxiong Qiu, Peng-Tao Jiang, Yifan Zhu, Ze-Xin Yin, Ming-Ming Cheng, Bo Ren

Neural implicit methods have achieved high-quality 3D object surfaces under slight specular highlights. However, high specular reflections (HSR) often appear in front of target objects when we capture them through glass. The complex ambiguity in such scenes violates multi-view consistency and makes it challenging for recent methods to reconstruct target objects correctly. To remedy this issue, we present a novel surface reconstruction framework, NeuS-HSR, based on implicit neural rendering. In NeuS-HSR, the object surface is parameterized as an implicit signed distance function (SDF). To reduce the interference of HSR, we propose decomposing the rendered image into two appearances: the target object and an auxiliary plane. We design a novel auxiliary plane module that combines physical assumptions and neural networks to generate the auxiliary plane appearance. Extensive experiments on synthetic and real-world datasets demonstrate that NeuS-HSR outperforms state-of-the-art approaches for accurate and robust target surface reconstruction against HSR. Code is available at https://github.com/JiaxiongQ/NeuS-HSR.

* 17 pages, 20 figures 
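
The core decomposition can be pictured as a per-ray blend of two rendered appearances, with only the composite supervised against the captured photo. The sketch below is a simplified stand-in for the paper's renderer; `alpha` is a hypothetical mixing weight standing in for the auxiliary plane module's output.

```python
import torch

def blend_appearances(rgb_object, rgb_plane, alpha):
    """Blend target-object and auxiliary-plane appearances per ray; the
    plane branch is meant to absorb the specular reflection."""
    return alpha * rgb_object + (1.0 - alpha) * rgb_plane

# Toy usage: supervise only the composite against the captured pixels, so
# the SDF branch can explain the object while the plane explains the HSR.
rgb_object, rgb_plane = torch.rand(1024, 3), torch.rand(1024, 3)
alpha, rgb_gt = torch.rand(1024, 1), torch.rand(1024, 3)
loss = ((blend_appearances(rgb_object, rgb_plane, alpha) - rgb_gt) ** 2).mean()
```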

Few-shot Adaptation for Manipulating Granular Materials Under Domain Shift

Mar 06, 2023
Yifan Zhu, Pranay Thangeda, Melkior Ornik, Kris Hauser

Autonomous lander missions on extraterrestrial bodies will need to sample granular material while coping with domain shift, no matter how well a sampling strategy is tuned on Earth. This paper proposes an adaptive scooping strategy that uses a deep Gaussian process trained with meta-learning to learn online from very limited experience on the target terrain. It introduces a novel meta-training approach, Deep Meta-Learning with Controlled Deployment Gaps (CoDeGa), that explicitly trains the deep kernel to predict scooping volume robustly under large domain shifts. Employed within a Bayesian optimization sequential decision-making framework, the proposed method allows the robot to use vision and very little online experience to achieve high-quality scooping actions on out-of-distribution terrains, significantly outperforming non-adaptive methods from the excavation literature as well as other state-of-the-art meta-learning methods. Moreover, a dataset of 6,700 executed scoops collected across a diverse set of materials, terrain topographies, and compositions is made available for future research in granular material manipulation and meta-learning.
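
The decision-making loop can be sketched as Bayesian optimization with a GP surrogate over scoop outcomes. Below, CoDeGa's meta-trained deep kernel is replaced by scikit-learn's default RBF kernel so the sketch stays self-contained, and `execute_scoop` is a hypothetical callback returning the measured volume.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bo_scooping(candidates, execute_scoop, n_trials=10, kappa=2.0):
    """UCB-driven scooping loop over an (M, d) array of candidate scoop
    parameters (simplified stand-in for the paper's method)."""
    gp = GaussianProcessRegressor(normalize_y=True)
    X, y = [], []
    for _ in range(n_trials):
        if X:
            mu, sd = gp.predict(candidates, return_std=True)
        else:  # no data yet: any candidate is equally (un)promising
            mu, sd = np.zeros(len(candidates)), np.ones(len(candidates))
        a = candidates[int(np.argmax(mu + kappa * sd))]  # optimistic pick
        X.append(a)
        y.append(execute_scoop(a))  # observed volume on the new terrain
        gp.fit(np.asarray(X), np.asarray(y))
    return np.asarray(X), np.asarray(y)
```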


Is Stochastic Gradient Descent Near Optimal?

Oct 06, 2022
Yifan Zhu, Hong Jun Jeon, Benjamin Van Roy

The success of neural networks over the past decade has established them as effective models for many relevant data generating processes. Statistical theory on neural networks indicates graceful scaling of sample complexity. For example, Jeon & Van Roy (arXiv:2203.00246) demonstrate that, when data is generated by a ReLU teacher network with $W$ parameters, an optimal learner needs only $\tilde{O}(W/\epsilon)$ samples to attain expected error $\epsilon$. However, existing computational theory suggests that, even for single-hidden-layer teacher networks, achieving this sample complexity for all such teachers requires intractable computation. In this work, we fit single-hidden-layer neural networks to data generated by single-hidden-layer ReLU teacher networks with parameters drawn from a natural distribution. We demonstrate that stochastic gradient descent (SGD) with automated width selection attains small expected error with a number of samples and a total number of queries both nearly linear in the input dimension and width. This suggests that SGD nearly achieves the information-theoretic sample complexity bounds of Jeon & Van Roy (arXiv:2203.00246) in a computationally efficient manner. An important difference between our positive empirical results and the negative theoretical results is that the latter address the worst-case error of deterministic algorithms, while our analysis centers on the expected error of a stochastic algorithm.
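
The experimental setup is easy to reproduce in miniature: draw a random single-hidden-layer ReLU teacher, generate data from it, and fit a student of chosen width with plain SGD. The sketch below is a minimal version; the widths, sample counts, and step counts are illustrative and do not reflect the paper's automated width-selection rule.

```python
import torch
import torch.nn as nn

def teacher_student_error(d=16, teacher_width=8, student_width=32,
                          n=4096, steps=2000, lr=1e-2):
    """Train a one-hidden-layer ReLU student on data from a frozen random
    ReLU teacher and return the final mean squared error."""
    teacher = nn.Sequential(nn.Linear(d, teacher_width), nn.ReLU(),
                            nn.Linear(teacher_width, 1))
    for p in teacher.parameters():
        p.requires_grad_(False)            # the teacher only defines the data
    X = torch.randn(n, d)
    y = teacher(X)
    student = nn.Sequential(nn.Linear(d, student_width), nn.ReLU(),
                            nn.Linear(student_width, 1))
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((student(X) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()
```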


Deep Learning Assisted Optimization for 3D Reconstruction from Single 2D Line Drawings

Sep 07, 2022
Jia Zheng, Yifan Zhu, Kehan Wang, Qiang Zou, Zihan Zhou

In this paper, we revisit the long-standing problem of automatically reconstructing 3D objects from single line drawings. Previous optimization-based methods can generate compact and accurate 3D models, but their success rates depend heavily on the ability to (i) identify a sufficient set of true geometric constraints, and (ii) choose a good initial value for the numerical optimization. In view of these challenges, we propose to train deep neural networks to detect pairwise relationships among geometric entities (i.e., edges) in the 3D object, and to predict the initial depth values of the vertices. Our experiments on a large dataset of CAD models show that, by leveraging deep learning in a geometric constraint solving pipeline, the success rate of optimization-based 3D reconstruction can be significantly improved.

* Project page is at https://manycore-research.github.io/cstr 
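
The pipeline's division of labor is: the networks supply constraint hypotheses and an initial depth estimate, and a conventional solver does the rest. A minimal sketch of that refinement step is below; `residual_fn` is a hypothetical function stacking the detected geometric constraints (e.g., parallel or perpendicular edge pairs) as residuals over the vertex depths.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_depths(depth_init, residual_fn):
    """Refine network-predicted vertex depths by nonlinear least squares;
    a good depth_init keeps the solver out of poor local minima."""
    return least_squares(residual_fn, x0=depth_init).x

# Toy usage: residuals that pull neighboring vertex depths toward equality.
depths0 = np.array([0.9, 1.1, 2.2])
refined = refine_depths(depths0, lambda z: z[1:] - z[:-1])
```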

AMinerGNN: Heterogeneous Graph Neural Network for Paper Click-through Rate Prediction with Fusion Query

Aug 15, 2022
Zepeng Huai, Zhe Wang, Yifan Zhu, Peng Zhang

Paper recommendation with a user-generated keyword aims to suggest papers that simultaneously match the user's interests and are relevant to the input keyword. This is a recommendation task with two queries, i.e., a user ID and a keyword. However, existing methods focus on recommendation with a single query, the user ID, and are not applicable to this problem. In this paper, we propose a novel click-through rate (CTR) prediction model based on a heterogeneous graph neural network, called AMinerGNN, to recommend papers given the two queries. Specifically, AMinerGNN constructs a heterogeneous graph to project users, papers, and keywords into the same embedding space via graph representation learning. To process the two queries, a novel query attentive fusion layer is designed to dynamically weigh their importance and then fuse them into a single query, yielding a unified and end-to-end recommender system. Experimental results on our proposed dataset and online A/B tests demonstrate the superiority of AMinerGNN.

* CIKM 2022  
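
The fusion step can be read as attention over exactly two query embeddings. The PyTorch sketch below shows that shape of computation; it is a guess at the general mechanism, not the paper's layer, and the sizes are arbitrary.

```python
import torch
import torch.nn as nn

class QueryAttentiveFusion(nn.Module):
    """Fuse a user embedding and a keyword embedding into one query by
    dynamically weighting their importance (illustrative sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, user_emb, kw_emb):
        stacked = torch.stack([user_emb, kw_emb], dim=1)   # B x 2 x D
        attn = torch.softmax(self.score(stacked), dim=1)   # weights over the 2 queries
        return (attn * stacked).sum(dim=1)                 # fused query, B x D

fusion = QueryAttentiveFusion(dim=128)
fused = fusion(torch.randn(4, 128), torch.randn(4, 128))   # batch of 4 fused queries
```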

Excavation Reinforcement Learning Using Geometric Representation

Jan 27, 2022
Qingkai Lu, Yifan Zhu, Liangjun Zhang

Excavation of irregular rigid objects in clutter, such as fragmented rocks and wood blocks, is very challenging due to their complex interaction dynamics and highly variable geometries. In this paper, we adopt reinforcement learning (RL) to tackle this challenge and learn policies that plan a sequence of excavation trajectories for irregular rigid objects, given point clouds of excavation scenes. Moreover, we separately learn a compact representation of the point cloud on geometric tasks that do not require human labeling. We show that using this representation reduces training time for RL while achieving similar asymptotic performance compared to an end-to-end RL algorithm. When deploying a policy trained in simulation directly on a real scene, we show that the policy trained with the representation outperforms end-to-end RL. To the best of our knowledge, this paper presents the first application of RL to planning a sequence of excavation trajectories for irregular rigid objects in clutter.
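
The two-stage idea is: pretrain a point-cloud encoder on label-free geometric tasks, then freeze it and train only the policy on top with RL. The sketch below shows that structure with a deliberately tiny PointNet-style encoder; the architecture, latent size, and 6-D action head are placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    """Tiny permutation-invariant point-cloud encoder (PointNet-style)."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, points):                     # points: B x N x 3
        return self.mlp(points).max(dim=1).values  # pool over points

encoder = PointEncoder()
# ... pretrain `encoder` here on geometric targets (e.g., local height
# statistics) that need no human labels ...
for p in encoder.parameters():
    p.requires_grad_(False)                        # frozen during RL
policy = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, 6))           # excavation trajectory params
action = policy(encoder(torch.randn(1, 2048, 3)))
```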


Automated Heart and Lung Auscultation in Robotic Physical Examinations

Jan 24, 2022
Yifan Zhu, Alexander Smith, Kris Hauser

This paper presents the first implementation of autonomous robotic auscultation of heart and lung sounds. To select auscultation locations that generate high-quality sounds, a Bayesian optimization (BO) formulation leverages visual anatomical cues to predict where high-quality sounds might be located, while using auditory feedback to adapt to patient-specific anatomy. Sound quality is estimated online using machine learning models trained on a database of heart and lung stethoscope recordings. Experiments on four human subjects show that our system autonomously captures heart and lung sounds of quality comparable to tele-operation by a human trained in clinical auscultation. Surprisingly, one of the subjects exhibited a previously unknown cardiac pathology that was first identified using our robot, demonstrating the potential utility of autonomous robotic auscultation for health screening.
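
The location-selection step can be sketched as a UCB rule that adds a visual prior to a GP posterior fit to the sound qualities measured so far. Everything below is a simplified stand-in: `visual_prior` is a hypothetical per-location score, and the GP uses scikit-learn's default kernel rather than the system's learned quality models.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_listen_location(locations, visual_prior, X_heard, q_heard, kappa=1.0):
    """Pick the next stethoscope placement from candidate `locations`,
    given qualities `q_heard` measured at past placements `X_heard`."""
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(np.asarray(X_heard), np.asarray(q_heard))  # needs >= 1 observation
    mu, sd = gp.predict(locations, return_std=True)
    return locations[int(np.argmax(visual_prior + mu + kappa * sd))]

# Toy usage over a 1-D strip of candidate chest locations.
locs = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
prior = np.exp(-((locs[:, 0] - 0.4) ** 2) / 0.02)     # stand-in visual prior
best = next_listen_location(locs, prior, [[0.1]], [0.3])
```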
