Chen Fang

EnlightenGAN: Deep Light Enhancement without Paired Supervision

Jun 17, 2019

Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

Abstract: Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images. Instead of supervising the learning using ground truth data, we propose to regularize the unpaired training using the information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and an attention mechanism. Through extensive experiments, our proposed approach outperforms recent methods under a variety of metrics in terms of visual quality and subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. The code is available at https://github.com/yueruchen/EnlightenGAN

Multimodal Style Transfer via Graph Cuts

May 17, 2019

Yulun Zhang, Chen Fang, Yilin Wang, Zhaowen Wang, Zhe Lin, Yun Fu, Jimei Yang

Abstract: An assumption widely used in recent neural style transfer methods is that image styles can be described by global statistics of deep features like Gram or covariance matrices. Alternative approaches have represented styles by decomposing them into local pixel or neural patches. Despite the recent progress, most existing methods treat the semantic patterns of the style image uniformly, resulting in unpleasing results on complex styles. In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). MST explicitly considers the matching of semantic patterns in content and style images. Specifically, the style image features are clustered into sub-style components, which are matched with local content features under a graph cut formulation. A reconstruction network is trained to transfer each sub-style and render the final stylized result. Extensive experiments demonstrate the superior effectiveness, robustness and flexibility of MST.

* Supplementary file: http://yulunzhang.com/papers/MST_supp_arXiv.pdf The MST source code will be available after the paper is published. Fix typos in Eq.(11) and (12)
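
The "global statistics of deep features" mentioned in the abstract above can be illustrated with a small sketch. This is not the MST code (which the note above says was not yet released); it only shows, under the assumption that a feature map is stored as C channels each flattened to N values, how a Gram matrix and a covariance matrix are computed. The function names `gram_matrix` and `covariance_matrix` are hypothetical.

```python
def gram_matrix(features):
    """Gram matrix of a feature map.

    `features` is a list of C channels, each a flat list of N activations
    (an assumed layout, chosen for illustration). Entry (i, j) is the mean
    product of channel i and channel j over all N positions.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def covariance_matrix(features):
    """Covariance matrix: the Gram matrix of the mean-centered channels."""
    n = len(features[0])
    means = [sum(f) / n for f in features]
    centered = [[v - m for v in f] for f, m in zip(features, means)]
    return gram_matrix(centered)


# Two channels, three spatial positions; the second channel is twice the first.
feats = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
gram = gram_matrix(feats)
cov = covariance_matrix(feats)
```

Both matrices discard the spatial arrangement of the features, which is why such statistics describe a style only globally; the abstract contrasts this with matching sub-style components to local content features.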

Creative Procedural-Knowledge Extraction From Web Design Tutorials

Apr 18, 2019

Longqi Yang, Chen Fang, Hailin Jin, Walter Chang, Deborah Estrin

Abstract: Complex design tasks often require performing diverse actions in a specific order. To (semi-)autonomously accomplish these tasks, applications need to understand and learn a wide range of design procedures, i.e., Creative Procedural-Knowledge (CPK). Prior knowledge base construction and mining have not typically addressed the creative fields, such as design and arts. In this paper, we formalize an ontology of CPK using five components: goal, workflow, action, command and usage; and extract components' values from online design tutorials. We scraped 19.6K tutorial-related webpages and built a web application for professional designers to identify and summarize CPK components. The annotated dataset consists of 819 unique commands, 47,491 actions, and 2,022 workflows and goals. Based on this dataset, we propose a general CPK extraction pipeline and demonstrate that existing text classification and sequence-to-sequence models are limited in identifying, predicting and summarizing complex operations described in heterogeneous styles. Through quantitative and qualitative error analysis, we discuss CPK extraction challenges that need to be addressed by future research.

PaintBot: A Reinforcement Learning Approach for Natural Media Painting

Apr 03, 2019

Biao Jia, Chen Fang, Jonathan Brandt, Byungmoon Kim, Dinesh Manocha

Abstract: We propose a new automated digital painting framework, based on a painting agent trained through reinforcement learning. To synthesize an image, the agent selects a sequence of continuous-valued actions representing primitive painting strokes, which are accumulated on a digital canvas. Action selection is guided by a given reference image, which the agent attempts to replicate subject to the limitations of the action space and the agent's learned policy. The painting agent policy is determined using a variant of proximal policy optimization reinforcement learning. During training, our agent is presented with patches sampled from an ensemble of reference images. To accelerate training convergence, we adopt a curriculum learning strategy, whereby reference patches are sampled according to how challenging they are using the current policy. We experiment with differing loss functions, including pixel-wise and perceptual loss, which have consequent differing effects on the learned policy. We demonstrate that our painting agent can learn an effective policy with a high dimensional continuous action space comprising pen pressure, width, tilt, and color, for a variety of painting styles. Through a coarse-to-fine refinement process our agent can paint arbitrarily complex images in the desired style.
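
The curriculum strategy described above, sampling reference patches by how challenging they are under the current policy, might be sketched as follows. The abstract does not give the sampling rule, so this sketch assumes one possibility: each patch is drawn with probability proportional to its current loss. The function name `sample_patches` and the `floor` parameter are hypothetical and not taken from the paper.

```python
import random


def sample_patches(patch_losses, k, rng=None, floor=1e-6):
    """Draw k patch indices, favoring patches with a higher current loss.

    Assumption for illustration: difficulty is measured by the loss the
    current policy achieves on each patch, and sampling probability is
    proportional to that loss. `floor` keeps every patch reachable even
    when its loss is zero.
    """
    rng = rng or random.Random()
    weights = [max(loss, floor) for loss in patch_losses]
    return rng.choices(range(len(patch_losses)), weights=weights, k=k)


# Patch 2 is currently the hardest, so it should dominate the batch.
losses = [0.1, 0.2, 5.0]
batch = sample_patches(losses, k=8, rng=random.Random(0))
```

In a training loop, the losses would be refreshed as the policy improves, so the sampling distribution shifts over time; a schedule that instead starts with easy patches would invert the weights.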

Dance Dance Generation: Motion Transfer for Internet Videos

Mar 30, 2019

Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg

Abstract: This work presents computational methods for transferring body movements from one person to another with videos collected in the wild. Specifically, we train a personalized model on a single video from the Internet which can generate videos of this target person driven by the motions of other people. Our model is built on two generative networks: a human (foreground) synthesis net which generates photo-realistic imagery of the target person in a novel pose, and a fusion net which combines the generated foreground with the scene (background), adding shadows or reflections as needed to enhance realism. We validate the efficacy of our proposed models over baselines with qualitative and quantitative evaluations as well as a subjective test.

Im2Pencil: Controllable Pencil Illustration from Photographs

Mar 20, 2019

Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

Abstract: We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style. This is a challenging task due to multiple stroke types (e.g., outline and shading), structural complexity of pencil shading (e.g., hatching), and the lack of aligned training data pairs. To address these challenges, we develop a two-branch model that learns separate filters for generating sketchy outlines and tonal shading from a collection of pencil drawings. We create training data pairs by extracting clean outlines and tonal illustrations from original pencil drawings using image filtering techniques, and we manually label the drawing styles. In addition, our model creates different pencil styles (e.g., line sketchiness and shading style) in a user-controllable manner. Experimental results on different types of pencil drawings show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and user evaluations.

* Accepted by CVPR 2019

Learning to Sketch with Deep Q Networks and Demonstrated Strokes

Oct 14, 2018

Tao Zhou, Chen Fang, Zhaowen Wang, Jimei Yang, Byungmoon Kim, Zhili Chen, Jonathan Brandt, Demetri Terzopoulos

Abstract: Doodling is a useful and common intelligent skill that people can learn and master. In this work, we propose a two-stage learning framework to teach a machine to doodle in a simulated painting environment via Stroke Demonstration and deep Q-learning (SDQ). The developed system, Doodle-SDQ, generates a sequence of pen actions to reproduce a reference drawing and mimics the behavior of human painters. In the first stage, it learns to draw simple strokes by imitating in a supervised fashion from a set of stroke-action pairs collected from artist paintings. In the second stage, it is challenged to draw real and more complex doodles without ground truth actions; thus, it is trained with Q-learning. Our experiments confirm that (1) doodling can be learned without direct step-by-step action supervision and (2) pretraining with stroke demonstration via supervised learning is important to improve performance. We further show that Doodle-SDQ is effective at producing plausible drawings in different media types, including sketch and watercolor.
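
For readers unfamiliar with the Q-learning stage mentioned above, the underlying update rule can be sketched in its simplest tabular form. Doodle-SDQ uses a deep Q-network rather than a table, so this is a simplified stand-in and not the paper's method; the function name `q_update`, the table layout, and the default values of `alpha` and `gamma` are assumptions made for illustration.

```python
def q_update(q, state, action, reward, next_state,
             alpha=0.5, gamma=0.9, done=False):
    """One tabular Q-learning step.

    `q` maps each state to a list of action values (an assumed layout).
    The value of (state, action) moves toward the bootstrapped target
    reward + gamma * max_a' Q(next_state, a'), with step size alpha.
    """
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Two states with two pen actions each; state 1 already has value estimates.
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
new_value = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

Because the target depends only on the observed reward and the estimated value of the next state, no ground-truth action is needed at each step, which is the property the abstract relies on for the second training stage.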

Flow-Grounded Spatial-Temporal Video Prediction from Still Images

Aug 26, 2018

Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang

Abstract: Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting only the next frame. In this work, we study the problem of generating multiple consecutive future frames by observing only a single still image. We formulate the multi-frame prediction task as a multiple time step flow (multi-flow) prediction phase followed by a flow-to-frame synthesis phase. The multi-flow prediction is modeled in a variational probabilistic manner with spatial-temporal relationships learned through 3D convolutions. The flow-to-frame synthesis is modeled as a generative process in order to keep the predicted results lying closer to the manifold shape of real video sequences. Such a two-phase design prevents the model from directly looking at the high-dimensional pixel space of the frame sequence and is demonstrated to be more effective in producing better and more diverse results. Extensive experimental results on videos with different types of motion show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and human perceptual evaluation.

* Accepted by ECCV 2018

"Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention

Jul 29, 2018

Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo

Abstract: Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, it requires the system to generate a caption that has a specific style (e.g., humorous, romantic, positive, and negative) while describing the image content semantically accurately. In this paper, we propose a novel stylized image captioning model that effectively takes both requirements into consideration. To this end, we first devise a new variant of LSTM, named style-factual LSTM, as the building block of our model. It uses two groups of matrices to capture the factual and stylized knowledge, respectively, and automatically learns the word-level weights of the two groups based on previous context. In addition, when we train the model to capture stylized elements, we propose an adaptive learning approach based on a reference factual model. It provides factual knowledge to the model as the model learns from stylized caption labels, and can adaptively compute how much information to supply at each time step. We evaluate our model on two stylized image captioning datasets, which contain humorous/romantic captions and positive/negative captions, respectively. Experiments show that our proposed model outperforms the state-of-the-art approaches, without using extra ground truth supervision.

* 17 pages, 7 figures, ECCV 2018

Visual to Sound: Generating Natural Sound for Videos in the Wild

Jun 01, 2018

Yipin Zhou, Zhaowen Wang, Chen Fang, Trung Bui, Tamara L. Berg

Abstract: As two of the five traditional human senses (sight, hearing, taste, smell, and touch), vision and sound are basic sources through which humans understand the world. Often correlated during natural events, these two modalities combine to jointly affect human perception. In this paper, we pose the task of generating sound given visual input. Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments. As a first step in this direction, we apply learning-based methods to generate raw waveform samples given input video frames. We evaluate our models on a dataset of videos containing a variety of sounds (such as ambient sounds and sounds from people/animals). Our experiments show that the generated sounds are fairly realistic and have good temporal synchronization with the visual inputs.

* Project page: http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.html
