Jaewon Lee

Domain Generalization Emerges from Dreaming

Feb 02, 2023

Hwan Heo, Youngjin Oh, Jaewon Lee, Hyunwoo J. Kim

Abstract: Recent studies have proven that DNNs, unlike human vision, tend to exploit texture information rather than shape. Such texture bias is one of the factors for the poor generalization performance of DNNs. We observe that the texture bias negatively affects not only in-domain generalization but also out-of-distribution generalization, i.e., Domain Generalization. Motivated by the observation, we propose a new framework to reduce the texture bias of a model by a novel optimization-based data augmentation, dubbed Stylized Dream. Our framework utilizes adaptive instance normalization (AdaIN) to augment the style of an original image yet preserve the content. We then adopt a regularization loss to predict consistent outputs between Stylized Dream and original images, which encourages the model to learn shape-based representations. Extensive experiments show that the proposed method achieves state-of-the-art performance in out-of-distribution settings on public benchmark datasets: PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet.

* 23 pages, 4 figures
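
The AdaIN operation named in the abstract above can be illustrated with a minimal sketch. It follows the commonly cited definition of adaptive instance normalization, in which each content channel is re-normalized to the mean and standard deviation of the matching style channel. It is not the Stylized Dream optimization itself; the function names and the list-of-channels data layout are assumptions made for illustration.

```python
import math


def channel_stats(values, eps=1e-5):
    """Return (mean, std) of one flattened feature channel; eps avoids division by zero."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)


def adain(content, style, eps=1e-5):
    """Re-normalize each content channel to the style channel's statistics.

    content, style: lists of channels, each channel a flat list of floats
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean, c_std = channel_stats(c_chan, eps)
        s_mean, s_std = channel_stats(s_chan, eps)
        # Whiten with content statistics, then re-color with style statistics.
        out.append([s_std * (v - c_mean) / c_std + s_mean for v in c_chan])
    return out


if __name__ == "__main__":
    content = [[0.0, 1.0, 2.0, 3.0]]
    style = [[10.0, 20.0, 30.0, 40.0]]
    print(adain(content, style))
```

Because only per-channel mean and variance change, the relative ordering of values within a channel (a rough proxy for content) is preserved while the channel statistics (a rough proxy for style) are replaced.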

Learning Local Implicit Fourier Representation for Image Warping

Jul 05, 2022

Jaewon Lee, Kwang Pyo Choi, Kyong Hwan Jin

Abstract: Image warping aims to reshape images defined on rectangular grids into arbitrary shapes. Recently, implicit neural functions have shown remarkable performance in representing images in a continuous manner. However, a standalone multi-layer perceptron struggles to learn high-frequency Fourier coefficients. In this paper, we propose a local texture estimator for image warping (LTEW) followed by an implicit neural representation to deform images into continuous shapes. Local textures estimated from a deep super-resolution (SR) backbone are multiplied by locally-varying Jacobian matrices of a coordinate transformation to predict Fourier responses of a warped image. Our LTEW-based neural function outperforms existing warping methods for asymmetric-scale SR and homography transform. Furthermore, our algorithm generalizes well to arbitrary coordinate transformations, such as homography transform with a large magnification factor and equirectangular projection (ERP) perspective transform, which are not provided in training.

* ECCV 2022 camera-ready version (https://ipl.dgist.ac.kr/LTEW.pdf)

Building a Performance Model for Deep Learning Recommendation Model Training on GPUs

Jan 19, 2022

Zhongyi Lin, Louis Feng, Ehsan K. Ardestani, Jaewon Lee, John Lundell, Changkyu Kim, Arun Kejariwal, John D. Owens

Abstract: We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel runtimes) and the device idle time are important components of the overall device time. We therefore tackle them separately by (1) flexibly adopting heuristic-based and ML-based kernel performance models for operators that dominate the device active time, and (2) categorizing operator overheads into five types to determine quantitatively their contribution to the device active time. Combining these two parts, we propose a critical-path-based algorithm to predict the per-batch training time of DLRM by traversing its execution graph. We achieve less than 10% geometric mean average error (GMAE) in all kernel performance modeling, and 5.23% and 7.96% geomean errors for GPU active time and overall end-to-end per-batch training time prediction, respectively. We show that our general performance model not only achieves low prediction error on DLRM, which has highly customized configurations and is dominated by multiple factors, but also yields comparable accuracy on other compute-bound ML models targeted by most previous methods. Using this performance model and graph-level data and task dependency analyses, we show our system can provide more general model-system co-design than previous methods.

* 11 pages, 10 figures

Local Texture Estimator for Implicit Representation Function

Nov 21, 2021

Jaewon Lee, Kyong Hwan Jin

Abstract: Recent works with an implicit neural function shed light on representing images in arbitrary resolution. However, a standalone multi-layer perceptron (MLP) shows limited performance in learning high-frequency components. In this paper, we propose a Local Texture Estimator (LTE), a dominant-frequency estimator for natural images, enabling an implicit function to capture fine details while reconstructing images in a continuous manner. When jointly trained with a deep super-resolution (SR) architecture, LTE is capable of characterizing image textures in 2D Fourier space. We show that an LTE-based neural function outperforms existing deep SR methods at arbitrary scales for all datasets and all scale factors. Furthermore, we demonstrate that our implementation takes the shortest running time compared to previous works. The source code will be released.

Point Cloud Augmentation with Weighted Local Transformations

Oct 11, 2021

Sihyeon Kim, Sanghyeok Lee, Dasol Hwang, Jaewon Lee, Seong Jae Hwang, Hyunwoo J. Kim

Abstract: Despite the extensive usage of point clouds in 3D vision, relatively limited data are available for training deep neural networks. Although data augmentation is a standard approach to compensate for the scarcity of data, it has been less explored in the point cloud literature. In this paper, we propose a simple and effective augmentation method called PointWOLF for point cloud augmentation. The proposed method produces smoothly varying non-rigid deformations by locally weighted transformations centered at multiple anchor points. The smooth deformations allow diverse and realistic augmentations. Furthermore, in order to minimize the manual effort of searching for optimal augmentation hyperparameters, we present AugTune, which generates augmented samples of desired difficulties producing targeted confidence scores. Our experiments show that our framework consistently improves performance on both shape classification and part segmentation tasks. In particular, with PointNet++, PointWOLF achieves the state-of-the-art 89.7 accuracy on shape classification with the real-world ScanObjectNN dataset.

* 9 pages, Accepted to ICCV 2021
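
The idea of "locally weighted transformations centered at multiple anchor points" in the abstract above can be sketched as follows. This is a simplified illustration, not the PointWOLF implementation: it assumes a Gaussian distance kernel for the weights and uses only a uniform scale and a shift per anchor, and all function and parameter names are hypothetical.

```python
import math


def gaussian_weight(point, anchor, bandwidth):
    """Weight that decays smoothly with squared distance to the anchor (assumed kernel)."""
    d2 = sum((p - a) ** 2 for p, a in zip(point, anchor))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def local_transform(point, anchor, scale, shift):
    """Scale the point about its anchor, then translate it."""
    return [a + scale * (p - a) + t for p, a, t in zip(point, anchor, shift)]


def weighted_local_deform(points, anchors, scales, shifts, bandwidth=0.5):
    """Blend per-anchor transforms with normalized distance-based weights."""
    deformed = []
    for point in points:
        weights = [gaussian_weight(point, a, bandwidth) for a in anchors]
        total = sum(weights)
        if total == 0.0:
            # Far from every anchor (weights underflowed): leave the point as is.
            deformed.append(list(point))
            continue
        blended = [0.0] * len(point)
        for w, a, s, t in zip(weights, anchors, scales, shifts):
            q = local_transform(point, a, s, t)
            for i, coord in enumerate(q):
                blended[i] += w * coord / total
        deformed.append(blended)
    return deformed


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    anchors = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    print(weighted_local_deform(pts, anchors, [1.2, 0.8], [[0.0] * 3, [0.0] * 3]))
```

Because every point receives a smooth blend of nearby transforms rather than a single global one, neighboring points move in similar ways, which is what yields a non-rigid yet smooth deformation.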

Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape

Apr 08, 2021

Seo Taek Kong, Soomin Jeon, Jaewon Lee, Hongseok Lee, Kyu-Hwan Jung

Abstract: Deep learning (DL) relies on massive amounts of labeled data, and improving its labeled sample-efficiency remains one of the most important problems since its advent. Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts. Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data. Given the accessible pool of unlabeled data in pool-based AL, it seems natural to use SSL when training and AL to update the labeled set; however, algorithms designed for their combination remain limited. In this work, we first prove that convergence of gradient descent on sufficiently wide ReLU networks can be expressed in terms of their Gram matrix's eigenspectrum. Equipped with a few theoretical insights, we propose convergence rate control (CRC), an AL algorithm that selects unlabeled data to improve the problem conditioning upon inclusion in the labeled set, by formulating an acquisition step in terms of improving training dynamics. Extensive experiments show that SSL algorithms coupled with CRC can achieve high performance using very few labeled data.

Applying GPGPU to Recurrent Neural Network Language Model based Fast Network Search in the Real-Time LVCSR

Jul 23, 2020

Kyungmin Lee, Chiyoun Park, Ilhwan Kim, Namhoon Kim, Jaewon Lee

Abstract: Recurrent Neural Network Language Models (RNNLMs) have started to be used in various fields of speech recognition due to their outstanding performance. However, the high computational complexity of RNNLMs has been a hurdle in applying them to real-time Large Vocabulary Continuous Speech Recognition (LVCSR). In order to accelerate RNNLM-based network searches during decoding, we apply General-Purpose Graphics Processing Units (GPGPUs). This paper proposes a novel method of applying GPGPUs to RNNLM-based graph traversals. We have achieved our goal by reducing redundant computations on CPUs and the amount of data transfer between GPGPUs and CPUs. The proposed approach was evaluated on both the WSJ corpus and in-house data. Experiments show that the proposed approach achieves real-time speed in various circumstances while keeping the Word Error Rate (WER) about 10% (relative) lower than that of n-gram models.

* 4 pages, 2 figures, Interspeech 2015 (Accepted)

Accelerating recurrent neural network language model based online speech recognition system

Jan 30, 2018

Kyungmin Lee, Chiyoun Park, Namhoon Kim, Jaewon Lee

Abstract: This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. First, a lossy compression of the past hidden layer outputs (history vector) with caching is introduced in order to reduce the number of LM queries. Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, which computes each layer of the model on the more advantageous platform. The overhead added by data exchanges between CPU and GPU is compensated for through a frame-wise batching strategy. The performance of the proposed methods, evaluated on LibriSpeech test sets, indicates that the reduction in history vector precision improves the average recognition speed by 1.23 times with minimal degradation in accuracy. In addition, the CPU-GPU hybrid parallelization enables RNNLM-based real-time recognition with a fourfold improvement in speed.

* 4 pages, 4 figures, 3 tables, ICASSP 2018 (Accepted)
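
The first idea in the abstract above, lossy compression of the history vector combined with caching, can be sketched roughly as follows. This is an assumption-laden illustration rather than the paper's method: the uniform rounding step, the `CachedLM` class, and the toy scoring function are all hypothetical.

```python
def quantize(history, step=0.1):
    """Lossy compression (assumed scheme): snap each component to a coarse grid."""
    return tuple(round(h / step) for h in history)


class CachedLM:
    """Wraps a language-model scoring function with a cache keyed on the
    quantized history vector, so near-identical histories share one query."""

    def __init__(self, lm_fn, step=0.1):
        self.lm_fn = lm_fn    # expensive scorer: (history, word) -> score
        self.step = step
        self.cache = {}
        self.queries = 0      # number of real LM evaluations performed

    def score(self, history, word):
        key = (quantize(history, self.step), word)
        if key not in self.cache:
            self.queries += 1
            self.cache[key] = self.lm_fn(history, word)
        return self.cache[key]


if __name__ == "__main__":
    # Toy stand-in for an RNNLM forward pass.
    lm = CachedLM(lambda h, w: sum(h) + len(w))
    lm.score([0.11, 0.52], "a")
    lm.score([0.12, 0.49], "a")   # falls in the same grid cell: cache hit
    print(lm.queries)
```

A coarser `step` merges more histories into one cache entry, trading a small loss in scoring precision for fewer model evaluations, which mirrors the speed and accuracy trade-off the abstract reports.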
