Geondo Park

GeNAS: Neural Architecture Search with Better Generalization

May 18, 2023
Joonhyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, YoungJoon Yoo

Neural Architecture Search (NAS) aims to automatically discover a network architecture with superior test performance. Recent NAS approaches rely on validation loss or accuracy to identify the best network for the target data. In this paper, we investigate a new search measure for finding architectures with better generalization. We demonstrate that the flatness of the loss surface can be a promising proxy for predicting the generalization capability of neural network architectures. We evaluate our proposed method on various search spaces, showing similar or even better performance compared to state-of-the-art NAS methods. Notably, the architecture found by the flatness measure generalizes robustly to various shifts in data distribution (e.g., ImageNet-V2, -A, -O) as well as to various tasks such as object detection and semantic segmentation. Code is available at https://github.com/clovaai/GeNAS.

* Accepted by IJCAI 2023
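
As a rough illustration of a flatness-based search signal, the sketch below scores a candidate network by the average loss increase under random, magnitude-scaled Gaussian weight perturbations. The function names, noise scheme, and hyperparameters are illustrative assumptions, not the paper's exact measure; see the linked repository for the actual implementation.

```python
import copy
import torch

@torch.no_grad()
def mean_loss(model, loss_fn, data_loader, device):
    # Average per-sample loss over the loader.
    total, count = 0.0, 0
    for x, y in data_loader:
        x, y = x.to(device), y.to(device)
        total += loss_fn(model(x), y).item() * x.size(0)
        count += x.size(0)
    return total / count

def flatness_score(model, loss_fn, data_loader, device="cpu",
                   sigma=0.01, n_samples=5):
    # Illustrative flatness proxy (not the paper's exact measure): average
    # loss increase when each weight is jittered by Gaussian noise scaled to
    # its magnitude. A smaller increase suggests a flatter minimum.
    model.eval()
    base = mean_loss(model, loss_fn, data_loader, device)
    increase = 0.0
    for _ in range(n_samples):
        noisy = copy.deepcopy(model)
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(sigma * p.abs() * torch.randn_like(p))
        increase += mean_loss(noisy, loss_fn, data_loader, device) - base
    return increase / n_samples
```

In a NAS loop, candidate architectures could then be ranked by a lower flatness_score instead of, or alongside, validation accuracy.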

Distilling Linguistic Context for Language Model Compression

Sep 17, 2021
Geondo Park, Gyeongman Kim, Eunho Yang

Computationally expensive and memory-intensive neural networks lie behind the recent success of language representation learning. Knowledge distillation, a major technique for deploying such vast language models in resource-scarce environments, transfers knowledge of individual word representations learned without restrictions. In this paper, inspired by recent observations that language representations are relatively positioned and carry more semantic knowledge as a whole, we present a new knowledge distillation objective for language representation learning that transfers contextual knowledge via two types of relationships across representations: Word Relation and Layer Transforming Relation. Unlike other recent distillation techniques for language models, our contextual distillation places no restrictions on architectural differences between teacher and student. We validate the effectiveness of our method on challenging benchmarks of language understanding tasks, not only with architectures of various sizes but also in combination with DynaBERT, a recently proposed adaptive-size pruning method.

* EMNLP 2021. Code: https://github.com/GeondoPark/CKD 
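
As a rough illustration of the Word Relation component, the sketch below matches the pairwise word-to-word similarity structure of student and teacher hidden states. The cosine-similarity relation and MSE matching are assumptions for illustration, and the Layer Transforming Relation is omitted; see the linked code for the actual objective.

```python
import torch
import torch.nn.functional as F

def word_relation_loss(student_hidden, teacher_hidden):
    # student_hidden: (batch, seq_len, d_s); teacher_hidden: (batch, seq_len, d_t).
    # Only the (seq_len x seq_len) relation matrices are compared, so student
    # and teacher hidden sizes are free to differ -- no architectural restriction.
    def relation(h):
        h = F.normalize(h, dim=-1)        # unit-normalize word vectors
        return h @ h.transpose(-1, -2)    # pairwise cosine similarities
    # Illustrative matching loss; the paper's formulation may differ.
    return F.mse_loss(relation(student_hidden), relation(teacher_hidden))
```

Because only relations between words are transferred, a student with a different width or depth can still be distilled from the teacher.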

Attribution Preservation in Network Compression for Reliable Network Interpretation

Oct 28, 2020
Geondo Park, June Yong Yang, Sung Ju Hwang, Eunho Yang

Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce their size for edge computing. In this paper, we show that these seemingly unrelated techniques conflict with each other, as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises because conventional network compression methods preserve only the predictions of the network while ignoring the quality of the attributions. To combat this attribution inconsistency problem, we present a framework that preserves the attributions while compressing a network. By employing a Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to its pre-compression self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively across diverse compression methods.

* NeurIPS 2020. Code: https://github.com/GeondoPark/attribute-preserve 
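
As a simplified illustration of attribution matching, the sketch below uses plain input-gradient saliency collapsed over channels and an unweighted, per-sample-normalized MSE. The paper's Weighted Collapsed Attribution Matching regularizer is more elaborate, so treat this as an assumption-laden sketch and refer to the linked repository for the real implementation.

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x, target, create_graph=False):
    # Input-gradient attribution for the target class, collapsed over the
    # channel dimension: (batch, C, H, W) -> (batch, H, W).
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, target.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=create_graph)
    return grad.abs().sum(dim=1)

def attribution_matching_loss(student, teacher, x, target):
    # Simplified, unweighted stand-in for the paper's regularizer.
    # create_graph=True lets the matching loss backpropagate into the student.
    s_map = saliency_map(student, x, target, create_graph=True)
    t_map = saliency_map(teacher, x, target).detach()
    # Normalize per sample so the loss matches attribution shape, not scale.
    s_map = s_map / (s_map.flatten(1).norm(dim=1).view(-1, 1, 1) + 1e-8)
    t_map = t_map / (t_map.flatten(1).norm(dim=1).view(-1, 1, 1) + 1e-8)
    return F.mse_loss(s_map, t_map)
```

During compression, this term would be added to the usual task loss so the compressed network keeps both its predictions and its explanations aligned with the original.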

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Jan 11, 2020
Geondo Park, Chihye Han, Wonjun Yoon, Daeshik Kim

Visual-semantic embedding enables various tasks such as image-text retrieval, image captioning, and visual question answering. The key to successful visual-semantic embedding is to express visual and textual data properly, accounting for their intricate relationship. While previous studies have made substantial advances by encoding visual and textual data into a joint space where similar concepts are closely located, they often represent data with a single vector, ignoring the presence of multiple important components in an image or text. Thus, in addition to the joint embedding space, we propose a novel multi-head self-attention network that captures various components of visual and textual data by attending to important parts of the data. Our approach achieves new state-of-the-art results on image-text retrieval tasks on the MS-COCO and Flickr30K datasets. Through visualization of the attention maps, which capture distinct semantic components at multiple positions in the image and the text, we demonstrate that our method yields an effective and interpretable visual-semantic joint space.

* Accepted by the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV 20), 9 pages, 5 figures 
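
As a rough sketch of the multi-head self-attention pooling idea (in the spirit of structured self-attentive embeddings), the snippet below turns a set of local features into several component vectors. The class name, layer widths, and head count are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiHeadAttentionPool(nn.Module):
    # Pools a set of local features (batch, n, dim) into `heads` component
    # embeddings, each free to attend to different parts of the input.
    # Sizes below are hypothetical, not the paper's configuration.
    def __init__(self, dim, heads=8, hidden=512):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, heads)
        )

    def forward(self, feats):                     # feats: (batch, n, dim)
        attn = self.score(feats).softmax(dim=1)   # (batch, n, heads)
        return attn.transpose(1, 2) @ feats       # (batch, heads, dim)

# Hypothetical usage: pool 36 image-region features of width 2048 into 8
# component embeddings; a text branch would pool word features the same way.
pool = MultiHeadAttentionPool(dim=2048, heads=8)
components = pool(torch.randn(4, 36, 2048))       # -> (4, 8, 2048)
```

Matching images and texts by their closest component pairs, rather than by single global vectors, is one way such multi-vector embeddings can be compared in a joint space.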