Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Seunghoon Hong

Simulation-Free Training of Neural ODEs on Paired Data

Oct 30, 2024

Semin Kim, Jaehoon Yoo, Jinwoo Kim, Yeonwoo Cha, Saehoon Kim, Seunghoon Hong

Abstract:In this work, we investigate a method for simulation-free training of Neural Ordinary Differential Equations (NODEs) for learning deterministic mappings between paired data. Despite the analogy of NODEs as continuous-depth residual networks, their application in typical supervised learning tasks has not been popular, mainly due to the large number of function evaluations required by ODE solvers and numerical instability in gradient estimation. To alleviate this problem, we employ the flow matching framework for simulation-free training of NODEs, which directly regresses the parameterized dynamics function to a predefined target velocity field. Contrary to generative tasks, however, we show that applying flow matching directly between paired data can often lead to an ill-defined flow that breaks the coupling of the data pairs (e.g., due to crossing trajectories). We propose a simple extension that applies flow matching in the embedding space of data pairs, where the embeddings are learned jointly with the dynamic function to ensure the validity of the flow which is also easier to learn. We demonstrate the effectiveness of our method on both regression and classification tasks, where our method outperforms existing NODEs with a significantly lower number of function evaluations. The code is available at https://github.com/seminkim/simulation-free-node.

Via

Access Paper or Ask Questions

Feature Augmentation based Test-Time Adaptation

Oct 18, 2024

Younggeol Cho, Youngrae Kim, Junho Yoon, Seunghoon Hong, Dongman Lee

Abstract:Test-time adaptation (TTA) allows a model to be adapted to an unseen domain without accessing the source data. Due to the nature of practical environments, TTA has a limited amount of data for adaptation. Recent TTA methods further restrict this by filtering input data for reliability, making the effective data size even smaller and limiting adaptation potential. To address this issue, We propose Feature Augmentation based Test-time Adaptation (FATA), a simple method that fully utilizes the limited amount of input data through feature augmentation. FATA employs Normalization Perturbation to augment features and adapts the model using the FATA loss, which makes the outputs of the augmented and original features similar. FATA is model-agnostic and can be seamlessly integrated into existing models without altering the model architecture. We demonstrate the effectiveness of FATA on various models and scenarios on ImageNet-C and Office-Home, validating its superiority in diverse real-world conditions.

* 10 pages

Via

Access Paper or Ask Questions

Revisiting Random Walks for Learning on Graphs

Jul 01, 2024

Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade, Youngmin Ryou, Seunghoon Hong

Figure 1 for Revisiting Random Walks for Learning on Graphs

Figure 2 for Revisiting Random Walks for Learning on Graphs

Figure 3 for Revisiting Random Walks for Learning on Graphs

Figure 4 for Revisiting Random Walks for Learning on Graphs

Abstract:We revisit a simple idea for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We refer to these stochastic machines as random walk neural networks, and show that we can design them to be isomorphism invariant while capable of universal approximation of graph functions in probability. A useful finding is that almost any kind of record of random walk guarantees probabilistic invariance as long as the vertices are anonymized. This enables us to record random walks in plain text and adopt a language model to read these text records to solve graph tasks. We further establish a parallelism to message passing neural networks using tools from Markov chain theory, and show that over-smoothing in message passing is alleviated by construction in random walk neural networks, while over-squashing manifests as probabilistic under-reaching. We show that random walk neural networks based on pre-trained language models can solve several hard problems on graphs, such as separating strongly regular graphs where the 3-WL test fails, counting substructures, and transductive classification on arXiv citation network without training. Code is available at https://github.com/jw9730/random-walk.

* 41 pages, 11 figures

Via

Access Paper or Ask Questions

Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

May 01, 2024

Whie Jung, Jaehoon Yoo, Sungjin Ahn, Seunghoon Hong

Figure 1 for Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Figure 2 for Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Figure 3 for Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Figure 4 for Learning to Compose: Improving Object Centric Learning by Injecting Compositionality

Abstract:Learning compositional representation is a key aspect of object-centric learning as it enables flexible systematic generalization and supports complex visual reasoning. However, most of the existing approaches rely on auto-encoding objective, while the compositionality is implicitly imposed by the architectural or algorithmic bias in the encoder. This misalignment between auto-encoding objective and learning compositionality often results in failure of capturing meaningful object representations. In this study, we propose a novel objective that explicitly encourages compositionality of the representations. Built upon the existing object-centric learning framework (e.g., slot attention), our method incorporates additional constraints that an arbitrary mixture of object representations from two images should be valid by maximizing the likelihood of the composite data. We demonstrate that incorporating our objective to the existing framework consistently improves the objective-centric learning and enhances the robustness to the architectural choices.

Via

Access Paper or Ask Questions

Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

Apr 29, 2024

Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong

Figure 1 for Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

Figure 2 for Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

Figure 3 for Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

Figure 4 for Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

Abstract:Large language models have evolved data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime is not straightforward and has received less attention from previous vision generalists. In this study, we explore a universal model that can flexibly adapt to unseen dense label structures with a few examples, enabling it to serve as a data-efficient vision generalist in diverse real-world scenarios. To this end, we base our method on a powerful meta-learning framework and explore several axes to improve its performance and versatility for real-world problems, such as flexible adaptation mechanisms and scalability. We evaluate our model across a spectrum of unseen real-world scenarios where low-shot learning is desirable, including video, 3D, medical, biological, and user-interactive tasks. Equipped with a generic architecture and an effective adaptation mechanism, our model flexibly adapts to all of these tasks with at most 50 labeled images, showcasing a significant advancement over existing data-efficient generalist approaches. Codes are available at https://github.com/GitGyun/chameleon.

Via

Access Paper or Ask Questions

Learning Symmetrization for Equivariance with Orbit Distance Minimization

Nov 13, 2023

Tien Dat Nguyen, Jinwoo Kim, Hongseok Yang, Seunghoon Hong

Figure 1 for Learning Symmetrization for Equivariance with Orbit Distance Minimization

Figure 2 for Learning Symmetrization for Equivariance with Orbit Distance Minimization

Figure 3 for Learning Symmetrization for Equivariance with Orbit Distance Minimization

Abstract:We present a general framework for symmetrizing an arbitrary neural-network architecture and making it equivariant with respect to a given group. We build upon the proposals of Kim et al. (2023); Kaba et al. (2023) for symmetrization, and improve them by replacing their conversion of neural features into group representations, with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method's competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyen-vision/Orbit-symmetrize.

* 16 pages, 1 figure

Via

Access Paper or Ask Questions

3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Sep 08, 2023

Sungjun Cho, Dae-Woong Jeong, Sung Moon Ko, Jinwoo Kim, Sehui Han, Seunghoon Hong, Honglak Lee, Moontae Lee

Figure 1 for 3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Figure 2 for 3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Figure 3 for 3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Figure 4 for 3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Abstract:Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While there exist various 2D graph-based molecular pretraining approaches, these methods struggle to show statistically significant gains in predictive performance. Recent work have thus instead proposed 3D conformer-based pretraining under the task of denoising, which led to promising results. During downstream finetuning, however, models trained with 3D conformers require accurate atom-coordinates of previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. With denoising followed by cross-modal knowledge distillation, our approach enjoys use of knowledge obtained from denoising as well as painless application to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that the graph encoder trained via D&D can infer 3D information based on the 2D graph and shows superior performance and label-efficiency against other baselines.

* 16 pages, 5 figures

Via

Access Paper or Ask Questions

Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Jun 05, 2023

Jinwoo Kim, Tien Dat Nguyen, Ayhan Suleymanzade, Hyeokjun An, Seunghoon Hong

Figure 1 for Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Figure 2 for Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Figure 3 for Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Figure 4 for Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance

Abstract:We present a novel framework to overcome the limitations of equivariant architectures in learning functions with group symmetries. In contrary to equivariant architectures, we use an arbitrary base model (such as an MLP or a transformer) and symmetrize it to be equivariant to the given group by employing a small equivariant network that parameterizes the probabilistic distribution underlying the symmetrization. The distribution is end-to-end trained with the base model which can maximize performance while reducing sample complexity of symmetrization. We show that this approach ensures not only equivariance to given group but also universal approximation capability in expectation. We implement our method on a simple patch-based transformer that can be initialized from pretrained vision transformers, and test it for a wide range of symmetry groups including permutation and Euclidean groups and their combinations. Empirical tests show competitive results against tailored equivariant architectures, suggesting the potential for learning equivariant functions for diverse groups using a non-equivariant universal base architecture. We further show evidence of enhanced learning in symmetric modalities, like graphs, when pretrained from non-symmetric modalities, like vision. Our implementation will be open-sourced at https://github.com/jw9730/lps.

* 25 pages, 3 figures

Via

Access Paper or Ask Questions

Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

Mar 27, 2023

Jaehoon Yoo, Semin Kim, Doyup Lee, Chiheon Kim, Seunghoon Hong

Abstract:Autoregressive transformers have shown remarkable success in video generation. However, the transformers are prohibited from directly learning the long-term dependency in videos due to the quadratic complexity of self-attention, and inherently suffering from slow inference time and error propagation due to the autoregressive process. In this paper, we propose Memory-efficient Bidirectional Transformer (MeBT) for end-to-end learning of long-term dependency in videos and fast inference. Based on recent advances in bidirectional transformers, our method learns to decode the entire spatio-temporal volume of a video in parallel from partially observed patches. The proposed transformer achieves a linear time complexity in both encoding and decoding, by projecting observable context tokens into a fixed number of latent tokens and conditioning them to decode the masked tokens through the cross-attention. Empowered by linear complexity and bidirectional modeling, our method demonstrates significant improvement over the autoregressive Transformers for generating moderately long videos in both quality and speed. Videos and code are available at https://sites.google.com/view/mebt-cvpr2023 .

Via

Access Paper or Ask Questions

Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Mar 27, 2023

Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong

Figure 1 for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Figure 2 for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Figure 3 for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Figure 4 for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Abstract:Dense prediction tasks are a fundamental class of problems in computer vision. As supervised methods suffer from high pixel-wise labeling cost, a few-shot learning solution that can learn any dense task from a few labeled images is desired. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to challenges in designing a general and unified model that is able to flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels that encapsulates all tasks. Also, VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture involving ViT backbones where token matching is performed at multiple feature hierarchies. We experiment VTM on a challenging variant of Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms using 0.1% of full supervision. Codes are available at https://github.com/GitGyun/visual_token_matching.

Via

Access Paper or Ask Questions