Bohyung Han

Learning to Translate Noise for Robust Image Denoising

Dec 06, 2024

Inju Ha, Donghun Ryou, Seonguk Seo, Bohyung Han

Abstract: Deep learning-based image denoising techniques often generalize poorly to out-of-distribution real-world noise. To tackle this challenge, we propose a novel noise translation framework that performs denoising on an image with translated noise rather than directly denoising an original noisy image. Specifically, our approach translates complex, unknown real-world noise into Gaussian noise, which is spatially uncorrelated and independent of image content, through a noise translation network. The translated noisy images are then processed by an image denoising network pretrained to effectively remove Gaussian noise, enabling robust and consistent denoising performance. We also design well-motivated loss functions and architectures for the noise translation network by leveraging the mathematical properties of Gaussian noise. Experimental results demonstrate that the proposed method substantially improves robustness and generalizability, outperforming state-of-the-art methods across diverse benchmarks. Visualized denoising results and the source code are available on our project page.

* The project page is available at https://hij1112.github.io/learning-to-translate-noise/

4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization

Nov 13, 2024

Mijeong Kim, Jongwoo Lim, Bohyung Han

Abstract: Novel view synthesis of dynamic scenes is becoming important in various applications, including augmented and virtual reality. We propose a novel 4D Gaussian Splatting (4DGS) algorithm for dynamic scenes from casually recorded monocular videos. To overcome the overfitting problem of existing work for these real-world videos, we introduce an uncertainty-aware regularization that identifies uncertain regions with few observations and selectively imposes additional priors based on diffusion models and depth smoothness on such regions. This approach improves both the performance of novel view synthesis and the quality of training image reconstruction. We also identify the initialization problem of 4DGS in fast-moving dynamic regions, where the Structure from Motion (SfM) algorithm fails to provide reliable 3D landmarks. To initialize Gaussian primitives in such regions, we present a dynamic region densification method using the estimated depth maps and scene flow. Our experiments show that the proposed method improves the performance of 4DGS reconstruction from a video captured by a handheld monocular camera and also exhibits promising results in few-shot static scene reconstruction.

* NeurIPS 2024

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Nov 08, 2024

Jaeyoo Park, Jin Young Choi, Jeonghyung Park, Bohyung Han

Abstract: We present a novel OCR-free document understanding framework based on pretrained Multimodal Large Language Models (MLLMs). Our approach employs multi-scale visual features to effectively handle various font sizes within document images. To address the increasing costs of considering the multi-scale visual inputs for MLLMs, we propose the Hierarchical Visual Feature Aggregation (HVFA) module, designed to reduce the number of input tokens to LLMs. Leveraging a feature pyramid with cross-attentive pooling, our approach effectively manages the trade-off between information loss and efficiency without being affected by varying document image sizes. Furthermore, we introduce a novel instruction tuning task, which facilitates the model's text-reading capability by learning to predict the relative positions of input text, eventually minimizing the risk of truncated text caused by the limited capacity of LLMs. Comprehensive experiments validate the effectiveness of our approach, demonstrating superior performance in various document understanding tasks.

* NeurIPS 2024

Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation

Sep 12, 2024

Junsung Lee, Minsoo Kang, Bohyung Han

Abstract: We propose a simple but effective training-free approach tailored to diffusion-based image-to-image translation. Our approach revises the original noise prediction network of a pretrained diffusion model by introducing a noise correction term. We formulate the noise correction term as the difference between two noise predictions: one is computed by the denoising network with a progressive interpolation of the source and target prompt embeddings, while the other is the noise prediction with the source prompt embedding. The final noise prediction network is given by a linear combination of the standard denoising term and the noise correction term, where the former is designed to reconstruct must-be-preserved regions while the latter aims to effectively edit regions of interest relevant to the target prompt. Our approach can be easily incorporated into existing image-to-image translation methods based on diffusion models. Extensive experiments verify that the proposed technique achieves outstanding performance with low latency and consistently improves existing frameworks when combined with them.

* 16 pages, 5 figures, 6 tables
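
The structure described in this abstract can be written compactly as below. This is only a reading of the abstract, not the paper's exact formulation: the symbols ($x_t$, $c^{\mathrm{src}}$, $c^{\mathrm{tgt}}$, $\beta_t$, $\gamma$, $\epsilon_t^{\mathrm{std}}$) are notation introduced here, and the linear form of the prompt interpolation and the unit weight on the standard term are assumptions.

```latex
% Assumed notation: \epsilon_\theta is the pretrained noise prediction network,
% c^{src} and c^{tgt} are the source and target prompt embeddings,
% \beta_t is a schedule that moves progressively from source to target.
c_t^{\mathrm{interp}} = (1 - \beta_t)\, c^{\mathrm{src}} + \beta_t\, c^{\mathrm{tgt}}

% Noise correction term: difference between two noise predictions.
\Delta\epsilon_t = \epsilon_\theta\big(x_t, t, c_t^{\mathrm{interp}}\big)
                 - \epsilon_\theta\big(x_t, t, c^{\mathrm{src}}\big)

% Final prediction: linear combination of the standard denoising term
% and the correction term (the weight \gamma is an assumed scalar).
\hat{\epsilon}_t = \epsilon_t^{\mathrm{std}} + \gamma\, \Delta\epsilon_t
```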

Revisiting Machine Unlearning with Dimensional Alignment

Jul 25, 2024

Seonguk Seo, Dongwan Kim, Bohyung Han

Abstract: Machine unlearning, an emerging research topic focusing on compliance with data privacy regulations, enables trained models to remove the information learned from specific data. While many existing methods indirectly address this issue by intentionally injecting incorrect supervision, they can drastically and unpredictably alter the decision boundaries and feature spaces, leading to training instability and undesired side effects. To fundamentally approach this task, we first analyze the changes in latent feature spaces between original and retrained models, and observe that the feature representations of samples not involved in training are closely aligned with the feature manifolds of previously seen samples in training. Based on these findings, we introduce a novel evaluation metric for machine unlearning, coined dimensional alignment, which measures the alignment between the eigenspaces of the forget and retain set samples. We employ this metric as a regularizer loss to build a robust and stable unlearning framework, which is further enhanced by integrating a self-distillation loss and an alternating training scheme. Our framework effectively eliminates information from the forget set and preserves knowledge from the retain set. Lastly, we identify critical flaws in established evaluation metrics for machine unlearning, and introduce new evaluation tools that more accurately reflect the fundamental goals of machine unlearning.

FIFO-Diffusion: Generating Infinite Videos from Text without Training

May 19, 2024

Jihwan Kim, Junoh Kang, Jinyoung Choi, Bohyung Han

Abstract: We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword: the frames near the tail can take advantage of cleaner ones by forward reference, but such a strategy induces a discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. We demonstrate promising results and the effectiveness of the proposed methods on existing text-to-video generation baselines.

* Project Page: https://jjihwan.github.io/projects/FIFO-Diffusion
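
The queue mechanics of diagonal denoising described in this abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `denoise_one_step` is a hypothetical stand-in for a diffusion model step, frames are single floats, the initial queue is simply filled with noise at levels 1 to `queue_len`, and latent partitioning and lookahead denoising are not modeled.

```python
from collections import deque
import random


def denoise_one_step(frame):
    """Hypothetical stand-in for one diffusion step: lowers the noise level by one."""
    level = frame["level"]
    return {
        "id": frame["id"],
        "value": frame["value"] * (level - 1) / level,  # toy update, reaches 0 at level 0
        "level": level - 1,
    }


def fifo_diffusion(num_frames, queue_len, seed=0):
    """Toy diagonal denoising over a FIFO queue with increasing noise levels."""
    rng = random.Random(seed)
    next_id = 0

    # Assumed initialization: the head holds the least noisy frame (level 1),
    # the tail holds the noisiest frame (level queue_len).
    queue = deque()
    for level in range(1, queue_len + 1):
        queue.append({"id": next_id, "value": rng.gauss(0.0, 1.0), "level": level})
        next_id += 1

    outputs = []
    while len(outputs) < num_frames:
        # Diagonal denoising: every frame in the queue advances by one step.
        queue = deque(denoise_one_step(f) for f in queue)
        # The head frame is now fully denoised (level 0) and is dequeued.
        outputs.append(queue.popleft())
        # A fresh random-noise frame is enqueued at the tail.
        queue.append({"id": next_id, "value": rng.gauss(0.0, 1.0), "level": queue_len})
        next_id += 1
    return outputs, queue


if __name__ == "__main__":
    frames, remaining = fifo_diffusion(num_frames=5, queue_len=4)
    print([f["id"] for f in frames], [f["level"] for f in remaining])
```

Because one frame leaves and one enters on every iteration, the queue length and its ladder of noise levels stay constant, which is why the number of generated frames is not bounded by the queue size.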

Leveraging Temporal Contextualization for Video Action Recognition

Apr 15, 2024

Minji Kim, Dongyoon Han, Taekyung Kim, Bohyung Han

Abstract: Pretrained vision-language models have shown effectiveness in video understanding. However, recent studies have not sufficiently leveraged essential temporal information from videos, simply averaging frame-wise representations or referencing consecutive frames. We introduce Temporally Contextualized CLIP (TC-CLIP), a pioneering framework for video understanding that effectively and efficiently leverages comprehensive video information. We propose Temporal Contextualization (TC), a novel layer-wise temporal information infusion mechanism for video that extracts core information from each frame, interconnects relevant information across the video to summarize into context tokens, and ultimately leverages the context tokens during the feature encoding process. Furthermore, our Video-conditional Prompting (VP) module manufactures context tokens to generate informative prompts in text modality. We conduct extensive experiments in zero-shot, few-shot, base-to-novel, and fully-supervised action recognition to validate the superiority of our TC-CLIP. Ablation studies for TC and VP support our design choices. Code is available at https://github.com/naver-ai/tc-clip

* 24 pages, 10 figures, 12 tables

A Training-Free Defense Framework for Robust Learned Image Compression

Jan 22, 2024

Myungseo Song, Jinyoung Choi, Bohyung Han

Abstract: We study the robustness of learned image compression models against adversarial attacks and present a training-free defense technique based on simple image transform functions. Recent learned image compression models are vulnerable to adversarial attacks that result in poor compression rate, low reconstruction quality, or weird artifacts. To address the limitations, we propose a simple but effective two-way compression algorithm with random input transforms, which is conveniently applicable to existing image compression models. Unlike the naïve approaches, our approach preserves the original rate-distortion performance of the models on clean images. Moreover, the proposed algorithm requires no additional training or modification of existing models, making it more practical. We demonstrate the effectiveness of the proposed techniques through extensive experiments under multiple compression models, evaluation metrics, and attack scenarios.

* 10 pages and 14 figures

Relaxed Contrastive Learning for Federated Learning

Jan 10, 2024

Seonguk Seo, Jinkyu Kim, Geeho Kim, Bohyung Han

Abstract: We propose a novel contrastive learning framework to effectively address the challenges of data heterogeneity in federated learning. We first analyze the inconsistency of gradient updates across clients during local training and establish its dependence on the distribution of feature representations, leading to the derivation of the supervised contrastive learning (SCL) objective to mitigate local deviations. In addition, we show that a naïve adoption of SCL in federated learning leads to representation collapse, resulting in slow convergence and limited performance gains. To address this issue, we introduce a relaxed contrastive learning loss that imposes a divergence penalty on excessively similar sample pairs within each class. This strategy prevents collapsed representations and enhances feature transferability, facilitating collaborative training and leading to significant performance improvements. Extensive experiments on the standard benchmarks show that our framework outperforms all existing federated learning approaches by large margins.
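
The idea of penalizing excessively similar same-class pairs can be illustrated with a small Python sketch. The abstract does not give the form of the loss, so the hinge on cosine similarity, the `threshold` parameter, and the function names below are all assumptions made for illustration; this is not the relaxed contrastive loss from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def divergence_penalty(features, labels, threshold=0.9):
    """Assumed hinge-style penalty: mean of max(0, sim - threshold) over same-class pairs."""
    total, count = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] != labels[j]:
                continue  # only pairs within the same class are penalized
            total += max(0.0, cosine(features[i], features[j]) - threshold)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(divergence_penalty(feats, labels=[0, 0, 1]))
```

Pairs whose similarity stays below the threshold contribute nothing, so such a term only pushes apart representations that have nearly collapsed onto each other.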

Learning with Noisy Labels: Interconnection of Two Expectation-Maximizations

Jan 09, 2024

Heewon Kim, Hyun Sung Chang, Kiho Cho, Jaeyun Lee, Bohyung Han

Abstract: Labor-intensive labeling becomes a bottleneck in developing computer vision algorithms based on deep learning. For this reason, dealing with imperfect labels has increasingly gained attention and has become an active field of study. We address the learning with noisy labels (LNL) problem, which is formalized as a task of finding a structured manifold in the midst of noisy data. In this framework, we provide a proper objective function and an optimization algorithm based on two expectation-maximization (EM) cycles. The separate networks associated with the two EM cycles collaborate to optimize the objective function, where one model is for distinguishing clean labels from corrupted ones while the other is for refurbishing the corrupted labels. This approach results in a non-collapsing LNL-flywheel model in the end. Experiments show that our algorithm achieves state-of-the-art performance in multiple standard benchmarks with substantial margins under various types of label noise.
