Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Junsik Kim

MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

Aug 17, 2024

Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

Abstract:Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial computational resources. To address these issues, we introduce Modality-aware Low-Rank Adaptation (MoRA), a computationally efficient method. MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities. Practically, MoRA integrates into the first block of the model, significantly improving performance when a modality is missing. It requires minimal computational resources, with less than 1.6% of the trainable parameters needed compared to training the entire model. Experimental results show that MoRA outperforms existing techniques in disease diagnosis, demonstrating superior performance, robustness, and training efficiency.

* Accepted by MICCAI 2024

Via

Access Paper or Ask Questions

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

Jul 18, 2024

Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

Abstract:Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interactive sound source localization. Cross-modal interaction is vital for understanding semantically matched or mismatched audio-visual events, such as silent objects or off-screen sounds. In this paper, we first comprehensively examine the cross-modal interaction of existing methods, benchmarks, evaluation metrics, and cross-modal understanding tasks. Then, we identify the limitations of previous studies and make several contributions to overcome the limitations. First, we introduce a new synthetic benchmark for interactive sound source localization. Second, we introduce new evaluation metrics to rigorously assess sound source localization methods, focusing on accurately evaluating both localization performance and cross-modal interaction ability. Third, we propose a learning framework with a cross-modal alignment strategy to enhance cross-modal interaction. Lastly, we evaluate both interactive sound source localization and auxiliary cross-modal retrieval tasks together to thoroughly assess cross-modal interaction capabilities and benchmark competing methods. Our new benchmarks and evaluation metrics reveal previously overlooked issues in sound source localization studies. Our proposed novel method, with enhanced cross-modal alignment, shows superior sound source localization performance. This work provides the most comprehensive analysis of sound source localization to date, with extensive validation of competing methods on both existing and new benchmarks using new and standard evaluation metrics.

* Journal Extension of ICCV 2023 paper (arXiV:2309.10724). Code is available at https://github.com/kaistmm/SSLalignment

Via

Access Paper or Ask Questions

Joint-Task Regularization for Partially Labeled Multi-Task Learning

Apr 02, 2024

Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister

Figure 1 for Joint-Task Regularization for Partially Labeled Multi-Task Learning

Figure 2 for Joint-Task Regularization for Partially Labeled Multi-Task Learning

Figure 3 for Joint-Task Regularization for Partially Labeled Multi-Task Learning

Figure 4 for Joint-Task Regularization for Partially Labeled Multi-Task Learning

Abstract:Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets. Most multi-task learning methods depend on fully labeled datasets wherein each input example is accompanied by ground-truth labels for all target tasks. Unfortunately, curating such datasets can be prohibitively expensive and impractical, especially for dense prediction tasks which require per-pixel labels for each image. With this in mind, we propose Joint-Task Regularization (JTR), an intuitive technique which leverages cross-task relations to simultaneously regularize all tasks in a single joint-task latent space to improve learning when data is not fully labeled for all tasks. JTR stands out from existing approaches in that it regularizes all tasks jointly rather than separately in pairs -- therefore, it achieves linear complexity relative to the number of tasks while previous methods scale quadratically. To demonstrate the validity of our approach, we extensively benchmark our method across a wide variety of partially labeled scenarios based on NYU-v2, Cityscapes, and Taskonomy.

* Accepted paper to CVPR 2024 (main conference)

Via

Access Paper or Ask Questions

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

Mar 31, 2024

Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen

Abstract:Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already shows great potential for fine-grained spatial-temporal modeling, as each layer offers distinct yet useful information under different granularity levels. Motivated by this, we propose Reversed Recurrent Tuning ($R^2$-Tuning), a parameter- and memory-efficient transfer learning framework for video temporal grounding. Our method learns a lightweight $R^2$ Block containing only 1.5% of the total parameters to perform progressive spatial-temporal modeling. Starting from the last layer of CLIP, $R^2$ Block recurrently aggregates spatial features from earlier layers, then refines temporal correlation conditioning on the given query, resulting in a coarse-to-fine scheme. $R^2$-Tuning achieves state-of-the-art performance across three VTG tasks (i.e., moment retrieval, highlight detection, and video summarization) on six public benchmarks (i.e., QVHighlights, Charades-STA, Ego4D-NLQ, TACoS, YouTube Highlights, and TVSum) even without the additional backbone, demonstrating the significance and effectiveness of the proposed scheme. Our code is available at https://github.com/yeliudev/R2-Tuning.

Via

Access Paper or Ask Questions

Blurry Video Compression: A Trade-off between Visual Enhancement and Data Compression

Nov 08, 2023

Dawit Mureja Argaw, Junsik Kim, In So Kweon

Abstract:Existing video compression (VC) methods primarily aim to reduce the spatial and temporal redundancies between consecutive frames in a video while preserving its quality. In this regard, previous works have achieved remarkable results on videos acquired under specific settings such as instant (known) exposure time and shutter speed which often result in sharp videos. However, when these methods are evaluated on videos captured under different temporal priors, which lead to degradations like motion blur and low frame rate, they fail to maintain the quality of the contents. In this work, we tackle the VC problem in a general scenario where a given video can be blurry due to predefined camera settings or dynamics in the scene. By exploiting the natural trade-off between visual enhancement and data compression, we formulate VC as a min-max optimization problem and propose an effective framework and training strategy to tackle the problem. Extensive experimental results on several benchmark datasets confirm the effectiveness of our method compared to several state-of-the-art VC approaches.

* Accepted to WACV 2024

Via

Access Paper or Ask Questions

Sound Source Localization is All about Cross-Modal Alignment

Sep 19, 2023

Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

Figure 1 for Sound Source Localization is All about Cross-Modal Alignment

Figure 2 for Sound Source Localization is All about Cross-Modal Alignment

Figure 3 for Sound Source Localization is All about Cross-Modal Alignment

Figure 4 for Sound Source Localization is All about Cross-Modal Alignment

Abstract:Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective. However, prior arts and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for genuine sound source localization. Cross-modal semantic understanding is important in understanding semantically mismatched audio-visual events, e.g., silent objects, or off-screen sounds. To account for this, we propose a cross-modal alignment task as a joint task with sound source localization to better learn the interaction between audio and visual modalities. Thereby, we achieve high localization performance with strong cross-modal semantic understanding. Our method outperforms the state-of-the-art approaches in both sound source localization and cross-modal retrieval. Our work suggests that jointly tackling both tasks is necessary to conquer genuine sound source localization.

* ICCV 2023

Via

Access Paper or Ask Questions

An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

Sep 18, 2023

Minkyung Kim, Jongmin Yu, Junsik Kim, Tae-Hyun Oh, Jun Kyun Choi

Figure 1 for An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

Figure 2 for An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

Figure 3 for An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

Figure 4 for An Iterative Method for Unsupervised Robust Anomaly Detection Under Data Contamination

Abstract:Most deep anomaly detection models are based on learning normality from datasets due to the difficulty of defining abnormality by its diverse and inconsistent nature. Therefore, it has been a common practice to learn normality under the assumption that anomalous data are absent in a training dataset, which we call normality assumption. However, in practice, the normality assumption is often violated due to the nature of real data distributions that includes anomalous tails, i.e., a contaminated dataset. Thereby, the gap between the assumption and actual training data affects detrimentally in learning of an anomaly detection model. In this work, we propose a learning framework to reduce this gap and achieve better normality representation. Our key idea is to identify sample-wise normality and utilize it as an importance weight, which is updated iteratively during the training. Our framework is designed to be model-agnostic and hyperparameter insensitive so that it applies to a wide range of existing methods without careful parameter tuning. We apply our framework to three different representative approaches of deep anomaly detection that are classified into one-class classification-, probabilistic model-, and reconstruction-based approaches. In addition, we address the importance of a termination condition for iterative methods and propose a termination criterion inspired by the anomaly detection objective. We validate that our framework improves the robustness of the anomaly detection models under different levels of contamination ratios on five anomaly detection benchmark datasets and two image datasets. On various contaminated datasets, our framework improves the performance of three representative anomaly detection methods, measured by area under the ROC curve.

* IEEE Transactions on Neural Networks and Learning Systems, 2023

Via

Access Paper or Ask Questions

Active anomaly detection based on deep one-class classification

Sep 18, 2023

Minkyung Kim, Junsik Kim, Jongmin Yu, Jun Kyun Choi

Figure 1 for Active anomaly detection based on deep one-class classification

Figure 2 for Active anomaly detection based on deep one-class classification

Figure 3 for Active anomaly detection based on deep one-class classification

Figure 4 for Active anomaly detection based on deep one-class classification

Abstract:Active learning has been utilized as an efficient tool in building anomaly detection models by leveraging expert feedback. In an active learning framework, a model queries samples to be labeled by experts and re-trains the model with the labeled data samples. It unburdens in obtaining annotated datasets while improving anomaly detection performance. However, most of the existing studies focus on helping experts identify as many abnormal data samples as possible, which is a sub-optimal approach for one-class classification-based deep anomaly detection. In this paper, we tackle two essential problems of active learning for Deep SVDD: query strategy and semi-supervised learning method. First, rather than solely identifying anomalies, our query strategy selects uncertain samples according to an adaptive boundary. Second, we apply noise contrastive estimation in training a one-class classification model to incorporate both labeled normal and abnormal data effectively. We analyze that the proposed query strategy and semi-supervised loss individually improve an active learning process of anomaly detection and further improve when combined together on seven anomaly detection datasets.

* Volume 167, March 2023, Pages 18-24
* Pattern Recognition Letters 2023

Via

Access Paper or Ask Questions

Focus or Not: A Baseline for Anomaly Event Detection On the Open Public Places with Satellite Images

Mar 21, 2023

Yongjin Jeon, Youngtack Oh, Doyoung Jeong, Hyunguk Choi, Junsik Kim

Abstract:In recent years, monitoring the world wide area with satellite images has been emerged as an important issue. Site monitoring task can be divided into two independent tasks; 1) Change Detection and 2) Anomaly Event Detection. Unlike to change detection research is actively conducted based on the numerous datasets(\eg LEVIR-CD, WHU-CD, S2Looking, xView2 and etc...) to meet up the expectations of industries or governments, research on AI models for detecting anomaly events is passively and rarely conducted. In this paper, we introduce a novel satellite imagery dataset(AED-RS) for detecting anomaly events on the open public places. AED-RS Dataset contains satellite images of normal and abnormal situations of 8 open public places from all over the world. Each places are labeled with different criteria based on the difference of characteristics of each places. With this dataset, we introduce a baseline model for our dataset TB-FLOW, which can be trained in weakly-supervised manner and shows reasonable performance on the AED-RS Dataset compared with the other NF(Normalizing-Flow) based anomaly detection models. Our dataset and code will be publicly open in \url{https://github.com/SIAnalytics/RS_AnomalyDetection.git}.

* 14 pages, 14 figures

Via

Access Paper or Ask Questions

ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

Feb 20, 2023

Moon Ye-Bin, Dongmin Choi, Yongjin Kwon, Junsik Kim, Tae-Hyun Oh

Figure 1 for ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

Figure 2 for ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

Figure 3 for ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

Figure 4 for ENInst: Enhancing Weakly-supervised Low-shot Instance Segmentation

Abstract:We address a weakly-supervised low-shot instance segmentation, an annotation-efficient training method to deal with novel classes effectively. Since it is an under-explored problem, we first investigate the difficulty of the problem and identify the performance bottleneck by conducting systematic analyses of model components and individual sub-tasks with a simple baseline model. Based on the analyses, we propose ENInst with sub-task enhancement methods: instance-wise mask refinement for enhancing pixel localization quality and novel classifier composition for improving classification accuracy. Our proposed method lifts the overall performance by enhancing the performance of each sub-task. We demonstrate that our ENInst is 7.5 times more efficient in achieving comparable performance to the existing fully-supervised few-shot models and even outperforms them at times.

* 11 pages, 9 figures

Via

Access Paper or Ask Questions