Jiwan Seo

Bridging Geometric and Semantic Foundation Models for Generalized Monocular Depth Estimation

May 29, 2025

Sanggyun Ma, Wonjoon Choi, Jihun Park, Jaeyeul Kim, Seunghun Lee, Jiwan Seo, Sunghoon Im

Abstract: We present Bridging Geometric and Semantic (BriGeS), an effective method that fuses geometric and semantic information within foundation models to enhance Monocular Depth Estimation (MDE). Central to BriGeS is the Bridging Gate, which integrates the complementary strengths of depth and segmentation foundation models. This integration is further refined by our Attention Temperature Scaling technique, which adjusts the focus of the attention mechanisms to prevent over-concentration on specific features and thus ensures balanced performance across diverse inputs. BriGeS builds on pre-trained foundation models and trains only the Bridging Gate. This significantly reduces resource demands and training time while preserving the model's ability to generalize. Extensive experiments across multiple challenging datasets demonstrate that BriGeS outperforms state-of-the-art MDE methods on complex scenes, effectively handling intricate structures and overlapping objects.
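
The abstract does not give the exact form of Attention Temperature Scaling. As a minimal sketch, assuming the technique divides the attention logits by a temperature τ (the function and parameter names below are hypothetical, not from the paper), the code shows the intended effect: raising τ flattens the attention weights, which is one way to keep attention from concentrating on a few features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temperature_scaled_attention(q, k, v, temperature=1.0):
    """Scaled dot-product attention with logits further divided by `temperature`.

    Assumption: this is a generic temperature on the attention logits, not the
    exact BriGeS formulation. temperature > 1 flattens the weights,
    temperature < 1 sharpens them.
    """
    d = q.shape[-1]
    logits = q @ k.T / (np.sqrt(d) * temperature)
    weights = softmax(logits, axis=-1)  # each query row sums to 1
    return weights @ v, weights

def entropy(p, axis=-1):
    # Shannon entropy of each attention distribution (higher = less concentrated).
    return -(p * np.log(p + 1e-12)).sum(axis=axis)
```

For example, computing `entropy(weights)` for the same `q`, `k` at `temperature=1.0` and `temperature=2.0` shows higher entropy (less peaked attention) at the larger temperature, since the entropy of a softmax grows with its temperature whenever the logits are not all equal.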

Context-Aware Video Instance Segmentation

Jul 03, 2024

Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im

Abstract: In this paper, we introduce Context-Aware Video Instance Segmentation (CAVIS), a framework designed to improve instance association by incorporating contextual information surrounding each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features to improve tracking accuracy. Additionally, we introduce the Prototypical Cross-frame Contrastive (PCC) loss, which enforces consistency of object-level features across frames and thereby significantly improves instance matching accuracy. CAVIS outperforms state-of-the-art methods on all benchmark datasets for video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, our method excels on the OVIS dataset, which is known for its particularly challenging videos.
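
The PCC loss is described only at a high level here. The following is a hedged sketch of a generic prototype-level, cross-frame InfoNCE loss, assuming that a prototype is the mean embedding of one instance in one frame, that the same instance in two frames forms a positive pair, and that all other instances act as negatives. It is not the paper's formulation, and every name below is hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prototypes(features, instance_ids, num_instances):
    """One prototype per instance: the mean of its feature vectors, L2-normalized.

    Assumption: every id in range(num_instances) appears at least once,
    otherwise the mean of an empty slice is NaN.
    """
    protos = np.zeros((num_instances, features.shape[1]))
    for i in range(num_instances):
        protos[i] = features[instance_ids == i].mean(axis=0)
    return l2_normalize(protos)

def cross_frame_contrastive_loss(protos_a, protos_b, temperature=0.1):
    """InfoNCE over prototypes: instance i in frame A should match instance i
    in frame B (the diagonal) rather than any instance j != i."""
    logits = protos_a @ protos_b.T / temperature  # cosine similarity / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With consistent features across frames the diagonal dominates and the loss is small; permuting the frame-B prototypes so that instances no longer line up raises it, which is the cross-frame consistency the loss is meant to reward.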

* Project page: https://seung-hun-lee.github.io/projects/CAVIS/

RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder

May 23, 2024

Jiwan Seo, Joonhyuk Kang

Abstract: Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in machine learning for learning discrete representations across various modalities. However, its scalability and applicability are limited by the need to retrain the model whenever the codebook must be adjusted for different data or model scales. We introduce the Rate-Adaptive VQ-VAE (RAQ-VAE) framework, which addresses this challenge with two novel codebook representation methods: a model-based approach that applies a clustering-based technique to an existing, well-trained VQ-VAE model, and a data-driven approach that uses a sequence-to-sequence (Seq2Seq) model for variable-rate codebook generation. Our experiments demonstrate that RAQ-VAE achieves effective reconstruction performance across multiple rates, often outperforming conventional fixed-rate VQ-VAE models. This work enhances the adaptability and performance of VQ-VAEs, with broad applications in data reconstruction, generation, and computer vision tasks.
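
For the model-based approach, the abstract says only that a clustering-based technique is applied to a well-trained VQ-VAE. A minimal sketch, assuming plain k-means over the existing codebook vectors (the paper may use a different clustering method), derives a smaller codebook and therefore a lower rate of log2(K) bits per code index. The random "trained" codebook and all function names are hypothetical stand-ins.

```python
import math
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means; here it clusters codebook vectors into k new codewords."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every vector to every center, then nearest-center assignment.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def quantize(z, codebook):
    """Map each latent vector to its nearest codeword (standard VQ lookup)."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def bits_per_index(codebook_size):
    # Rate of one code index for a codebook of the given size.
    return math.log2(codebook_size)
```

For instance, reducing a 256-entry codebook to 32 entries with `kmeans(base_codebook, 32)` lowers the rate from 8 to 5 bits per index without retraining the encoder or decoder, which is the kind of rate adaptation the model-based approach targets.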

* Under review
