Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Seunghyun Lee

MARIC: Multi-Agent Reasoning for Image Classification

Sep 18, 2025

Wonduk Seo, Minhyeong Yu, Hyunjin An, Seunghyun Lee

Abstract:Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine tuning to achieve competitive performance. While recent vision language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi Agent based Reasoning for Image Classification (MARIC), a multi agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on 4 diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.

* Preprint

Via

Access Paper or Ask Questions

CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection

Jun 13, 2025

Byeongchan Lee, John Won, Seunghyun Lee, Jinwoo Shin

Abstract:Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types (e.g., local and global defect), and the scarcity of training data. As such, it necessitates a comprehensive model capable of capturing both low-level and high-level features, even with limited data. To address this, we propose CLIPFUSION, a method that leverages both discriminative and generative foundation models. Specifically, the CLIP-based discriminative model excels at capturing global features, while the diffusion-based generative model effectively captures local details, creating a synergistic and complementary approach. Notably, we introduce a methodology for utilizing cross-attention maps and feature maps extracted from diffusion models specifically for anomaly detection. Experimental results on benchmark datasets (MVTec-AD, VisA) demonstrate that CLIPFUSION consistently outperforms baseline methods, achieving outstanding performance in both anomaly segmentation and classification. We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection, providing a scalable solution for real-world applications.

Via

Access Paper or Ask Questions

Identifiability of latent causal graphical models without pure children

May 23, 2025

Seunghyun Lee, Yuqi Gu

Abstract:This paper considers a challenging problem of identifying a causal graphical model under the presence of latent variables. While various identifiability conditions have been proposed in the literature, they often require multiple pure children per latent variable or restrictions on the latent causal graph. Furthermore, it is common for all observed variables to exhibit the same modality. Consequently, the existing identifiability conditions are often too stringent for complex real-world data. We consider a general nonparametric measurement model with arbitrary observed variable types and binary latent variables, and propose a double triangular graphical condition that guarantees identifiability of the entire causal graphical model. The proposed condition significantly relaxes the popular pure children condition. We also establish necessary conditions for identifiability and provide valuable insights into fundamental limits of identifiability. Simulation studies verify that latent structures satisfying our conditions can be accurately estimated from data.

Via

Access Paper or Ask Questions

Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition

May 21, 2025

Dasol Choi, Seunghyun Lee, Youngsook Song

Abstract:Vision-Language Models (VLMs) have demonstrated impressive capabilities in understanding visual content, but their reliability in safety-critical contexts remains under-explored. We introduce VERI (Visual Emergency Recognition Dataset), a carefully designed diagnostic benchmark of 200 images (100 contrastive pairs). Each emergency scene is matched with a visually similar but safe counterpart through multi-stage human verification and iterative refinement. Using a two-stage protocol - risk identification and emergency response - we evaluate 14 VLMs (2B-124B parameters) across medical emergencies, accidents, and natural disasters. Our analysis reveals a systematic overreaction problem: models excel at identifying real emergencies (70-100 percent success rate) but suffer from an alarming rate of false alarms, misidentifying 31-96 percent of safe situations as dangerous, with 10 scenarios failed by all models regardless of scale. This "better-safe-than-sorry" bias manifests primarily through contextual overinterpretation (88-93 percent of errors), challenging VLMs' reliability for safety applications. These findings highlight persistent limitations that are not resolved by increasing model scale, motivating targeted approaches for improving contextual safety assessment in visually misleading scenarios.

* 13 pages

Via

Access Paper or Ask Questions

Personalizing Federated Learning for Hierarchical Edge Networks with Non-IID Data

Apr 11, 2025

Seunghyun Lee, Omid Tavallaie, Shuaijun Chen, Kanchana Thilakarathna, Suranga Seneviratne, Adel Nadjaran Toosi, Albert Y. Zomaya

Abstract:Accommodating edge networks between IoT devices and the cloud server in Hierarchical Federated Learning (HFL) enhances communication efficiency without compromising data privacy. However, devices connected to the same edge often share geographic or contextual similarities, leading to varying edge-level data heterogeneity with different subsets of labels per edge, on top of device-level heterogeneity. This hierarchical non-Independent and Identically Distributed (non-IID) nature, which implies that each edge has its own optimization goal, has been overlooked in HFL research. Therefore, existing edge-accommodated HFL demonstrates inconsistent performance across edges in various hierarchical non-IID scenarios. To ensure robust performance with diverse edge-level non-IID data, we propose a Personalized Hierarchical Edge-enabled Federated Learning (PHE-FL), which personalizes each edge model to perform well on the unique class distributions specific to each edge. We evaluated PHE-FL across 4 scenarios with varying levels of edge-level non-IIDness, with extreme IoT device level non-IIDness. To accurately assess the effectiveness of our personalization approach, we deployed test sets on each edge server instead of the cloud server, and used both balanced and imbalanced test sets. Extensive experiments show that PHE-FL achieves up to 83 percent higher accuracy compared to existing federated learning approaches that incorporate edge networks, given the same number of training rounds. Moreover, PHE-FL exhibits improved stability, as evidenced by reduced accuracy fluctuations relative to the state-of-the-art FedAvg with two-level (edge and cloud) aggregation.

Via

Access Paper or Ask Questions

QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval

Feb 12, 2025

Wonduk Seo, Seunghyun Lee

Figure 1 for QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval

Figure 2 for QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval

Figure 3 for QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval

Figure 4 for QA-Expand: Multi-Question Answer Generation for Enhanced Query Expansion in Information Retrieval

Abstract:Query expansion is widely used in Information Retrieval (IR) to improve search outcomes by enriching queries with additional contextual information. Although recent Large Language Model (LLM) based methods generate pseudo-relevant content and expanded terms via multiple prompts, they often yield repetitive, narrow expansions that lack the diverse context needed to retrieve all relevant information. In this paper, we introduce QA-Expand, a novel and effective framework for query expansion. It first generates multiple relevant questions from the initial query and subsequently produces corresponding pseudo-answers as surrogate documents. A feedback model further rewrites and filters these answers to ensure only the most informative augmentations are incorporated. Extensive experiments on benchmarks such as BEIR and TREC demonstrate that QA-Expand enhances retrieval performance by up to 13% over state-of-the-art methods, offering a robust solution for modern retrieval challenges.

* 8 pages

Via

Access Paper or Ask Questions

DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains

Feb 09, 2025

Seunghyun Lee, I Made Aswin Nahrendra, Dongkyu Lee, Byeongho Yu, Minho Oh, Hyun Myung

Figure 1 for DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains

Figure 2 for DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains

Figure 3 for DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains

Figure 4 for DreamFLEX: Learning Fault-Aware Quadrupedal Locomotion Controller for Anomaly Situation in Rough Terrains

Abstract:Recent advances in quadrupedal robots have demonstrated impressive agility and the ability to traverse diverse terrains. However, hardware issues, such as motor overheating or joint locking, may occur during long-distance walking or traversing through rough terrains leading to locomotion failures. Although several studies have proposed fault-tolerant control methods for quadrupedal robots, there are still challenges in traversing unstructured terrains. In this paper, we propose DreamFLEX, a robust fault-tolerant locomotion controller that enables a quadrupedal robot to traverse complex environments even under joint failure conditions. DreamFLEX integrates an explicit failure estimation and modulation network that jointly estimates the robot's joint fault vector and utilizes this information to adapt the locomotion pattern to faulty conditions in real-time, enabling quadrupedal robots to maintain stability and performance in rough terrains. Experimental results demonstrate that DreamFLEX outperforms existing methods in both simulation and real-world scenarios, effectively managing hardware failures while maintaining robust locomotion performance.

* Accepted for ICRA 2025. Project site is available at https://dreamflex.github.io/

Via

Access Paper or Ask Questions

Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Jan 02, 2025

Seunghyun Lee, Yuqi Gu

Figure 1 for Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Figure 2 for Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Figure 3 for Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Figure 4 for Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

Abstract:In the era of generative AI, deep generative models (DGMs) with latent representations have gained tremendous popularity. Despite their impressive empirical performance, the statistical properties of these models remain underexplored. DGMs are often overparametrized, non-identifiable, and uninterpretable black boxes, raising serious concerns when deploying them in high-stakes applications. Motivated by this, we propose an interpretable deep generative modeling framework for rich data types with discrete latent layers, called Deep Discrete Encoders (DDEs). A DDE is a directed graphical model with multiple binary latent layers. Theoretically, we propose transparent identifiability conditions for DDEs, which imply progressively smaller sizes of the latent layers as they go deeper. Identifiability ensures consistent parameter estimation and inspires an interpretable design of the deep architecture. Computationally, we propose a scalable estimation pipeline of a layerwise nonlinear spectral initialization followed by a penalized stochastic approximation EM algorithm. This procedure can efficiently estimate models with exponentially many latent components. Extensive simulation studies validate our theoretical results and demonstrate the proposed algorithms' excellent performance. We apply DDEs to three diverse real datasets for hierarchical topic modeling, image representation learning, response time modeling in educational testing, and obtain interpretable findings.

Via

Access Paper or Ask Questions

Dynamic technology impact analysis: A multi-task learning approach to patent citation prediction

Nov 14, 2024

Youngjin Seol, Jaewoong Choi, Seunghyun Lee, Janghyeok Yoon

Figure 1 for Dynamic technology impact analysis: A multi-task learning approach to patent citation prediction

Figure 2 for Dynamic technology impact analysis: A multi-task learning approach to patent citation prediction

Figure 3 for Dynamic technology impact analysis: A multi-task learning approach to patent citation prediction

Figure 4 for Dynamic technology impact analysis: A multi-task learning approach to patent citation prediction

Abstract:Machine learning (ML) models are valuable tools for analyzing the impact of technology using patent citation information. However, existing ML-based methods often struggle to account for the dynamic nature of the technology impact over time and the interdependencies of these impacts across different periods. This study proposes a multi-task learning (MTL) approach to enhance the prediction of technology impact across various time frames by leveraging knowledge sharing and simultaneously monitoring the evolution of technology impact. First, we quantify the technology impacts and identify patterns through citation analysis over distinct time periods. Next, we develop MTL models to predict citation counts using multiple patent indicators over time. Finally, we examine the changes in key input indicators and their patterns over different periods using the SHapley Additive exPlanation method. We also offer guidelines for validating and interpreting the results by employing statistical methods and natural language processing techniques. A case study on battery technologies demonstrates that our approach not only deepens the understanding of technology impact, but also improves prediction accuracy, yielding valuable insights for both academia and industry.

Via

Access Paper or Ask Questions

Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

Sep 29, 2024

I Made Aswin Nahrendra, Byeongho Yu, Minho Oh, Dongkyu Lee, Seunghyun Lee, Hyeonwoo Lee, Hyungtae Lim, Hyun Myung

Figure 1 for Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

Figure 2 for Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

Figure 3 for Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

Figure 4 for Obstacle-Aware Quadrupedal Locomotion With Resilient Multi-Modal Reinforcement Learning

Abstract:Quadrupedal robots hold promising potential for applications in navigating cluttered environments with resilience akin to their animal counterparts. However, their floating base configuration makes them vulnerable to real-world uncertainties, yielding substantial challenges in their locomotion control. Deep reinforcement learning has become one of the plausible alternatives for realizing a robust locomotion controller. However, the approaches that rely solely on proprioception sacrifice collision-free locomotion because they require front-feet contact to detect the presence of stairs to adapt the locomotion gait. Meanwhile, incorporating exteroception necessitates a precisely modeled map observed by exteroceptive sensors over a period of time. Therefore, this work proposes a novel method to fuse proprioception and exteroception featuring a resilient multi-modal reinforcement learning. The proposed method yields a controller that showcases agile locomotion performance on a quadrupedal robot over a myriad of real-world courses, including rough terrains, steep slopes, and high-rise stairs, while retaining its robustness against out-of-distribution situations.

* Under review. Project site is available at https://dreamwaqpp.github.io

Via

Access Paper or Ask Questions