Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sung-Eui Yoon

CLaD: Planning with Grounded Foresight via Cross-Modal Latent Dynamics

Mar 31, 2026

Andrew Jeong, Jaemin Kim, Sebin Lee, Sung-Eui Yoon

Abstract:Robotic manipulation involves kinematic and semantic transitions that are inherently coupled via underlying actions. However, existing approaches plan within either semantic or latent space without explicitly aligning these cross-modal transitions. To address this, we propose CLaD, a framework that models how proprioceptive and semantic states jointly evolve under actions through asymmetric cross-attention that allows kinematic transitions to query semantic ones. CLaD predicts grounded latent foresights via self-supervised objectives with EMA target encoders and auxiliary reconstruction losses, preventing representation collapse while anchoring predictions to observable states. Predicted foresights are modulated with observations to condition a diffusion policy for action generation. On LIBERO-LONG benchmark, CLaD achieves 94.7\% success rate, competitive with large VLAs with significantly fewer parameters.

* Project page: https://andrewwwj.github.io/clad

Via

Access Paper or Ask Questions

GLINT: Modeling Scene-Scale Transparency via Gaussian Radiance Transport

Mar 27, 2026

Youngju Na, Jaeseong Yun, Soohyun Ryu, Hyunsu Kim, Sung-Eui Yoon, Suyong Yeon

Abstract:While 3D Gaussian splatting has emerged as a powerful paradigm, it fundamentally fails to model transparency such as glass panels. The core challenge lies in decoupling the intertwined radiance contributions from transparent interfaces and the transmitted geometry observed through the glass. We present GLINT, a framework that models scene-scale transparency through explicit decomposed Gaussian representation. GLINT reconstructs the primary interface and models reflected and transmitted radiance separately, enabling consistent radiance transport. During optimization, GLINT bootstraps transparency localization from geometry-separation cues induced by the decomposition, together with geometry and material priors from a pre-trained video relighting model. Extensive experiments demonstrate consistent improvements over prior methods for reconstructing complex transparent scenes.

* CVPR 2026, Project page: https://youngju-na.github.io/GLINT

Via

Access Paper or Ask Questions

VERIA: Verification-Centric Multimodal Instance Augmentation for Long-Tailed 3D Object Detection

Mar 25, 2026

Jumin Lee, Siyeong Lee, Namil Kim, Sung-Eui Yoon

Abstract:Long-tail distributions in driving datasets pose a fundamental challenge for 3D perception, as rare classes exhibit substantial intra-class diversity yet available samples cover this variation space only sparsely. Existing instance augmentation methods based on copy-paste or asset libraries improve rare-class exposure but are often limited in fine-grained diversity and scene-context placement. We propose VERIA, an image-first multimodal augmentation framework that synthesizes synchronized RGB--LiDAR instances using off-the-shelf foundation models and curates them with sequential semantic and geometric verification. This verification-centric design tends to select instances that better match real LiDAR statistics while spanning a wider range of intra-class variation. Stage-wise yield decomposition provides a log-based diagnostic of pipeline reliability. On nuScenes and Lyft, VERIA improves rare-class 3D object detection in both LiDAR-only and multimodal settings. Our code is available at https://sgvr.kaist.ac.kr/VERIA/.

Via

Access Paper or Ask Questions

DyGeoVLN: Infusing Dynamic Geometry Foundation Model into Vision-Language Navigation

Mar 22, 2026

Xiangchen Liu, Hanghan Zheng, Jeil Jeong, Minsung Yoon, Lin Zhao, Zhide Zhong, Haoang Li, Sung-Eui Yoon

Abstract:Vision-language Navigation (VLN) requires an agent to understand visual observations and language instructions to navigate in unseen environments. Most existing approaches rely on static scene assumptions and struggle to generalize in dynamic, real-world scenarios. To address this challenge, we propose DyGeoVLN, a dynamic geometry-aware VLN framework. Our method infuses a dynamic geometry foundation model into the VLN framework through cross-branch feature fusion to enable explicit 3D spatial representation and visual-semantic reasoning. To efficiently compress historical token information in long-horizon, dynamic navigation, we further introduce a novel pose-free and adaptive-resolution token-pruning strategy. This strategy can remove spatio-temporal redundant tokens to reduce inference cost. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on multiple benchmarks and exhibits strong robustness in real-world environments.

Via

Access Paper or Ask Questions

No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings

Feb 26, 2026

Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-Eui Yoon

Abstract:Latent diffusion models have achieved remarkable success in high-fidelity text-to-image generation, but their tendency to memorize training data raises critical privacy and intellectual property concerns. Membership inference attacks (MIAs) provide a principled way to audit such memorization by determining whether a given sample was included in training. However, existing approaches assume access to ground-truth captions. This assumption fails in realistic scenarios where only images are available and their textual annotations remain undisclosed, rendering prior methods ineffective when substituted with vision-language model (VLM) captions. In this work, we propose MoFit, a caption-free MIA framework that constructs synthetic conditioning inputs that are explicitly overfitted to the target model's generative manifold. Given a query image, MoFit proceeds in two stages: (i) model-fitted surrogate optimization, where a perturbation applied to the image is optimized to construct a surrogate in regions of the model's unconditional prior learned from member samples, and (ii) surrogate-driven embedding extraction, where a model-fitted embedding is derived from the surrogate and then used as a mismatched condition for the query image. This embedding amplifies conditional loss responses for member samples while leaving hold-outs relatively less affected, thereby enhancing separability in the absence of ground-truth captions. Our comprehensive experiments across multiple datasets and diffusion models demonstrate that MoFit consistently outperforms prior VLM-conditioned baselines and achieves performance competitive with caption-dependent methods.

* Accepted to ICLR 2026

Via

Access Paper or Ask Questions

Phase-Aware Policy Learning for Skateboard Riding of Quadruped Robots via Feature-wise Linear Modulation

Feb 10, 2026

Minsung Yoon, Jeil Jeong, Sung-Eui Yoon

Abstract:Skateboards offer a compact and efficient means of transportation as a type of personal mobility device. However, controlling them with legged robots poses several challenges for policy learning due to perception-driven interactions and multi-modal control objectives across distinct skateboarding phases. To address these challenges, we introduce Phase-Aware Policy Learning (PAPL), a reinforcement-learning framework tailored for skateboarding with quadruped robots. PAPL leverages the cyclic nature of skateboarding by integrating phase-conditioned Feature-wise Linear Modulation layers into actor and critic networks, enabling a unified policy that captures phase-dependent behaviors while sharing robot-specific knowledge across phases. Our evaluations in simulation validate command-tracking accuracy and conduct ablation studies quantifying each component's contribution. We also compare locomotion efficiency against leg and wheel-leg baselines and show real-world transferability.

* Accepted at ICRA 2026. Supplementary Video: https://www.youtube.com/watch?v=bCNfdQ3RYKg. M. Yoon and J. Jeong contributed equally

Via

Access Paper or Ask Questions

Learning-based Initialization of Trajectory Optimization for Path-following Problems of Redundant Manipulators

Feb 03, 2026

Minsung Yoon, Mincheul Kang, Daehyung Park, Sung-Eui Yoon

Abstract:Trajectory optimization (TO) is an efficient tool to generate a redundant manipulator's joint trajectory following a 6-dimensional Cartesian path. The optimization performance largely depends on the quality of initial trajectories. However, the selection of a high-quality initial trajectory is non-trivial and requires a considerable time budget due to the extremely large space of the solution trajectories and the lack of prior knowledge about task constraints in configuration space. To alleviate the issue, we present a learning-based initial trajectory generation method that generates high-quality initial trajectories in a short time budget by adopting example-guided reinforcement learning. In addition, we suggest a null-space projected imitation reward to consider null-space constraints by efficiently learning kinematically feasible motion captured in expert demonstrations. Our statistical evaluation in simulation shows the improved optimality, efficiency, and applicability of TO when we plug in our method's output, compared with three other baselines. We also show the performance improvement and feasibility via real-world experiments with a seven-degree-of-freedom manipulator.

* Accepted to ICRA 2023. <a href="https://sgvr.kaist.ac.kr/~msyoon/papers/ICRA23_RLITG/" rel="external noopener nofollow" class="link-external link-https">Project Page</a>

Via

Access Paper or Ask Questions

Enhancing Navigation Efficiency of Quadruped Robots via Leveraging Personal Transportation Platforms

Feb 03, 2026

Minsung Yoon, Sung-Eui Yoon

Abstract:Quadruped robots face limitations in long-range navigation efficiency due to their reliance on legs. To ameliorate the limitations, we introduce a Reinforcement Learning-based Active Transporter Riding method (\textit{RL-ATR}), inspired by humans' utilization of personal transporters, including Segways. The \textit{RL-ATR} features a transporter riding policy and two state estimators. The policy devises adequate maneuvering strategies according to transporter-specific control dynamics, while the estimators resolve sensor ambiguities in non-inertial frames by inferring unobservable robot and transporter states. Comprehensive evaluations in simulation validate proficient command tracking abilities across various transporter-robot models and reduced energy consumption compared to legged locomotion. Moreover, we conduct ablation studies to quantify individual component contributions within the \textit{RL-ATR}. This riding ability could broaden the locomotion modalities of quadruped robots, potentially expanding the operational range and efficiency.

* Accepted to ICRA 2025. <a href="https://sgvr.kaist.ac.kr/~msyoon/papers/ICRA25/" rel="external noopener nofollow" class="link-external link-https">Project Page</a>

Via

Access Paper or Ask Questions

Learning-based Adaptive Control of Quadruped Robots for Active Stabilization on Moving Platforms

Feb 03, 2026

Minsung Yoon, Heechan Shin, Jeil Jeong, Sung-Eui Yoon

Abstract:A quadruped robot faces balancing challenges on a six-degrees-of-freedom moving platform, like subways, buses, airplanes, and yachts, due to independent platform motions and resultant diverse inertia forces on the robot. To alleviate these challenges, we present the Learning-based Active Stabilization on Moving Platforms (\textit{LAS-MP}), featuring a self-balancing policy and system state estimators. The policy adaptively adjusts the robot's posture in response to the platform's motion. The estimators infer robot and platform states based on proprioceptive sensor data. For a systematic training scheme across various platform motions, we introduce platform trajectory generation and scheduling methods. Our evaluation demonstrates superior balancing performance across multiple metrics compared to three baselines. Furthermore, we conduct a detailed analysis of the \textit{LAS-MP}, including ablation studies and evaluation of the estimators, to validate the effectiveness of each component.

* Accepted to IROS 2024. <a href="https://sgvr.kaist.ac.kr/~msyoon/papers/IROS24/" rel="external noopener nofollow" class="link-external link-https">Project Page</a>

Via

Access Paper or Ask Questions

Uncertainty-Aware Non-Prehensile Manipulation with Mobile Manipulators under Object-Induced Occlusion

Feb 02, 2026

Jiwoo Hwang, Taegeun Yang, Jeil Jeong, Minsung Yoon, Sung-Eui Yoon

Abstract:Non-prehensile manipulation using onboard sensing presents a fundamental challenge: the manipulated object occludes the sensor's field of view, creating occluded regions that can lead to collisions. We propose CURA-PPO, a reinforcement learning framework that addresses this challenge by explicitly modeling uncertainty under partial observability. By predicting collision possibility as a distribution, we extract both risk and uncertainty to guide the robot's actions. The uncertainty term encourages active perception, enabling simultaneous manipulation and information gathering to resolve occlusions. When combined with confidence maps that capture observation reliability, our approach enables safe navigation despite severe sensor occlusion. Extensive experiments across varying object sizes and obstacle configurations demonstrate that CURA-PPO achieves up to 3X higher success rates than the baselines, with learned behaviors that handle occlusions. Our method provides a practical solution for autonomous manipulation in cluttered environments using only onboard sensing.

* 8 pages, 7 figures, Accepted to ICRA 2026, Webpage: https://jiw0o.github.io/cura-ppo/

Via

Access Paper or Ask Questions