HyeongYeop Kang

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

May 20, 2026

SeungJeh Chung, Geonho Park, Misong Kim, HyeongYeop Kang

Abstract:Adaptive densification is the engine of 3D Gaussian Splatting (3DGS). However, when transposed to the optimization-based Generative Distillation paradigm, this reconstruction-native mechanism reveals fundamental limitations, resulting in inefficient representations cluttered with redundant primitives. We diagnose this failure as a Densification Dilemma stemming from the stochastic nature of generative guidance: the standard magnitude-based accumulation indiscriminately aggregates transient noise alongside geometric signals, making it difficult to strike a balance between over-densification and under-fitting. To resolve this, we introduce Context-Adaptive Moment Estimation (CAdam), a novel framework that reinterprets densification as a statistically grounded signal verification problem. CAdam leverages the first moment of gradients to exploit the interference principle, where stochastic fluctuations cancel out via destructive interference while consistent geometric drifts accumulate via constructive interference, effectively disentangling the underlying signal from the generative noise floor. This is further augmented by a quantile-based context awareness and an intrinsic Signal-to-Noise Ratio (SNR) gating mechanism, which ensure robust adaptation across optimization stages and enable the soft termination of densification. Extensive experiments across diverse objectives (SDS, ISM, VFDS) and strong generative 3DGS backbones show that CAdam reduces Gaussian count by 85%-97% relative to standard densification while preserving overall comparable perceptual quality. These results highlight signal-aware density control as a practical way to improve memory efficiency in optimization-based generative distillation.

* Accepted to SIGGRAPH 2026 Conference Papers. 12 pages, 8 figures
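
The interference principle in the abstract above can be illustrated with a toy one-dimensional example. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `moment_stats`, the Adam-style decay rates `beta1` and `beta2`, and the SNR definition (absolute first moment over the root of the second moment) are all hypothetical choices, and the quantile-based context awareness is not modeled.

```python
import math

def moment_stats(grads, beta1=0.9, beta2=0.999):
    """Summarize a non-empty sequence of scalar gradients.

    Returns (mean magnitude, bias-corrected first moment, SNR).
    beta1/beta2 are assumed Adam-style decay rates, not values from the paper.
    """
    if not grads:
        raise ValueError("grads must be non-empty")
    m = 0.0        # exponential moving average of signed gradients
    v = 0.0        # exponential moving average of squared gradients
    mag_sum = 0.0  # magnitude-based accumulation (ignores sign)
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        mag_sum += abs(g)
    t = len(grads)
    m_hat = m / (1 - beta1 ** t)  # bias correction, as in Adam
    v_hat = v / (1 - beta2 ** t)
    snr = abs(m_hat) / (math.sqrt(v_hat) + 1e-12)
    return mag_sum / t, m_hat, snr

# Sign-flipping "noise" with large magnitude vs. a small but consistent "drift".
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
drift = [0.5] * 200

noise_mag, noise_m, noise_snr = moment_stats(noise)
drift_mag, drift_m, drift_snr = moment_stats(drift)

# Magnitude accumulation ranks the noisy sequence higher,
# while the first moment and SNR favor the consistent drift.
print(noise_mag > drift_mag, abs(noise_m) < abs(drift_m), noise_snr < drift_snr)
```

In this toy setting the alternating gradients cancel in the first moment, whereas the constant gradients accumulate, so a gate on the SNR would pass only the drift sequence.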

ORACLE: Orchestrate NPC Daily Activities using Contrastive Learning with Transformer-CVAE

Mar 25, 2026

Seong-Eun Hong, JuYeong Hwang, RyunHa Lee, HyeongYeop Kang

Abstract:The integration of Non-player characters (NPCs) within digital environments has been increasingly recognized for its potential to augment user immersion and cognitive engagement. The sophisticated orchestration of their daily activities, reflecting the nuances of human daily routines, contributes significantly to the realism of digital environments. Nevertheless, conventional approaches often produce monotonous repetition, falling short of capturing the intricacies of real human activity plans. In response to this, we introduce ORACLE, a novel generative model for the synthesis of realistic indoor daily activity plans, ensuring NPCs' authentic presence in digital habitats. Exploiting the CASAS smart home dataset's 24-hour indoor activity sequences, ORACLE addresses challenges in the dataset, including its imbalanced sequential data, the scarcity of training samples, and the absence of pre-trained models encapsulating human daily activity patterns. ORACLE's training leverages the sequential data processing prowess of Transformers, the generative controllability of Conditional Variational Autoencoders (CVAE), and the discriminative refinement of contrastive learning. Our experimental results validate the superiority of generating NPC activity plans and the efficacy of our design strategies over existing methods.

* 17 pages, 7 figures. Accepted to CVM 2026

Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing

Mar 18, 2026

Seongrae Noh, SeungWon Seo, Gyeong-Moon Park, HyeongYeop Kang

Abstract:Editing a 3D indoor scene from natural language is conceptually straightforward but technically challenging. Existing open-vocabulary systems often regenerate large portions of a scene or rely on image-space edits that disrupt spatial structure, resulting in unintended global changes or physically inconsistent layouts. These limitations stem from treating editing primarily as a generative task. We take a different view. A user instruction defines a desired world state, and editing should be the minimal sequence of actions that makes this state true while preserving everything else. This perspective motivates Edit-As-Act, a framework that performs open-vocabulary scene editing as goal-regressive planning in 3D space. Given a source scene and free-form instruction, Edit-As-Act predicts symbolic goal predicates and plans in EditLang, a PDDL-inspired action language that we design with explicit preconditions and effects encoding support, contact, collision, and other geometric relations. A language-driven planner proposes actions, and a validator enforces goal-directedness, monotonicity, and physical feasibility, producing interpretable and physically coherent transformations. By separating reasoning from low-level generation, Edit-As-Act achieves instruction fidelity, semantic consistency, and physical plausibility, three criteria that existing paradigms cannot satisfy together. On E2A-Bench, our benchmark of 63 editing tasks across 9 indoor environments, Edit-As-Act significantly outperforms prior approaches across all edit types and scene categories.

* Accepted to CVPR 2026

RenderMem: Rendering as Spatial Memory Retrieval

Mar 15, 2026

JooHyun Park, HyeongYeop Kang

Abstract:Embodied reasoning is inherently viewpoint-dependent: what is visible, occluded, or reachable depends critically on where the agent stands. However, existing spatial memory systems for embodied agents typically store either multi-view observations or object-centric abstractions, making it difficult to perform reasoning with explicit geometric grounding. We introduce RenderMem, a spatial memory framework that treats rendering as the interface between 3D world representations and spatial reasoning. Instead of storing fixed observations, RenderMem maintains a 3D scene representation and generates query-conditioned visual evidence by rendering the scene from viewpoints implied by the query. This enables embodied agents to reason directly about line-of-sight, visibility, and occlusion from arbitrary perspectives. RenderMem is fully compatible with existing vision-language models and requires no modification to standard architectures. Experiments in the AI2-THOR environment show consistent improvements on viewpoint-dependent visibility and occlusion queries over prior memory baselines.

From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents

Feb 04, 2026

SeungWon Seo, SooBin Lim, SeongRae Noh, Haneul Kim, HyeongYeop Kang

Abstract:Embodied agents operating in multi-agent, partially observable, and decentralized environments must plan and act despite pervasive uncertainty about hidden objects and collaborators' intentions. Recent advances in applying Large Language Models (LLMs) to embodied agents have addressed many long-standing challenges, such as high-level goal decomposition and online adaptation. Yet, uncertainty is still primarily mitigated through frequent inter-agent communication. This incurs substantial token and time costs and can disrupt established workflows when human partners are involved. We introduce PCE, a Planner-Composer-Evaluator framework that converts the fragmented assumptions latent in LLM reasoning traces into a structured decision tree. Internal nodes encode environment assumptions and leaves map to actions; each path is then scored by scenario likelihood, goal-directed gain, and execution cost to guide rational action selection without heavy communication. Across two challenging multi-agent benchmarks (C-WAH and TDW-MAT) and three diverse LLM backbones, PCE consistently outperforms communication-centric baselines in success rate and task efficiency while showing comparable token usage. Ablation results indicate that the performance gains obtained by scaling model capacity or reasoning depth persist even when PCE is applied, while PCE consistently raises the baseline across both capacity and reasoning-depth scales, confirming that structured uncertainty handling complements both forms of scaling. A user study further demonstrates that PCE produces communication patterns that human partners perceive as more efficient and trustworthy. Together, these results establish a principled route for turning latent LLM assumptions into reliable strategies for uncertainty-aware planning.

* 31 pages, 10 figures. Accepted to ICLR 2026
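
The path-scoring step described in the abstract above names three quantities (scenario likelihood, goal-directed gain, and execution cost) but does not state how they are combined. The sketch below assumes one simple possibility, an expected gain minus a weighted cost; the `Path` class, the `score` formula, the `cost_weight` parameter, and the example values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    """One root-to-leaf path: assumed environment facts and the resulting action."""
    assumptions: tuple
    action: str
    likelihood: float  # how plausible the assumed scenario is, in [0, 1]
    gain: float        # progress toward the goal if the scenario holds
    cost: float        # effort needed to execute the action

def score(path, cost_weight=1.0):
    # Assumed combination rule: likelihood-weighted gain minus weighted cost.
    return path.likelihood * path.gain - cost_weight * path.cost

def select_action(paths, cost_weight=1.0):
    """Return the path with the highest score under the assumed rule."""
    return max(paths, key=lambda p: score(p, cost_weight))

# Illustrative, made-up values for a household task.
paths = [
    Path(("apple is in the fridge",), "open fridge", 0.6, 10.0, 2.0),
    Path(("partner already took the apple",), "ask partner", 0.3, 10.0, 4.0),
    Path(("apple is in the cabinet",), "search cabinet", 0.1, 10.0, 3.0),
]

best = select_action(paths)
print(best.action)
```

Under these made-up numbers the most likely scenario also has the cheapest action, so it is selected without any message to the partner; raising `cost_weight` shifts the choice toward cheaper actions.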

ForceGrip: Data-Free Curriculum Learning for Realistic Grip Force Control in VR Hand Manipulation

Mar 11, 2025

DongHeun Han, Byungmin Kim, RoUn Lee, KyeongMin Kim, Hyoseok Hwang, HyeongYeop Kang

Abstract:Realistic hand manipulation is a key component of immersive virtual reality (VR), yet existing methods often rely on a kinematic approach or motion-capture datasets that omit crucial physical attributes such as contact forces and finger torques. Consequently, these approaches prioritize tight, one-size-fits-all grips rather than reflecting users' intended force levels. We present ForceGrip, a deep learning agent that synthesizes realistic hand manipulation motions, faithfully reflecting the user's grip force intention. Instead of mimicking predefined motion datasets, ForceGrip uses generated training scenarios (randomizing object shapes, wrist movements, and trigger input flows) to challenge the agent with a broad spectrum of physical interactions. To effectively learn from these complex tasks, we employ a three-phase curriculum learning framework comprising Finger Positioning, Intention Adaptation, and Dynamic Stabilization. This progressive strategy ensures stable hand-object contact, adaptive force control based on user inputs, and robust handling under dynamic conditions. Additionally, a proximity reward function enhances natural finger motions and accelerates training convergence. Quantitative and qualitative evaluations reveal ForceGrip's superior force controllability and plausibility compared to state-of-the-art methods.

* 19 pages, 10 figures (with appendix)

LLM-Based Cooperative Agents using Information Relevance and Plan Validation

May 27, 2024

SeungWon Seo, Junhyeok Lee, SeongRae Noh, HyeongYeop Kang

Abstract:We address the challenge of multi-agent cooperation, where agents achieve a common goal by interacting with a 3D scene and cooperating with decentralized agents under complex partial observations. This involves managing communication costs and optimizing interaction trajectories in dynamic environments. Our research focuses on three primary limitations of existing cooperative agent systems. Firstly, current systems demonstrate inefficiency in managing acquired information through observation, resulting in declining planning performance as the environment becomes more complex with additional objects or goals. Secondly, the neglect of false plans in partially observable settings leads to suboptimal cooperative performance, as agents struggle to adapt to environmental changes influenced by the unseen actions of other agents. Lastly, the failure to incorporate spatial data into decision-making processes restricts the agent's ability to construct optimized trajectories. To overcome these limitations, we propose the RElevance and Validation-Enhanced Cooperative Language Agent (REVECA), a novel cognitive architecture powered by GPT-3.5. REVECA leverages relevance assessment, plan validation, and spatial information to enhance the efficiency and robustness of agent cooperation in dynamic and partially observable environments while minimizing continuous communication costs and effectively managing irrelevant dummy objects. Our extensive experiments demonstrate the superiority of REVECA over previous approaches, including those driven by GPT-4.0. Additionally, a user study highlights REVECA's potential for achieving trustworthy human-AI cooperation. We expect that REVECA will have significant applications in gaming, XR applications, educational tools, and humanoid robots, contributing to substantial economic, commercial, and academic advancements.

3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization

Apr 03, 2024

SeungJeh Chung, JooHyun Park, Hyewon Kan, HyeongYeop Kang

Abstract:3D stylization, which entails the application of specific styles to three-dimensional objects, holds significant commercial potential as it enables the creation of diverse 3D objects with distinct moods and styles, tailored to specific demands of different scenes. With recent advancements in text-driven methods and artificial intelligence, the stylization process is increasingly intuitive and automated, thereby diminishing the reliance on manual labor and expertise. However, existing methods have predominantly focused on holistic stylization, thereby leaving the application of styles to individual components of a 3D object unexplored. In response, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP leverages the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize the individual parts of the 3D mesh and modify their colors and local geometries to align them with the desired styles specified in the text prompt. 3DStyleGLIP is effectively trained for 3D stylization tasks through a part-level style loss working in GLIP's embedding space, supplemented by two complementary learning techniques. Extensive experimental validation confirms that our method achieves significant part-wise stylization capabilities, demonstrating promising potential in advancing the field of 3D stylization.
