Seong-Woong Shim

FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining

May 19, 2025

Myunsoo Kim, Seong-Woong Shim, Byung-Jun Lee

Abstract: False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives introduce conflicting supervision signals that degrade the learned embedding space and diminish the effectiveness of hard negative sampling. In this paper, we propose FALCON (False-negative Aware Learning of COntrastive Negatives), a learning-based mini-batch construction strategy that adaptively balances the trade-off between hard and false negatives during VLP. Rather than relying on fixed heuristics, FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction, guided by a proxy for cross-modal alignment improvement. Experimental results demonstrate that FALCON significantly improves performance across two widely adopted VLP frameworks (ALBEF, BLIP-2) and a broad range of downstream tasks and evaluation settings, underscoring its effectiveness and robustness in mitigating the impact of false negatives.

* The manuscript contains errors that require substantial revision
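
To make the false-negative problem in the abstract above concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss for one anchor. It is not FALCON's scheduler and not the exact objective of ALBEF or BLIP-2; the function name `info_nce_row`, the similarity values, and the temperature of 0.07 are assumptions chosen for illustration.

```python
import math


def info_nce_row(similarities, positive_index, temperature=0.07):
    """Generic InfoNCE loss for one anchor (illustrative, hypothetical helper).

    similarities: similarity of the anchor to every candidate in the batch.
    positive_index: position of the paired (positive) candidate.
    All other candidates are treated as negatives, whether or not they
    actually match the anchor.
    """
    logits = [s / temperature for s in similarities]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_index]


# Anchor with one positive and two clearly unrelated negatives.
clean = info_nce_row([0.9, 0.1, 0.1], positive_index=0)

# Same anchor, but the second candidate also matches it (a false negative):
# the loss pushes it away even though it is as similar as the positive.
with_false_negative = info_nce_row([0.9, 0.9, 0.1], positive_index=0)

print(clean, with_false_negative)
```

With a false negative as similar as the positive, the loss for this anchor cannot drop below log 2, however well the true pair is aligned. This is the conflicting supervision signal the abstract refers to, and harder negatives are more likely to be such false negatives.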

Prior-Guided Diffusion Planning for Offline Reinforcement Learning

May 16, 2025

Donghyeon Ki, JunHyeok Oh, Seong-Woong Shim, Byung-Jun Lee

Abstract: Diffusion models have recently gained prominence in offline reinforcement learning due to their ability to effectively learn high-performing, generalizable policies from static datasets. Diffusion-based planners facilitate long-horizon decision-making by generating high-quality trajectories through iterative denoising, guided by return-maximizing objectives. However, existing guided sampling strategies such as Classifier Guidance, Classifier-Free Guidance, and Monte Carlo Sample Selection either produce suboptimal multi-modal actions, struggle with distributional drift, or incur prohibitive inference-time costs. To address these challenges, we propose Prior Guidance (PG), a novel guided sampling framework that replaces the standard Gaussian prior of a behavior-cloned diffusion model with a learnable distribution, optimized via a behavior-regularized objective. PG directly generates high-value trajectories without costly reward optimization of the diffusion model itself, and eliminates the need to sample multiple candidates at inference for sample selection. We present an efficient training strategy that applies behavior regularization in latent space, and empirically demonstrate that PG outperforms state-of-the-art diffusion policies and planners across diverse long-horizon offline RL benchmarks.

Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training

Nov 15, 2024

Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee

Abstract: As highly expressive generative models, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance in stochastic gradients varies significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process, but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.
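
As a rough illustration of what replacing uniform timestep sampling with an adaptive distribution could look like, the sketch below keeps one running score per timestep and samples timesteps through a softmax over those scores. It is not the authors' algorithm: the class `TimestepSampler`, the exponential-moving-average update, and the `observed_gain` signal (a stand-in for "how much a gradient step at this timestep reduced the objective") are all assumptions.

```python
import math
import random


class TimestepSampler:
    """Hypothetical adaptive sampler over diffusion timesteps (illustrative only)."""

    def __init__(self, num_timesteps, temperature=1.0, decay=0.9, seed=0):
        self.scores = [0.0] * num_timesteps  # one running score per timestep
        self.temperature = temperature
        self.decay = decay
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over scores; equal scores give the uniform distribution.
        m = max(self.scores)
        exps = [math.exp((s - m) / self.temperature) for s in self.scores]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        # Draw one timestep index according to the current probabilities.
        indices = range(len(self.scores))
        return self.rng.choices(indices, weights=self.probabilities(), k=1)[0]

    def update(self, t, observed_gain):
        # Exponential moving average of the assumed per-timestep gain signal.
        self.scores[t] = self.decay * self.scores[t] + (1 - self.decay) * observed_gain


sampler = TimestepSampler(num_timesteps=4)
uniform = sampler.probabilities()   # all scores equal, so sampling is uniform
sampler.update(2, observed_gain=10.0)
skewed = sampler.probabilities()    # timestep 2 is now sampled more often
t = sampler.sample()
```

In a training loop, `sample()` would choose the timestep for the next batch and `update()` would be called with whatever measured signal stands in for the gain. How that signal is actually computed is the substance of the paper and is not reproduced here.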

Offline Imitation Learning by Controlling the Effective Planning Horizon

Jan 18, 2024

Hee-Jun Ahn, Seong-Woong Shim, Byung-Jun Lee

Abstract: In offline imitation learning (IL), we generally assume only a handful of expert trajectories and a supplementary offline dataset from suboptimal behaviors to learn the expert policy. While it is now common to minimize the divergence between state-action visitation distributions so that the agent also considers the future consequences of an action, sampling error in the offline dataset may lead to erroneous estimates of state-action visitations. In this paper, we investigate the effect of controlling the effective planning horizon (i.e., reducing the discount factor) as opposed to imposing an explicit regularizer, as previously studied. Unfortunately, it turns out that the existing algorithms suffer from magnified approximation errors when the effective planning horizon is shortened, which results in a significant degradation in performance. We analyze the main cause of the problem and provide the right remedies to correct the algorithm. We show that the corrected algorithm improves performance on popular imitation learning benchmarks by controlling the effective planning horizon rather than by imposing an explicit regularizer.

* Preprint
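
The link between the discount factor and the effective planning horizon mentioned in the abstract above can be made concrete with the standard geometric-series identity: the discount weights 1, γ, γ², ... sum to 1 / (1 - γ). The sketch below only illustrates that relationship, not the paper's corrected algorithm; the function names are hypothetical.

```python
def effective_horizon(gamma: float) -> float:
    """Sum of the discount weights 1 + gamma + gamma^2 + ... = 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence, accumulated backward."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Lowering the discount factor shortens the horizon the agent plans over.
long_horizon = effective_horizon(0.99)   # about 100 steps
short_horizon = effective_horizon(0.9)   # about 10 steps
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

Reducing γ from 0.99 to 0.9 cuts the effective horizon by roughly a factor of ten, so estimates depend less on long-range visitation statistics that are hard to estimate from a small offline dataset.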
