Myunsoo Kim

Rethinking DPO: The Role of Rejected Responses in Preference Misalignment

Jun 15, 2025

Jay Hyeon Cho, JunHyeok Oh, Myunsoo Kim, Byung-Jun Lee

Abstract: Direct Preference Optimization (DPO) is a simple and efficient framework that has attracted substantial attention. However, it often struggles to meet its primary objectives -- increasing the generation probability of chosen responses while reducing that of rejected responses -- due to the dominant influence of rejected responses on the loss function. This imbalance leads to suboptimal performance in promoting preferred responses. In this work, we systematically analyze the limitations of DPO and existing algorithms designed to achieve the objectives stated above. To address these limitations, we propose Bounded-DPO (BDPO), a novel method that bounds the influence of rejected responses while maintaining the original optimization structure of DPO. Through theoretical analysis and empirical evaluations, we demonstrate that BDPO achieves a balanced optimization of the chosen and rejected responses, outperforming existing algorithms.
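To make the imbalance described in the abstract concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is not the BDPO objective, whose exact form the abstract does not give. The function names and the toy log-probabilities are illustrative assumptions.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are sequence-level log-probabilities; the ref_* values come
    from the frozen reference model. beta=0.1 is an assumed default.
    """
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-z)

# Toy numbers (assumed): the reference model gives both responses log-prob -10.
loss_at_reference = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# The policy lowers the chosen log-prob slightly and the rejected one a lot.
loss_after_update = dpo_loss(-12.0, -30.0, -10.0, -10.0)
```

In this toy case the loss falls even though the chosen response became less likely, because the loss depends only on the difference between the two margins. Pushing the rejected log-probability down is enough to reduce it.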

FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining

May 19, 2025

Myunsoo Kim, Seong-Woong Shim, Byung-Jun Lee

Abstract: False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives introduce conflicting supervision signals that degrade the learned embedding space and diminish the effectiveness of hard negative sampling. In this paper, we propose FALCON (False-negative Aware Learning of COntrastive Negatives), a learning-based mini-batch construction strategy that adaptively balances the trade-off between hard and false negatives during VLP. Rather than relying on fixed heuristics, FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction, guided by a proxy for cross-modal alignment improvement. Experimental results demonstrate that FALCON significantly improves performance across two widely adopted VLP frameworks (ALBEF, BLIP-2) and a broad range of downstream tasks and evaluation settings, underscoring its effectiveness and robustness in mitigating the impact of false negatives.

* The manuscript contains errors that require substantial revision
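As a rough illustration of what "negatives of appropriate hardness" can mean, the sketch below picks negatives for one anchor by their similarity rank. It is not FALCON's learned scheduler: the fixed `quantile` argument is a hypothetical stand-in for whatever the scheduler would output, and `pick_negatives` is an invented helper name.

```python
def pick_negatives(similarities, anchor_index, quantile, k=1):
    """Pick k negatives whose hardness rank is closest to `quantile`.

    similarities: anchor-to-candidate similarity scores (higher = harder).
    quantile: 0.0 selects the easiest negatives, 1.0 the hardest ones,
              which are also the ones most likely to be false negatives.
    """
    candidates = [i for i in range(len(similarities)) if i != anchor_index]
    # Order candidates from easiest (least similar) to hardest (most similar).
    candidates.sort(key=lambda i: similarities[i])
    # Map the quantile to a position in the sorted list (round half up).
    center = int(quantile * (len(candidates) - 1) + 0.5)
    # Take the k ranks nearest to that position.
    nearest = sorted(range(len(candidates)), key=lambda r: (abs(r - center), r))
    return [candidates[r] for r in nearest[:k]]

# Toy similarities (assumed) between anchor 0 and four other samples.
sims = [1.0, 0.9, 0.1, 0.5, 0.7]
hardest = pick_negatives(sims, anchor_index=0, quantile=1.0)
easiest = pick_negatives(sims, anchor_index=0, quantile=0.0)
```

A fixed quantile is exactly the kind of heuristic the abstract argues against. The sketch only shows the knob that an adaptive, per-anchor scheduler would be turning.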

Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training

Nov 15, 2024

Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee

Figures 1-4 for Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training

Abstract: As highly expressive generative models, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance in stochastic gradients varies significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process, but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.
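The general idea of non-uniform, adaptive timestep sampling can be sketched as follows. This is a simplified stand-in and not the paper's algorithm: it assumes each timestep keeps an exponential moving average of how much recent updates at that timestep reduced the objective, and it samples timesteps from a softmax over those averages. The class name, `decay`, and `temperature` are all hypothetical.

```python
import math
import random

class AdaptiveTimestepSampler:
    """Samples timesteps in proportion to a running usefulness score."""

    def __init__(self, num_timesteps, temperature=1.0, decay=0.9):
        self.scores = [0.0] * num_timesteps  # one running score per timestep
        self.temperature = temperature
        self.decay = decay

    def probabilities(self):
        # Softmax over scores, shifted by the maximum for numerical stability.
        top = max(self.scores)
        exps = [math.exp((s - top) / self.temperature) for s in self.scores]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng=random):
        # Draw one timestep index according to the current probabilities.
        return rng.choices(range(len(self.scores)), weights=self.probabilities(), k=1)[0]

    def update(self, t, objective_decrease):
        # Blend the newly observed objective decrease into timestep t's score.
        self.scores[t] = self.decay * self.scores[t] + (1 - self.decay) * objective_decrease
```

With all scores at zero the sampler is uniform, which matches the usual training baseline. Calling `update(t, ...)` with a large observed decrease shifts probability mass toward timestep `t` on later draws.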
