Fatemeh Pesaran Zadeh

LPOI: Listwise Preference Optimization for Vision Language Models

May 27, 2025

Fatemeh Pesaran Zadeh, Yoojin Oh, Gunhee Kim

Abstract: Aligning large VLMs with human preferences is a challenging task, as methods like RLHF and DPO often overfit to textual information or exacerbate hallucinations. Although augmenting negative image samples partially addresses these pitfalls, no prior work has employed listwise preference optimization for VLMs, due to the complexity and cost of constructing listwise image samples. In this work, we propose LPOI, the first object-aware listwise preference optimization developed for reducing hallucinations in VLMs. LPOI identifies and masks a critical object in the image, and then interpolates the masked region between the positive and negative images to form a sequence of incrementally more complete images. The model is trained to rank these images in ascending order of object visibility, effectively reducing hallucinations while retaining visual fidelity. LPOI requires no extra annotations beyond standard pairwise preference data, as it automatically constructs the ranked lists through object masking and interpolation. Comprehensive experiments on MMHalBench, AMBER, and Object HalBench confirm that LPOI outperforms existing preference optimization methods in reducing hallucinations and enhancing VLM performance. We make the code available at https://github.com/fatemehpesaran310/lpoi.

* ACL 2025 Main. Code is released at https://github.com/fatemehpesaran310/lpoi
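
The list construction described in the abstract (mask a critical object, then interpolate the masked region between the negative and positive images) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interpolate_masked_region`, the linear blending schedule, and the assumption that the object mask is already given are all hypothetical choices made for illustration. The released repository is the authoritative reference.

```python
import numpy as np


def interpolate_masked_region(positive, negative, mask, num_steps):
    """Build a list of images ordered by increasing object visibility.

    Hypothetical helper, not taken from the LPOI code base.
    positive:  float array (H, W, C), image with the object visible.
    negative:  float array (H, W, C), image with the object hidden.
    mask:      bool array (H, W), True where the critical object lies.
    num_steps: number of images in the list (at least 2).
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    # Broadcast the 2-D mask over the channel axis.
    mask3 = mask[..., None].astype(positive.dtype)

    images = []
    # Assumption: a linear schedule from fully hidden (0) to fully visible (1).
    for alpha in np.linspace(0.0, 1.0, num_steps):
        blended = (1.0 - alpha) * negative + alpha * positive
        # Only the masked region is interpolated; the rest stays positive.
        images.append(mask3 * blended + (1.0 - mask3) * positive)
    return images


# Toy usage: a 4x4 "image" whose central 2x2 patch plays the object.
positive = np.ones((4, 4, 3))
negative = np.zeros((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
ranked_list = interpolate_masked_region(positive, negative, mask, num_steps=4)
```

The resulting list runs from the image with the object fully hidden to the one with it fully visible, which is the ascending visibility order the abstract says the model is trained to rank.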

Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback

Oct 05, 2024

Fatemeh Pesaran Zadeh, Juyeon Kim, Jin-Hwa Kim, Gunhee Kim

Abstract: Large language models (LLMs) have demonstrated strong capabilities across various language tasks, notably through instruction-tuning methods. However, LLMs face challenges in visualizing complex, real-world data through charts and plots. First, existing datasets rarely cover a full range of chart types, such as 3D, volumetric, and gridded charts. Second, supervised fine-tuning methods do not fully leverage the intricate relationships within rich datasets, including text, code, and figures. To address these challenges, we propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types from the Matplotlib library, with 11.1K tuples of descriptions, code, data tables, and plots. Moreover, we introduce a reinforcement learning-based instruction tuning technique for chart generation tasks that does not require human feedback. Our experiments show that this approach significantly enhances model performance, enabling smaller models to outperform larger open-source models and to be comparable to state-of-the-art proprietary models in data visualization tasks. We make the code and dataset available at https://github.com/fatemehpesaran310/Text2Chart31.

* EMNLP 2024 Main. Code and dataset are released at https://github.com/fatemehpesaran310/Text2Chart31
