Renjie Pi

May

Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning

Oct 02, 2025
Via arXiv

Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models

Sep 19, 2025
Via arXiv

Generalizable Geometric Image Caption Synthesis

Sep 18, 2025
Via arXiv

ExeSQL: Self-Taught Text-to-SQL Models with Execution-Driven Bootstrapping for SQL Dialects

May 22, 2025
Via arXiv

MR. Judge: Multimodal Reasoner as a Judge

May 19, 2025
Via arXiv

MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

Mar 05, 2025
Via arXiv

VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues

Feb 17, 2025
Via arXiv

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Feb 05, 2025
Via arXiv

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Dec 18, 2024
Via arXiv

SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation

Dec 13, 2024
Via arXiv