Jinman Zhao

University of Toronto

Reinforcing Consistency in Video MLLMs with Structured Rewards

Apr 01, 2026
Via arXiv

CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

Feb 23, 2026
Via arXiv

Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models

Feb 01, 2026
Via arXiv

Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding

Jan 30, 2026
Via arXiv

$λ$-GRPO: Unifying the GRPO Frameworks with Learnable Token Preferences

Oct 08, 2025
Via arXiv

Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding

Sep 08, 2025
Via arXiv

Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks

Jul 23, 2025
Via arXiv

Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment

Jun 24, 2025
Via arXiv

PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy

May 28, 2025
Via arXiv

UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models

May 26, 2025
Via arXiv