Quanjun Yin

Understanding Model Merging: A Unified Generalization Framework for Heterogeneous Experts

Jan 29, 2026

CityCube: Benchmarking Cross-view Spatial Reasoning on Vision-Language Models in Urban Environments

Jan 20, 2026

Unveiling the Power of Multiple Gossip Steps: A Stability-Based Generalization Analysis in Decentralized Training

Oct 09, 2025

Learn to Relax with Large Language Models: Solving Nonlinear Combinatorial Optimization Problems via Bidirectional Coevolution

Sep 16, 2025

Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events

May 22, 2025

Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

May 14, 2025

GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation

Apr 13, 2025

SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding

Feb 24, 2025

Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model

Nov 16, 2024

OledFL: Unleashing the Potential of Decentralized Federated Learning via Opposite Lookahead Enhancement

Oct 09, 2024