
Jinqiao Wang

Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences; ObjectEye Inc.

ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning

Dec 21, 2025

UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations

Dec 12, 2025

PixCLIP: Achieving Fine-grained Visual Language Understanding via Any-granularity Pixel-Text Alignment Learning

Nov 06, 2025

From Seeing to Predicting: A Vision-Language Framework for Trajectory Forecasting and Controlled Video Generation

Oct 01, 2025

AnomalyMoE: Towards a Language-free Generalist Model for Unified Visual Anomaly Detection

Aug 08, 2025

UniFGVC: Universal Training-Free Few-Shot Fine-Grained Vision Classification via Attribute-Aware Multimodal Retrieval

Aug 06, 2025

Scaling Linear Attention with Sparse State Expansion

Jul 22, 2025

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing

Jul 02, 2025

VFaith: Do Large Multimodal Models Really Reason on Seen Images Rather than Previous Memories?

Jun 13, 2025

Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models

May 27, 2025