Qing Jiang

ScaleHP: Estimating Hand Pose in Metric Space

Jun 24, 2026
Via arXiv

SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding

May 14, 2026
Via arXiv

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models

May 13, 2026
Via arXiv

V-Reflection: Transforming MLLMs from Passive Observers to Active Interrogators

Mar 31, 2026
Via arXiv

SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

Feb 26, 2026
Via arXiv

T-Rex-Omni: Integrating Negative Visual Prompt in Generic Object Detection

Nov 12, 2025
Via arXiv

Detect Anything via Next Point Prediction

Oct 14, 2025
Via arXiv

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

Jun 04, 2025
Via arXiv

Referring to Any Person

Mar 11, 2025
Via arXiv

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Dec 02, 2024
Via arXiv