Weitai Kang

From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields

Dec 13, 2025
Via arXiv

VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction

Dec 11, 2025
Via arXiv

Investigating the Design Space of Visual Grounding in Multimodal Large Language Model

Aug 11, 2025
Via arXiv

InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction

May 16, 2025
Via arXiv

3DResT: A Strong Baseline for Semi-Supervised 3D Referring Expression Segmentation

Apr 17, 2025
Via arXiv

Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage

Nov 02, 2024
Via arXiv

Visual Grounding with Attention-Driven Constraint Balancing

Jul 03, 2024
Via arXiv

ACTRESS: Active Retraining for Semi-supervised Visual Grounding

Jul 03, 2024
Via arXiv

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

Jul 03, 2024
Via arXiv

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

May 28, 2024
Via arXiv