Hang Xu

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Jun 02, 2024
Via arXiv

Correctable Landmark Discovery via Large Models for Vision-Language Navigation

May 29, 2024
Via arXiv

LaneCorrect: Self-supervised Lane Detection

Apr 23, 2024
Via arXiv

Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent

Apr 22, 2024
Via arXiv

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Apr 14, 2024
Via arXiv

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution

Mar 25, 2024
Via arXiv

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Mar 22, 2024
Via arXiv

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation

Mar 18, 2024
Via arXiv

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

Mar 18, 2024
Via arXiv

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

Mar 12, 2024
Via arXiv