Glips


LAB-Det: Language as a Domain-Invariant Bridge for Training-Free One-Shot Domain Generalization in Object Detection

Feb 06, 2026

Exact Graph Learning via Integer Programming

Jan 28, 2026

Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance

Jan 12, 2026

Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Tiny Object Detection

Nov 07, 2025

GLip: A Global-Local Integrated Progressive Framework for Robust Visual Speech Recognition

Sep 19, 2025

Generating Vision-Language Navigation Instructions Incorporated Fine-Grained Alignment Annotations

Jun 10, 2025

Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures

May 16, 2025

GLIP-OOD: Zero-Shot Graph OOD Detection with Foundation Model

Apr 29, 2025

Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection

Feb 22, 2025

NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation

Nov 13, 2024