Xiaohan Wang

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

Jun 03, 2023
Via arXiv

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

May 29, 2023
Via arXiv

Action Sensitivity Learning for Temporal Action Localization

May 25, 2023
Via arXiv

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

May 23, 2023
Via arXiv

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

May 22, 2023
Via arXiv

Gloss-Free End-to-End Sign Language Translation

May 22, 2023
Via arXiv

Continual Multimodal Knowledge Graph Construction

May 15, 2023
Via arXiv

How to Unleash the Power of Large Language Models for Few-shot Relation Extraction?

May 03, 2023
Via arXiv

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Mar 26, 2023
Via arXiv

Lana: A Language-Capable Navigator for Instruction Following and Generation

Mar 15, 2023
Via arXiv