Zhiwei Jia

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

May 28, 2023

Chain-of-Thought Predictive Control

Apr 03, 2023

MetaCLUE: Towards Comprehensive Visual Metaphors Research

Dec 19, 2022

Improving Policy Optimization with Generalist-Specialist Learning

Jun 26, 2022

Learning to Act with Affordance-Aware Multimodal Neural SLAM

Feb 04, 2022

TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance

Nov 16, 2021

LUMINOUS: Indoor Scene Generation for Embodied AI Challenges

Nov 10, 2021

IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition

Aug 13, 2021

ManiSkill: Learning-from-Demonstrations Benchmark for Generalizable Manipulation Skills

Aug 09, 2021

Tracking Based Semi-Automatic Annotation for Scene Text Videos

Mar 29, 2021