Lijuan Wang

Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

Mar 19, 2024
Via arXiv

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Jan 30, 2024
Via arXiv

Bring Metric Functions into Diffusion Models

Jan 04, 2024
Via arXiv

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Jan 01, 2024
Via arXiv

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

Dec 21, 2023
Via arXiv

Interfacing Foundation Models' Embeddings

Dec 12, 2023
Via arXiv

Segment and Caption Anything

Dec 01, 2023
Via arXiv

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

Nov 29, 2023
Via arXiv

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023
Via arXiv

MM-VID: Advancing Video Understanding with GPT-4V

Oct 30, 2023
Via arXiv