Hexiang Hu

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

Jun 19, 2024
Via arXiv

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Mar 28, 2024
Via arXiv

Instruct-Imagen: Image Generation with Multi-modal Instruction

Jan 03, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Nov 28, 2023
Via arXiv

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

May 31, 2023
Via arXiv

PaLI-X: On Scaling up a Multilingual Vision and Language Model

May 29, 2023
Via arXiv

Subject-driven Text-to-Image Generation via Apprenticeship Learning

Apr 14, 2023
Via arXiv

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Feb 24, 2023
Via arXiv

Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Feb 24, 2023
Via arXiv