Junnan Li

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Nov 30, 2023
Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

May 31, 2023
Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

May 24, 2023
Dongxu Li, Junnan Li, Steven C. H. Hoi

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

May 20, 2023
Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

May 18, 2023
Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

May 11, 2023
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

Jan 30, 2023
Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

Dec 21, 2022
Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven C. H. Hoi
