Longtian Qiu

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

Jan 04, 2024

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Nov 13, 2023

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Mar 29, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

Sep 28, 2022

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

Dec 04, 2021