Ziyi Lin

TerDiT: Ternary Diffusion Models with Transformers

May 23, 2024
Via arXiv

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024
Via arXiv

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024
Via arXiv

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model

Nov 29, 2023
Via arXiv

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Nov 13, 2023
Via arXiv

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

Jun 15, 2023
Via arXiv

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

Apr 28, 2023
Via arXiv

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Mar 09, 2023
Via arXiv

Frozen CLIP Models are Efficient Video Learners

Aug 06, 2022
Via arXiv

ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning for Action Recognition

Jun 30, 2022
Via arXiv