Ziyi Lin

Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate

Apr 23, 2025
Via arXiv

TerDiT: Ternary Diffusion Models with Transformers

May 23, 2024
Via arXiv

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

May 09, 2024
Via arXiv

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Feb 08, 2024
Via arXiv

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model

Nov 29, 2023
Via arXiv

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

Nov 13, 2023
Via arXiv

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

Jun 15, 2023
Via arXiv

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

Apr 28, 2023
Via arXiv

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Mar 09, 2023
Via arXiv

Frozen CLIP Models are Efficient Video Learners

Aug 06, 2022
Via arXiv