Cong Wei

MANTIS: Interleaved Multi-Image Instruction Tuning

May 02, 2024

LaSagnA: Language-based Segmentation Assistant for Complex Queries

Apr 12, 2024

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

Mar 22, 2024

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

Feb 06, 2024

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

Dec 22, 2023

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

Nov 28, 2023

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Nov 27, 2023

DreamEdit: Subject-driven Image Editing

Jun 22, 2023

Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

Mar 24, 2023