Zineng Tang

CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Nov 30, 2023
Zineng Tang, Ziyi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal

Paxion: Patching Action Knowledge in Video-Language Foundation Models

May 26, 2023
Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji

Any-to-Any Generation via Composable Diffusion

May 19, 2023
Zineng Tang, Ziyi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal

Unifying Vision, Text, and Layout for Universal Document Processing

Dec 20, 2022
Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

Nov 21, 2022
Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal

TVLT: Textless Vision-Language Transformer

Sep 28, 2022
Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Jul 06, 2021
Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

May 13, 2020
Hyounghun Kim, Zineng Tang, Mohit Bansal
