Zhuowen Tu

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

Sep 20, 2023
Haodong Duan, Mingze Xu, Bing Shuai, Davide Modolo, Zhuowen Tu, Joseph Tighe, Alessandro Bergamo

Object-Centric Multiple Object Tracking

Sep 05, 2023
Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

Aug 29, 2023
Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Aug 19, 2023
Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu

Patched Denoising Diffusion Models For High-Resolution Image Synthesis

Aug 02, 2023
Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu

Distilling Large Vision-Language Model with Out-of-Distribution Generalizability

Jul 19, 2023
Xuanlin Li, Yunhao Fang, Minghua Liu, Zhan Ling, Zhuowen Tu, Hao Su

DocTr: Document Transformer for Structured Information Extraction in Documents

Jul 16, 2023
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan

Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts

May 11, 2023
Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto
