Zhan Tong

Contextual AD Narration with Interleaved Multimodal Sequence

Mar 19, 2024
Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

Dec 26, 2023
Qinying Liu, Kecheng Zheng, Wei Wu, Zhan Tong, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

Bootstrapping SparseFormers from Vision Foundation Models

Dec 04, 2023
Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou

Advancing Vision Transformers with Group-Mix Attention

Nov 26, 2023
Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

Sep 25, 2023
Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

May 23, 2023
Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

Apr 18, 2023
Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao

Efficient Video Action Detection with Token Dropout and Context Refinement

Apr 17, 2023
Lei Chen, Zhan Tong, Yibing Song, Gangshan Wu, Limin Wang

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

Apr 07, 2023
Ziteng Gao, Zhan Tong, Limin Wang, Mike Zheng Shou
