Linchao Zhu

Efficient Multimodal Fusion via Interactive Prompting

Apr 13, 2023

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

Mar 06, 2023

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

Jan 22, 2023

Temporal Perceiving Video-Language Pre-training

Jan 18, 2023

Discriminative Radial Domain Adaptation

Jan 01, 2023

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

Dec 19, 2022

Slimmable Networks for Contrastive Self-supervised Learning

Sep 30, 2022

AFE-CNN: 3D Skeleton-based Action Recognition with Action Feature Enhancement

Aug 06, 2022

Fine-Grained Semantically Aligned Vision-Language Pre-Training

Aug 04, 2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

Aug 03, 2022