Hengduo Li

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Apr 08, 2024

Object Recognition as Next Token Prediction

Dec 04, 2023

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

Nov 24, 2023

BMB: Balanced Memory Bank for Imbalanced Semi-supervised Learning

May 22, 2023

Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors

Sep 30, 2022

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

Nov 30, 2021

Efficient Video Transformers with Spatial-Temporal Token Selection

Nov 23, 2021

Rethinking Pseudo Labels for Semi-Supervised Object Detection

Jun 01, 2021

HMS: Hierarchical Modality Selection for Efficient Video Recognition

Apr 21, 2021

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

Dec 29, 2020