Sijie Zhu

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

Jun 15, 2024

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

May 09, 2024

Edit3K: Universal Representation Learning for Video Editing Components

Mar 24, 2024

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

Apr 06, 2023

TopNet: Transformer-based Object Placement Network for Image Compositing

Apr 06, 2023

GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing

Mar 31, 2022

TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization

Mar 31, 2022

BDANet: Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

May 16, 2021

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

May 14, 2021

3D Human Pose Estimation with Spatial and Temporal Transformers

Mar 24, 2021