Alexander Hauptmann

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Jul 18, 2024

Multimodal Reranking for Knowledge-Intensive Visual Question Answering

Jul 17, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Jun 17, 2024

Learning Visual-Semantic Subspace Representations for Propositional Reasoning

May 25, 2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Apr 02, 2024

Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin

Sep 18, 2023

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

Mar 31, 2023

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement

Sep 01, 2022

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models

Apr 15, 2021

Spatial-Temporal Alignment Network for Action Recognition and Detection

Dec 04, 2020