Yi Bin

MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

Aug 08, 2024

GalleryGPT: Analyzing Paintings with Large Multimodal Models

Aug 01, 2024

Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

Aug 01, 2024

Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection

Jul 17, 2024

Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

Jul 04, 2024

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Jun 26, 2024

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Jun 09, 2024

Non-Autoregressive Sentence Ordering

Oct 19, 2023

Solving Math Word Problems with Reexamination

Oct 14, 2023

Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

Aug 08, 2023