Yong Jae Lee

LASER: Lip Landmark Assisted Speaker Detection for Robustness

Jan 21, 2025

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Jan 08, 2025

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Oct 15, 2024

On the Effectiveness of Dataset Alignment for Fake Image Detection

Oct 15, 2024

Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

Oct 03, 2024

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Oct 01, 2024

Cross-Modal Self-Supervised Learning with Effective Contrastive Units for LiDAR Point Clouds

Sep 10, 2024

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

Jul 15, 2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Jun 28, 2024

Yo'LLaVA: Your Personalized Language and Vision Assistant

Jun 13, 2024