Caren Han

Location-Aware Pretraining for Medical Difference Visual Question Answering

Mar 05, 2026

MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning

Mar 01, 2026

FiLoRA: Focus-and-Ignore LoRA for Controllable Feature Reliance

Feb 02, 2026

'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue

Oct 31, 2024

ChuLo: Chunk-Level Key Information Representation for Long Document Processing

Oct 14, 2024

GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning

Oct 12, 2024

Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer

May 24, 2024

Game-MUG: Multimodal Oriented Game Situation Understanding and Commentary Generation Dataset

Apr 30, 2024

PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure

Apr 21, 2024

M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding

Feb 28, 2024