Yen-Chun Chen

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Jul 19, 2024
Via arXiv

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

Jun 27, 2024
Via arXiv

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

May 26, 2024
Via arXiv

iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views

Dec 28, 2023
Via arXiv

LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

Oct 18, 2023
Via arXiv

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

May 31, 2023
Via arXiv

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

May 27, 2023
Via arXiv

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Aug 29, 2022
Via arXiv

GLIPv2: Unifying Localization and Vision-Language Understanding

Jun 12, 2022
Via arXiv

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Apr 28, 2022
Via arXiv