Yaowei Wang

Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

Jan 07, 2025
Via arXiv

Towards Visual Grounding: A Survey

Dec 28, 2024
Via arXiv

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition

Dec 28, 2024
Via arXiv

Core Context Aware Attention for Long Context Language Modeling

Dec 17, 2024
Via arXiv

Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization

Dec 13, 2024
Via arXiv

Towards Long Video Understanding via Fine-detailed Video Story Generation

Dec 09, 2024
Via arXiv

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC

Dec 07, 2024
Via arXiv

CATCH: Complementary Adaptive Token-level Contrastive Decoding to Mitigate Hallucinations in LVLMs

Nov 19, 2024
Via arXiv

OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

Oct 10, 2024
Via arXiv

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

Oct 08, 2024
Via arXiv