Weidi Xie

ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval

Feb 21, 2025

WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs

Feb 06, 2025

Track-On: Transformer-based Online Point Tracking with Memory

Jan 30, 2025

A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis

Dec 17, 2024

Can Modern LLMs Act as Agent Cores in Radiology Environments?

Dec 12, 2024

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

Dec 04, 2024

Unlocking Video-LLM via Agent-of-Thoughts Distillation

Dec 02, 2024

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

Dec 02, 2024

LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models

Sep 29, 2024

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

Aug 26, 2024