Xinyu Wei

Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning

Aug 25, 2025
Via arXiv

Retrieval Feedback Memory Enhancement Large Model Retrieval Generation Method

Aug 25, 2025
Via arXiv

CEIDM: A Controlled Entity and Interaction Diffusion Model for Enhanced Text-to-Image Generation

Aug 25, 2025
Via arXiv

Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning

Aug 11, 2025
Via arXiv

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Jun 05, 2025
Via arXiv

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

May 22, 2025
Via arXiv

Are Large Language Models Good In-context Learners for Financial Sentiment Analysis?

Mar 06, 2025
Via arXiv

MAVIS: Mathematical Visual Instruction Tuning

Jul 11, 2024
Via arXiv

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

Jun 22, 2024
Via arXiv

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Apr 01, 2024
Via arXiv