Xinyu Wei

Retrieval Feedback Memory Enhancement Large Model Retrieval Generation Method

Aug 25, 2025
Via arXiv

Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning

Aug 25, 2025
Via arXiv

CEIDM: A Controlled Entity and Interaction Diffusion Model for Enhanced Text-to-Image Generation

Aug 25, 2025
Via arXiv

Separation and Collaboration: Two-Level Routing Grouped Mixture-of-Experts for Multi-Domain Continual Learning

Aug 11, 2025
Via arXiv

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Jun 05, 2025
Via arXiv

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

May 22, 2025
Via arXiv

Are Large Language Models Good In-context Learners for Financial Sentiment Analysis?

Mar 06, 2025
Via arXiv

MAVIS: Mathematical Visual Instruction Tuning

Jul 11, 2024
Via arXiv

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

Jun 22, 2024
Via arXiv

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Apr 01, 2024
Via arXiv