Lewei Yao

AEQ-Bench: Measuring Empathy of Omni-Modal Large Models

Jan 15, 2026
Via arXiv

InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search

Dec 21, 2025
Via arXiv

The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Jul 10, 2025
Via arXiv

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Apr 14, 2024
Via arXiv

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Mar 07, 2024
Via arXiv

PerceptionGPT: Effectively Fusing Visual Perception into LLM

Nov 11, 2023
Via arXiv

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Oct 16, 2023
Via arXiv

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

Jul 04, 2023
Via arXiv

DetGPT: Detect What You Need via Reasoning

May 24, 2023
Via arXiv

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning

May 04, 2023
Via arXiv