Hongsheng Li

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Feb 10, 2025
Via arXiv

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Jan 23, 2025
Via arXiv

IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models

Jan 23, 2025
Via arXiv

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

Jan 05, 2025
Via arXiv

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Jan 03, 2025
Via arXiv

A3: Android Agent Arena for Mobile GUI Agents

Jan 02, 2025
Via arXiv

GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance

Dec 23, 2024
Via arXiv

VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping

Dec 15, 2024
Via arXiv

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Dec 12, 2024
Via arXiv

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Dec 12, 2024
Via arXiv