Jianhua Han

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

Jul 11, 2024
Via arXiv

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Jul 09, 2024
Via arXiv

DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection

Apr 14, 2024
Via arXiv

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model

Mar 18, 2024
Via arXiv

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

Mar 12, 2024
Via arXiv

From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs

Feb 28, 2024
Via arXiv

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

Feb 08, 2024
Via arXiv

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

Dec 29, 2023
Via arXiv

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

Dec 18, 2023
Via arXiv

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

Dec 06, 2023
Via arXiv