Yanxin Long

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

May 14, 2024

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

Mar 13, 2024

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

Mar 09, 2024

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

Mar 15, 2023

NLIP: Noise-robust Language-Image Pre-training

Jan 04, 2023

P$^3$OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection

Nov 02, 2022

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation

Jul 23, 2021