Alexander G. Hauptmann

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

Jun 28, 2024
Via arXiv

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Jun 27, 2024
Via arXiv

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

Apr 29, 2024
Via arXiv

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Oct 09, 2023
Via arXiv

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Jul 03, 2023
Via arXiv

Document Entity Retrieval with Massive and Noisy Pre-training

Jun 15, 2023
Via arXiv

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

Apr 05, 2023
Via arXiv

MAGVIT: Masked Generative Video Transformer

Dec 10, 2022
Via arXiv

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Jun 10, 2022
Via arXiv

Argus++: Robust Real-time Activity Detection for Unconstrained Video Streams with Overlapping Cube Proposals

Jan 14, 2022
Via arXiv