Hongbin Zhou

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Jun 14, 2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Feb 19, 2024

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

Feb 06, 2024

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

Oct 12, 2023

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

Oct 08, 2023

PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

Sep 17, 2023

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

Jul 29, 2023

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

Jun 09, 2023

MASNet:Improve Performance of Siamese Networks with Mutual-attention for Remote Sensing Change Detection Tasks

Jun 06, 2022