Jialong Zuo

Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets

Dec 19, 2025
Via arXiv

WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World

Dec 11, 2025
Via arXiv

Learning to Tell Apart: Weakly Supervised Video Anomaly Detection via Disentangled Semantic Alignment

Nov 13, 2025
Via arXiv

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Oct 14, 2025
Via arXiv

ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model

Jun 11, 2025
Via arXiv

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

May 30, 2025
Via arXiv

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

May 14, 2025
Via arXiv

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

Feb 26, 2025
Via arXiv

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model

Feb 08, 2025
Via arXiv

Speech Watermarking with Discrete Intermediate Representations

Dec 18, 2024
Via arXiv