Rui Zhao

Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

Semantic Gaussian Mixture Variational Autoencoder for Sequential Recommendation

Feb 22, 2025
Via arXiv

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

Feb 21, 2025
Via arXiv

Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning

Feb 19, 2025
Via arXiv

Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

Jan 01, 2025
Via arXiv

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024
Via arXiv

"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing

Dec 16, 2024
Via arXiv

RemDet: Rethinking Efficient Model Design for UAV Object Detection

Dec 13, 2024
Via arXiv

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Dec 09, 2024
Via arXiv

Representation Purification for End-to-End Speech Translation

Dec 05, 2024
Via arXiv

AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM

Dec 02, 2024
Via arXiv