Rui Zhao

Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Mar 05, 2025

Semantic Gaussian Mixture Variational Autoencoder for Sequential Recommendation

Feb 22, 2025

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

Feb 21, 2025

Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning

Feb 19, 2025

Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

Jan 01, 2025

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024

"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing

Dec 16, 2024

RemDet: Rethinking Efficient Model Design for UAV Object Detection

Dec 13, 2024

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Dec 09, 2024

Representation Purification for End-to-End Speech Translation

Dec 05, 2024