
Yu Qiao

Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

Jan 17, 2025

RepVideo: Rethinking Cross-Layer Representation for Video Generation

Jan 15, 2025

Mitigating Domain Shift in Federated Learning via Intra- and Inter-Domain Prototypes

Jan 15, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

Jan 14, 2025

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Jan 14, 2025

H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving

Jan 08, 2025

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

Jan 07, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Dec 31, 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

Dec 30, 2024

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Dec 27, 2024