Xu Zhou

Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing, China

Step-GUI Technical Report

Dec 19, 2025
Via arXiv

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Dec 19, 2025
Via arXiv

Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration

Nov 08, 2025
Via arXiv

AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation

Aug 21, 2025
Via arXiv

DeepGo: Predictive Directed Greybox Fuzzing

Jul 29, 2025
Via arXiv

Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Jun 06, 2025
Via arXiv

InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition

May 21, 2025
Via arXiv

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Nov 22, 2024
Via arXiv

DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering

Oct 17, 2024
Via arXiv

Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look

Oct 16, 2024
Via arXiv