Jinxing Zhou

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

Dec 18, 2025

User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation

Dec 11, 2025

A Closer Look at Knowledge Distillation in Spiking Neural Network Training

Nov 14, 2025

CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization

Aug 06, 2025

Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective

Jul 02, 2025

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights

Dec 21, 2024

Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity Collaboration

Dec 17, 2024

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing

Dec 17, 2024

Patch-level Sounding Object Tracking for Audio-Visual Question Answering

Dec 14, 2024

Towards Open-Vocabulary Audio-Visual Event Localization

Nov 18, 2024