Xiangru Jian

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

May 27, 2025

LazyVLM: Neuro-Symbolic Approach to Video Analytics

May 27, 2025

GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks

Apr 17, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Mar 19, 2025

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Feb 03, 2025

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Dec 05, 2024

Enhancing Graph Self-Supervised Learning with Graph Interplay

Oct 08, 2024

Do spectral cues matter in contrast-based graph self-supervised learning?

May 30, 2024

HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models

Apr 07, 2024

InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution

Oct 25, 2023