Sai Rajeswar

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

Sep 11, 2025
Via arXiv

LALM-Eval: An Open-Source Toolkit for Holistic Evaluation of Large Audio Language Models

Sep 09, 2025
Via arXiv

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

May 27, 2025
Via arXiv

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA

May 22, 2025
Via arXiv

StarFlow: Generating Structured Workflow Outputs From Sketch Images

Mar 27, 2025
Via arXiv

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Mar 19, 2025
Via arXiv

PairBench: A Systematic Framework for Selecting Reliable Judge VLMs

Feb 21, 2025
Via arXiv

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Feb 03, 2025
Via arXiv

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Dec 05, 2024
Via arXiv

Representing Positional Information in Generative World Models for Object Manipulation

Sep 19, 2024
Via arXiv