Picture for Perouz Taslakian

Perouz Taslakian

Grounding Computer Use Agents on Human Demonstrations

Add code
Nov 10, 2025
Figure 1 for Grounding Computer Use Agents on Human Demonstrations
Figure 2 for Grounding Computer Use Agents on Human Demonstrations
Figure 3 for Grounding Computer Use Agents on Human Demonstrations
Figure 4 for Grounding Computer Use Agents on Human Demonstrations
Viaarxiv icon

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Add code
May 27, 2025
Figure 1 for Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Figure 2 for Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Figure 3 for Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Figure 4 for Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Viaarxiv icon

StarFlow: Generating Structured Workflow Outputs From Sketch Images

Add code
Mar 27, 2025
Figure 1 for StarFlow: Generating Structured Workflow Outputs From Sketch Images
Figure 2 for StarFlow: Generating Structured Workflow Outputs From Sketch Images
Figure 3 for StarFlow: Generating Structured Workflow Outputs From Sketch Images
Figure 4 for StarFlow: Generating Structured Workflow Outputs From Sketch Images
Viaarxiv icon

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Add code
Mar 19, 2025
Viaarxiv icon

Learning to Defer for Causal Discovery with Imperfect Experts

Add code
Feb 18, 2025
Figure 1 for Learning to Defer for Causal Discovery with Imperfect Experts
Figure 2 for Learning to Defer for Causal Discovery with Imperfect Experts
Figure 3 for Learning to Defer for Causal Discovery with Imperfect Experts
Figure 4 for Learning to Defer for Causal Discovery with Imperfect Experts
Viaarxiv icon

ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval

Add code
Feb 11, 2025
Viaarxiv icon

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Add code
Feb 03, 2025
Figure 1 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Figure 2 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Figure 3 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Figure 4 for AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Viaarxiv icon

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Add code
Dec 05, 2024
Figure 1 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Figure 2 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Figure 3 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Figure 4 for BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Viaarxiv icon

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Add code
Jul 08, 2024
Figure 1 for InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Figure 2 for InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Figure 3 for InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Figure 4 for InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Viaarxiv icon

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Add code
Jun 17, 2024
Viaarxiv icon