Kartik Mathur

AI Benchmark Democratization and Carpentry

Dec 12, 2025
Via arXiv

GUI-360$^\circ$: A Comprehensive Dataset and Benchmark for Computer-Using Agents

Nov 10, 2025
Via arXiv

GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

Nov 06, 2025
Via arXiv

RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Apr 22, 2024
Via arXiv