Khalid Saifullah

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Jun 27, 2024
Via arXiv

CinePile: A Long Video Question Answering Dataset and Benchmark

May 14, 2024
Via arXiv

Coercing LLMs to do and reveal (almost) anything

Feb 21, 2024
Via arXiv

On the Reliability of Watermarks for Large Language Models

Jun 30, 2023
Via arXiv

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Jun 29, 2023
Via arXiv

Seeing in Words: Learning to Classify through Language Bottlenecks

Jun 29, 2023
Via arXiv

Reinforcement Learning finetuned Vision-Code Transformer for UI-to-Code Generation

May 24, 2023
Via arXiv