Daniel Khashabi

Shammie

Core: Robust Factual Precision Scoring with Informative Sub-Claim Identification

Jul 04, 2024

LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression

Jun 28, 2024

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

Jun 20, 2024

DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation

May 22, 2024

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Apr 05, 2024

SELF-CORRECT: LLMs Struggle with Refining Self-Generated Responses

Apr 04, 2024

Tur[k]ingBench: A Challenge Benchmark for Web Agents

Mar 21, 2024

Dated Data: Tracing Knowledge Cutoffs in Large Language Models

Mar 19, 2024

RORA: Robust Free-Text Rationale Evaluation

Mar 01, 2024

AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies

Feb 19, 2024