Chantal Shaib

Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence

Jan 17, 2026
Via arXiv

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Measuring diversity of synthetic prompts and data generated with fine-grained persona prompting

May 23, 2025
Via arXiv

Who Taught You That? Tracing Teachers in Model Distillation

Feb 10, 2025
Via arXiv

Detection and Measurement of Syntactic Templates in Generated Text

Jun 28, 2024
Via arXiv

Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

Mar 01, 2024
Via arXiv

How Much Annotation is Needed to Compare Summarization Models?

Feb 28, 2024
Via arXiv

Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

Jul 09, 2023
Via arXiv

Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)

May 11, 2023
Via arXiv