Kaheer Suleman

Skyfall AI

World of Workflows: a Benchmark for Bringing World Models to Enterprise Systems

Jan 29, 2026

CASSANDRA: Programmatic and Probabilistic Learning and Inference for Stochastic World Modeling

Jan 26, 2026

SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments

Dec 10, 2025

Investigating Failures to Generalize for Coreference Resolution Models

Mar 16, 2023

The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems

Dec 15, 2022

Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications

May 13, 2022

TopiOCQA: Open-domain Conversational Question Answering with Topic Switching

Oct 02, 2021

Modeling Event Plausibility with Consistent Conceptual Abstraction

Apr 20, 2021

An Analysis of Dataset Overlap on Winograd-Style Tasks

Nov 09, 2020

Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text

Nov 13, 2019