Anthony G. Cohn

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

May 18, 2026
Via arXiv

Can Large Language Models Generalize Procedures Across Representations?

Feb 03, 2026
Via arXiv

BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks

May 20, 2025
Via arXiv

Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores

Oct 04, 2024
Via arXiv

Exploring Spatial Representations in the Historical Lake District Texts with LLM-based Relation Extraction

Jun 20, 2024
Via arXiv

Dishonesty in Helpful and Harmless Alignment

Jun 04, 2024
Via arXiv

Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning

May 23, 2024
Via arXiv

Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark

Jan 08, 2024
Via arXiv

The ARRT of Language-Models-as-a-Service: Overview of a New Paradigm and its Challenges

Sep 28, 2023
Via arXiv

Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings

Mar 30, 2023
Via arXiv