Raquel Fernández

Institute for Logic, Language and Computation, University of Amsterdam

Playpen: An Environment for Exploring Learning Through Conversational Interaction
Apr 11, 2025

Experiential Semantic Information and Brain Alignment: Are Multimodal Models Better than Language Models?
Apr 01, 2025

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
Apr 01, 2025

Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests
Feb 20, 2025

Natural Language Generation from Visual Sequences: Challenges and Future Directions
Feb 18, 2025

RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
Dec 18, 2024

Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Dec 18, 2024

Modelling Multimodal Integration in Human Concept Processing with Vision-and-Language Models
Jul 25, 2024

Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Jul 05, 2024

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Jun 26, 2024