Börje F. Karlsson

HandwritingAgent: Language-Driven Handwriting Synthesis in Scalable Vector Space

Jun 17, 2026
Via arXiv

LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

Jun 15, 2026
Via arXiv

X-DiffVLA: X-Embodied Diffusion Action Heads for Vision-Language-Action Models

May 24, 2026
Via arXiv

EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models

Feb 04, 2026
Via arXiv

RANGER: A Monocular Zero-Shot Semantic Navigation Framework through Contextual Adaptation

Dec 30, 2025
Via arXiv

Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots

Oct 09, 2025
Via arXiv

DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

Aug 07, 2025
Via arXiv

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Mar 16, 2025
Via arXiv

Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning

Mar 10, 2025
Via arXiv

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Mar 10, 2025
Via arXiv