Abstract: We perform in-depth evaluations of in-context learning (ICL) on state-of-the-art transformer, state-space, and hybrid large language models across two categories of knowledge-based ICL tasks. Using a combination of behavioral probing and intervention-based methods, we find that, while LLMs of different architectures can behave similarly in task performance, their internal mechanisms can differ. We discover that the function vectors (FVs) responsible for ICL are located primarily in the self-attention and Mamba layers, and we speculate that Mamba2 performs ICL through a mechanism distinct from FVs. FVs are more important for ICL involving parametric knowledge retrieval than for contextual knowledge understanding. Our work contributes to a more nuanced understanding of ICL across architectures and task types. Methodologically, our approach also highlights the value of combining behavioral and mechanistic analyses when investigating LLM capabilities.
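To make the intervention-based methodology concrete, below is a minimal PyTorch sketch of how a function vector might be extracted and patched into a model's residual stream. This is an illustrative assumption, not the paper's exact procedure: the helper names, the Hugging-Face-style model interface (`output_hidden_states`, tuple-returning decoder layers), and the use of a last-token residual-stream mean in place of averaged attention-head outputs are all simplifications.

```python
import torch

@torch.no_grad()
def extract_fv(model, tokenizer, icl_prompts, layer_idx):
    # Average the hidden state at layer `layer_idx`, final token, over
    # several few-shot prompts for one task. (Simplified: published FV
    # methods average the outputs of selected attention heads instead.)
    states = []
    for text in icl_prompts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        hidden = model(ids, output_hidden_states=True).hidden_states
        states.append(hidden[layer_idx][0, -1])
    return torch.stack(states).mean(dim=0)

def patch_fv(layer, fv, scale=1.0):
    # Forward hook that adds the function vector to the layer's output
    # during a zero-shot run; if behavior shifts toward the ICL task,
    # the vector carries the task information.
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * fv
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)
```

The handle returned by `patch_fv` can be removed with `handle.remove()` after the zero-shot generation, restoring the unmodified model for comparison.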
Abstract: Current speech-based LLMs are predominantly trained on extensive ASR and TTS datasets, and they excel at tasks in these domains. However, their ability to handle direct speech-to-speech conversation remains notably constrained. Such models often rely on an ASR-to-TTS chain-of-thought pipeline, converting speech to text for processing before generating an audio response, which introduces latency and discards audio features. We propose a method that implicitly internalizes the ASR chain of thought into a speech LLM, enhancing its native speech understanding capabilities. Our approach reduces latency and improves the model's native understanding of speech, paving the way for more efficient and natural real-time audio interaction. We also release a large-scale synthetic conversational dataset to facilitate further research.
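One plausible reading of "implicitly internalizes the ASR chain of thought" is a training curriculum that starts from explicit transcript-then-response targets and gradually deletes the transcript, in the spirit of implicit-CoT internalization methods. The sketch below shows such a schedule; the function, the linear removal rate, and front-truncation are illustrative assumptions rather than the paper's recipe.

```python
def make_target(transcript_ids, response_ids, step, total_steps, sep_id):
    """Build the training target for one example at a given training step.

    Early in training the target is the full ASR transcript (the explicit
    chain of thought) followed by the response; as training progresses,
    transcript tokens are dropped from the front until only the response
    remains and the model must understand the speech natively.
    Hypothetical sketch: a linear removal schedule is assumed.
    """
    n_removed = int(len(transcript_ids) * step / total_steps)
    kept = transcript_ids[n_removed:]      # shrinking ASR "thought"
    return kept + [sep_id] + response_ids  # tokens the LLM is trained on
```

At `step == 0` this yields the full transcript plus response; at `step == total_steps` only the response remains, so the finished model maps speech directly to its answer without an intermediate text stage.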