Abstract:Large language model (LLM) agents deployed for multi-step tasks frequently fail in predictable ways: attempting actions with unmet preconditions, issuing redundant commands, or mishandling environment constraints. While retrieval-augmented generation (RAG) can improve performance by providing runtime guidance, it requires maintaining external knowledge databases and adds computational overhead at every deployment. We propose a simple pipeline that converts inference-time retrieval into learned competence through distillation. Our approach: (1) extracts compact, reusable hints from agent failures, (2) uses these hints to generate improved teacher trajectories via one-shot retrieval at episode start, and (3) trains student models on these trajectories with hint strings removed, forcing internalization rather than memorization. Across two interactive benchmarks, ALFWorld (household tasks) and WebShop (online shopping), distilled students consistently outperform baseline agents, achieving up to 91% success on ALFWorld (vs. 79% for baselines) and improving WebShop scores to 72 (vs. 61 for baselines), while using 10-60% fewer tokens than retrieval-augmented teachers depending on the environment. The approach generalizes across model scales (7B/14B parameters) and agent architectures (ReAct/StateAct), demonstrating that retrieval benefits can be effectively internalized through targeted fine-tuning without permanent runtime dependencies.




Abstract:Predicting knee joint angles accurately is critical for biomechanical analysis and rehabilitation. This paper introduces a new deep learning model called FocalGatedNet that incorporates Dynamic Contextual Focus (DCF) Attention and Gated Linear Units (GLU) to enhance feature dependencies and interactions. Our proposed model is evaluated on a large-scale dataset and compared to existing models such as Transformer, Autoformer, Informer, NLinear, DLinear, and LSTM in multi-step gait trajectory prediction. Our results demonstrate that FocalGatedNet outperforms other state-of-the-art models for long-term prediction lengths (60 ms, 80 ms, and 100 ms), achieving an average improvement of 13.66% in MAE and 8.13% in RMSE compared to the second-best performing model (Transformer). Furthermore, our model has a lower computational load than most equivalent deep learning models. These results highlight the effectiveness of our proposed model for knee joint angle prediction and the importance of our modifications for this specific application.