Abstract:Modern Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks by employing search-augmented reasoning to incorporate external knowledge into long chains of thought. However, we identify a critical yet underexplored bottleneck in this paradigm, termed Knowledge Integration Decay (KID). Specifically, we observe that as the length of reasoning generated before search grows, models increasingly fail to integrate retrieved evidence into subsequent reasoning steps, limiting performance even when relevant information is available. To address this, we propose Self-Anchored Knowledge Encoding (SAKE), a training-free inference-time strategy designed to stabilize knowledge utilization. By anchoring retrieved knowledge at both the beginning and end of the reasoning process, SAKE prevents it from being overshadowed by prior context, thereby preserving its semantic integrity. Extensive experiments on multi-hop QA and complex reasoning benchmarks demonstrate that SAKE significantly mitigates KID and improves performance, offering a lightweight yet effective solution for knowledge integration in agentic LLMs.
Abstract:Ising machines have emerged as accelerators for combinatorial optimization. To enable practical deployment, this work aims to reduce time-to-solution by addressing three challenges: (1) hardware topology, (2) spin selection and update algorithms, and (3) scalable coupling-coefficient precision. Restricted topologies require minor embedding; naive parallel updates can oscillate or stall; and limited precision can preclude feasible mappings or degrade solution quality. This work presents Snowball, a digital, scalable, all-to-all coupled Ising machine that integrates dual-mode Markov chain Monte Carlo spin selection with asynchronous spin updates to promote convergence and reduce time-to-solution. The digital architecture supports wide, configurable coupling precision, unlike many analog realizations at high bit widths. A prototype on an AMD Alveo U250 accelerator card achieves an 8$\times$ reduction in time-to-solution relative to a state-of-the-art Ising machine on the same benchmark instance.