Abstract: The evolution of Large Language Models (LLMs) into autonomous agents has expanded the scope of AI coding from localized code generation to complex, repository-level, execution-driven problem solving. However, current benchmarks predominantly evaluate code logic in static contexts, neglecting the dynamic, full-process requirements of real-world engineering, particularly in backend development, which demands rigorous environment configuration and service deployment. To address this gap, we introduce ABC-Bench, a benchmark explicitly designed to evaluate agentic backend coding within a realistic, executable workflow. Using a scalable automated pipeline, we curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories. Unlike previous evaluations, ABC-Bench requires agents to manage the entire development lifecycle, from repository exploration to instantiating containerized services, and to pass external end-to-end API tests. Our extensive evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks, highlighting a substantial gap between current model capabilities and the demands of practical backend engineering. Our code is available at https://github.com/OpenMOSS/ABC-Bench.
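
To make the idea of an external end-to-end API test concrete, the following is a minimal sketch of how a deployed service might be checked from outside its container. It is not the ABC-Bench harness: the case format (`name`, `method`, `path`, `expect_status`), the function names, and the rule that a task passes only if every case passes are all assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, List, Optional

# Opener that ignores proxy environment variables, so requests to a
# locally deployed service are sent directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call_api(base_url: str, method: str, path: str,
             body: Optional[Any] = None, timeout: float = 5.0) -> Optional[int]:
    """Send one HTTP request and return the status code, or None if unreachable."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code worth comparing.
        return err.code
    except (urllib.error.URLError, OSError):
        # Service not started, wrong port, connection refused, and so on.
        return None


def run_cases(base_url: str, cases: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Run each hypothetical test case and record whether the status matched."""
    results: Dict[str, bool] = {}
    for case in cases:
        status = call_api(base_url, case["method"], case["path"], case.get("body"))
        results[case["name"]] = status == case["expect_status"]
    return results


def task_passed(results: Dict[str, bool]) -> bool:
    """Assumed pass rule: at least one case ran and all of them passed."""
    return bool(results) and all(results.values())
```

Because the checks only touch the HTTP interface, a sketch like this cannot pass unless the service was actually built, configured, and started, which is the property the abstract contrasts with static evaluation of code logic.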




Abstract: Objective: Maps of B0 field inhomogeneities are often used to improve MRI image quality, including retrospectively. These field inhomogeneities depend on the exact head position within the static field, but acquiring field maps (FMs) at every position is time consuming. Here we explore different ways to obtain B0 predictions at different head positions. Methods: FMs were predicted from iterative simulations with four field factors: 1) the sample-induced B0 field, 2) the system's spherical harmonic shim field, 3) the perturbing field originating outside the field of view, and 4) sequence phase errors. The simulation was improved by including local susceptibility sources estimated from UTE scans and position-specific masks. The estimation performance of the simulated FMs and of a transformed FM, obtained from the measured reference FM, was compared with the actual FM at different head positions. Results: The transformed FM provided inconsistent results for large head movements (>5 degree rotation), while the simulation strategy had superior prediction accuracy for all positions. The simulated FM was used to optimize B0 shims, with up to 22.2% improvement with respect to the transformed FM approach. Conclusion: The proposed simulation strategy is able to predict movement-induced B0 field inhomogeneities, yielding more precise estimates of the ground truth field homogeneity than the transformed FM.
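
The abstract does not say how the B0 shims were optimized from the predicted FM. As one plausible illustration only, the sketch below fits zeroth- and first-order shim terms (constant, x, y, z) to a field map by ordinary least squares, so that the shim field cancels as much of the predicted inhomogeneity as possible. The unnormalized basis, the function names, and the use of the residual standard deviation as a homogeneity measure are assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def basis(p: Point) -> List[float]:
    """Hypothetical low-order shim basis at point p: constant, x, y, z."""
    x, y, z = p
    return [1.0, x, y, z]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_shim(points: Sequence[Point], field: Sequence[float]) -> List[float]:
    """Coefficients c minimizing sum over voxels of (field + basis . c)^2."""
    rows = [basis(p) for p in points]
    n = len(rows[0])
    # Normal equations: (A^T A) c = -A^T f, where A stacks the basis rows.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atf = [-sum(r[i] * f for r, f in zip(rows, field)) for i in range(n)]
    return solve(ata, atf)


def residual_std(points: Sequence[Point], field: Sequence[float],
                 coeffs: Sequence[float]) -> float:
    """Standard deviation of the field after adding the fitted shim field."""
    res = [f + sum(b * c for b, c in zip(basis(p), coeffs))
           for p, f in zip(points, field)]
    mean = sum(res) / len(res)
    return (sum((r - mean) ** 2 for r in res) / len(res)) ** 0.5
```

Under these assumptions, comparing `residual_std` for shims fitted to a simulated FM against shims fitted to a transformed FM, both evaluated on the measured FM, would give a relative improvement figure of the kind reported in the Results.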