Abstract: With the rapid adoption of AI coding assistants, LLM-assisted development has become increasingly prevalent, creating an urgent need for robust evaluation of the quality of generated code. Existing benchmarks often require extensive manual effort to create static datasets, rely on indirect or insufficiently challenging tasks, depend on non-scalable ground truth, or neglect critical low-level security evaluations, particularly memory-safety issues. In this work, we introduce OSS-Bench, a benchmark generator that automatically constructs large-scale, live evaluation tasks from real-world open-source software (OSS). OSS-Bench replaces functions in these projects with LLM-generated code and evaluates the results using three natural metrics: compilability, functional correctness, and memory safety. As ground truth, it leverages robust signals, namely compilation failures, test-suite violations, and sanitizer alerts. In our evaluation, we instantiate the benchmark as OSS-Bench(php) and OSS-Bench(sql) and use it to profile 17 diverse LLMs, revealing insights such as intra-family behavioral patterns and inconsistencies between model size and performance. Our results demonstrate that OSS-Bench mitigates overfitting by leveraging the evolving complexity of OSS, and extended fuzzing experiments highlight LLMs' limited understanding of low-level code security. Overall, OSS-Bench offers a practical and scalable framework for benchmarking the real-world coding capabilities of LLMs.
Abstract: Purpose: This work proposes a novel design for navigator-free multiband (MB) multishot uniform-density spiral (UDS) acquisition and reconstruction, and demonstrates its utility for high-efficiency, high-resolution diffusion imaging. Theory and Methods: Our design addresses both the acquisition and the reconstruction of navigator-free MB multishot UDS diffusion imaging. For acquisition, radiofrequency (RF) pulse encoding was employed to achieve Controlled Aliasing in Parallel Imaging (CAIPI) in MB imaging. For reconstruction, a new algorithm named slice-POCS-enhanced Inherent Correction of phase Errors (slice-POCS-ICE) was proposed to simultaneously estimate the diffusion-weighted images and the inter-shot phase variations of each slice. The efficacy of the proposed methods was evaluated in both numerical simulations and in vivo experiments. Results: In both numerical simulations and in vivo experiments, slice-POCS-ICE estimated phase variations more precisely and yielded better image quality than the other methods compared. The proposed slice-POCS-ICE algorithm resolved inter-shot phase variations and MB slice aliasing artifacts simultaneously. Conclusion: The proposed navigator-free MB multishot UDS acquisition and reconstruction method is an effective solution for high-efficiency, high-resolution diffusion imaging.