Abstract: As large language models span dense, mixture-of-experts, and state-space architectures and are deployed on heterogeneous accelerators under increasingly diverse multimodal workloads, optimizing inference energy has become as critical as optimizing latency and throughput. Existing approaches either treat latency as an energy proxy or rely on data-hungry black-box surrogates. Both fail under varying parallelism strategies: latency and energy optima diverge in over 20% of the configurations we tested, and black-box surrogates require hundreds of profiling samples to generalize across model families and hardware. We present EnergyLens, which uses symbolic regression as a structure-discovery tool over profiling data to derive a single twelve-parameter closed-form energy model expressed in terms of system properties such as degree of parallelism, batch size, and sequence length. Unlike black-box surrogates, EnergyLens decouples tensor and pipeline parallelism contributions and separates prefill from decode energy, making its predictions physically interpretable and actionable. Fitted from as few as 50 profiling measurements, EnergyLens achieves 88.2% Top-1 configuration selection accuracy across our evaluation scenarios compared to 60.9% for the closest prior analytical baseline, matches the predictive accuracy of ensemble ML methods with 10x fewer profiling samples, and extrapolates reliably to unseen batch sizes and hardware platforms without structural modification. This makes EnergyLens a practical, interpretable tool for energy-optimal LLM deployment.
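The abstract does not reproduce the twelve-parameter model itself, so the sketch below uses a hypothetical four-parameter closed form with the same ingredients (batch size, sequence length, tensor-parallel degree, separate prefill and decode terms) and fits it to 50 simulated profiling measurements. `scipy.optimize.curve_fit` stands in only for the parameter-fitting stage; the structure-discovery step via symbolic regression is not shown. All coefficient names and values here are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import curve_fit

def energy_model(x, a, b, c, d):
    # Hypothetical closed-form energy model (NOT the paper's twelve-parameter
    # form): prefill energy scales with batch * seq_len, decode energy with
    # batch * out_len, tensor parallelism adds a per-degree communication
    # overhead, and d captures a constant baseline.
    batch, seq_len, out_len, tp = x
    prefill = a * batch * seq_len / tp
    decode = b * batch * out_len / tp
    comm = c * (tp - 1) * batch
    return prefill + decode + comm + d

# Simulate 50 profiling measurements ("as few as 50" in the abstract).
rng = np.random.default_rng(0)
n = 50
batch = rng.integers(1, 33, n).astype(float)
seq_len = rng.integers(128, 2049, n).astype(float)
out_len = rng.integers(16, 513, n).astype(float)
tp = rng.choice([1.0, 2.0, 4.0, 8.0], n)

true_params = (0.002, 0.01, 0.5, 30.0)  # assumed ground truth for the demo
y = energy_model((batch, seq_len, out_len, tp), *true_params)
y += rng.normal(0.0, 0.01 * y)  # 1% measurement noise

popt, _ = curve_fit(energy_model, (batch, seq_len, out_len, tp), y,
                    p0=[1.0, 1.0, 1.0, 1.0])
```

Because the model is a short closed form, the fitted coefficients remain individually inspectable (e.g., `c` isolates the tensor-parallel communication cost), which is the interpretability property the abstract contrasts with black-box surrogates.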
Abstract: Ghost imaging reconstructs spatial information from a single-pixel bucket detector by correlating structured illumination patterns with scalar intensity measurements. While deep learning approaches have achieved promising results on static scenes, two critical limitations remain unaddressed: existing architectures fail to exploit temporal coherence across frames, leaving dynamic ghost imaging largely unsolved, and they assume additive Gaussian noise models that do not reflect the true Poissonian statistics of real single-photon hardware. We present DynGhost (Dynamic Ghost Imaging Transformer), a transformer architecture that addresses both limitations through alternating spatial and temporal attention blocks. Our quantum-aware training framework, based on physically accurate detector simulations (SNSPDs, SPADs, SiPMs) and Anscombe variance-stabilizing normalization, resolves the distribution shift that causes classical models to fail under realistic hardware constraints. Experiments across multiple benchmarks demonstrate that DynGhost outperforms both traditional reconstruction methods and existing deep learning architectures, with particular gains in dynamic and photon-starved settings.
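The Anscombe transform the abstract mentions is a standard variance-stabilizing map for Poisson data, A(x) = 2*sqrt(x + 3/8); a minimal sketch (not the paper's training pipeline) shows why it helps with photon-counting statistics: raw Poisson counts have variance equal to their mean, while transformed counts have approximately unit variance regardless of the photon rate, which is what lets a model trained under a Gaussian noise assumption cope with Poissonian measurements.

```python
import numpy as np

def anscombe(x):
    # Anscombe variance-stabilizing transform: maps Poisson-distributed
    # counts to approximately unit-variance, approximately Gaussian data
    # (the approximation is good for means >~ 4).
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

rng = np.random.default_rng(0)
stds = []
for lam in (5.0, 20.0, 100.0):  # low to high photon rates
    counts = rng.poisson(lam, size=200_000)
    # Raw variance grows with the mean (Var = lam for Poisson) ...
    # ... but after the transform the std is close to 1 at every rate.
    stds.append(anscombe(counts).std())
```

In a training loop, one would apply `anscombe` to the simulated detector counts before feeding them to the network, so the residual noise is approximately homoscedastic Gaussian.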
Abstract: Convolutional neural networks rely on linear filtering operations that can be reformulated efficiently in suitable transform domains. At the same time, advances in quantum computing have shown that certain structured linear transforms can be implemented with shallow quantum circuits, opening the door to hybrid quantum-classical approaches for enhancing deep learning models. In this work, we introduce WTHaar-Net, a convolutional neural network that replaces the Hadamard Transform used in prior hybrid architectures with the Haar Wavelet Transform (HWT). Unlike the Hadamard Transform, the Haar transform provides spatially localized, multi-resolution representations that align more closely with the inductive biases of vision tasks. We show that the HWT admits a quantum realization using structured Hadamard gates, enabling its decomposition into unitary operations suitable for quantum circuits. Experiments on CIFAR-10 and Tiny-ImageNet demonstrate that WTHaar-Net achieves substantial parameter reduction while maintaining competitive accuracy. On Tiny-ImageNet, our approach outperforms both ResNet and Hadamard-based baselines. We validate the quantum implementation on IBM Quantum cloud hardware, demonstrating compatibility with near-term quantum devices.
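The claim that the HWT decomposes into unitary operations can be checked classically with the standard recursive construction of the orthonormal Haar matrix (a textbook construction, not code from the paper): the matrix is real orthogonal, hence unitary, which is the prerequisite for realizing it as a quantum circuit.

```python
import numpy as np

def haar_matrix(n):
    # Standard recursive construction of the n x n orthonormal Haar
    # transform matrix (n must be a power of two). The top half averages
    # neighboring pairs (coarse scale); the bottom half differences them
    # (detail coefficients), giving the multi-resolution structure.
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                 # recurse on averages
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # pairwise differences
    return np.vstack([top, bottom]) / np.sqrt(2.0)

H = haar_matrix(8)
# Real orthogonal => unitary: H @ H.T should equal the identity.
is_unitary = np.allclose(H @ H.T, np.eye(8))
```

Each recursion level is built from pairwise sum/difference blocks, the same 2x2 butterfly that a Hadamard gate implements, which is consistent with the abstract's statement that the HWT can be realized with structured Hadamard gates.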