Abstract: Post-training quantization is essential for deploying Large Language Models (LLMs) on resource-constrained devices. However, standard integer quantization (e.g., INT4) fundamentally degrades performance by imposing a uniform grid on the heavy-tailed distribution of weight parameters, particularly in smaller-scale models (e.g., <2B parameters). We introduce HAS-VQ (Hessian-Adaptive Sparse Vector Quantization), a compression framework that strictly decouples high-sensitivity outliers from the bulk weight distribution using second-order sensitivity analysis. HAS-VQ employs a Hessian-Masked Decoupling strategy to isolate sensitive parameters, followed by robust Vector Quantization (VQ) of the remaining dense body. Crucially, we introduce a residual sparse feedback mechanism that corrects quantization errors in the most sensitive dimensions, ensuring exact reconstruction of outliers. We evaluate HAS-VQ on SmolLM2-1.7B, demonstrating two distinct regimes of superiority: (1) Pareto Dominance over Integer Baselines: At 4.23 effective bits-per-parameter (BPP), we achieve a perplexity of 14.23, significantly outperforming the standard INT4 baseline (20.03 PPL at 4.71 BPP). (2) High-Fidelity Compression: Relative to the FP16 baseline, HAS-VQ achieves a 2.3x reduction in model size (7.03 BPP) while maintaining statistically indistinguishable perplexity (10.12 vs. 10.04), effectively offering a lossless compression alternative for bandwidth-constrained environments. The code is available at https://github.com/VladimerKhasia/HASVQ
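For illustration, here is a minimal, self-contained sketch of the decoupling pipeline this abstract describes. The diagonal Hessian proxy, the fixed outlier fraction, and the toy k-means codebook are stand-in assumptions for this sketch, not the released HAS-VQ implementation; the final step restores the sensitive coordinates exactly, mirroring the residual sparse feedback described above.
```python
import torch

def decouple_and_vq(W, h_diag, outlier_frac=0.01, group=8, codes=256, iters=10):
    """Keep the most Hessian-sensitive weights exact; vector-quantize the rest."""
    flat, h = W.flatten(), h_diag.flatten()
    # 1) Hessian-masked decoupling: score w_i^2 * H_ii and peel off the top outliers.
    k = max(1, int(outlier_frac * flat.numel()))
    idx = torch.topk(h * flat.pow(2), k).indices
    dense = flat.clone()
    dense[idx] = 0.0                                   # dense body to be quantized
    # 2) Toy vector quantization of the dense body: k-means over weight groups.
    cut = (dense.numel() // group) * group
    vecs = dense[:cut].view(-1, group)
    cb = vecs[torch.randperm(vecs.size(0))[:codes]].clone()
    for _ in range(iters):
        assign = torch.cdist(vecs, cb).argmin(dim=1)
        for c in range(codes):
            members = vecs[assign == c]
            if members.numel():
                cb[c] = members.mean(dim=0)
    out = dense.clone()
    out[:cut] = cb[assign].flatten()
    # 3) Sparse residual feedback: the sensitive coordinates are restored exactly.
    out[idx] = flat[idx]
    return out.view_as(W), idx, cb

W = torch.randn(512, 512)
H = torch.rand_like(W)                                 # stand-in diagonal Hessian proxy
W_hat, outlier_idx, codebook = decouple_and_vq(W, H)
print(float((W_hat - W).abs().mean()))
```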
Abstract: Scaling sequence modeling to extreme contexts requires balancing computational efficiency with representational expressivity. While Transformers provide precise retrieval via the attention mechanism, their quadratic $\mathcal{O}(T^2)$ complexity limits their application to long-horizon tasks. In this work, we propose the \textbf{Spectral-Window Hybrid (SWH)}, an architecture that decouples sequence modeling into two \textit{parallel} streams: a global branch utilizing the Convolution Theorem to model long-range decay dynamics in $\mathcal{O}(T \log T)$ time, and a local branch employing sliding-window attention for token interactions within a bounded context. By aggregating these representations, SWH avoids the computational bottleneck of global attention while retaining local precision. We demonstrate that SWH matches the perplexity of standard Transformers on short contexts while enabling efficient near-linear scaling to extended sequences. The code is available at https://github.com/VladimerKhasia/SWH
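A minimal sketch of the two parallel streams, assuming a single head, a learned per-channel exponential-decay kernel for the global branch, and a dense causal window mask for the local branch (a real kernel would restrict computation to the window); these choices are illustrative and are not taken from the released SWH code.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SWHBlock(nn.Module):
    """One hybrid block: FFT-based global decay branch + sliding-window attention."""
    def __init__(self, d, window=64):
        super().__init__()
        self.qkv = nn.Linear(d, 3 * d)
        self.decay = nn.Parameter(torch.rand(d) * 4.0)   # per-channel decay rates
        self.out = nn.Linear(2 * d, d)
        self.window = window

    def global_branch(self, x):                          # O(T log T) via the Convolution Theorem
        T = x.size(1)
        t = torch.arange(T, dtype=x.dtype)
        kernel = torch.exp(-F.softplus(self.decay)[None, :] * t[:, None])  # (T, d)
        n = 2 * T                                         # zero-pad so the convolution is causal
        Xf = torch.fft.rfft(x, n=n, dim=1)
        Kf = torch.fft.rfft(kernel, n=n, dim=0)
        return torch.fft.irfft(Xf * Kf[None], n=n, dim=1)[:, :T]

    def local_branch(self, x):                            # sliding-window attention
        T, D = x.size(1), x.size(2)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(D)
        i = torch.arange(T)
        mask = (i[None, :] > i[:, None]) | (i[:, None] - i[None, :] >= self.window)
        scores = scores.masked_fill(mask, float("-inf"))   # dense T x T mask for clarity only
        return F.softmax(scores, dim=-1) @ v

    def forward(self, x):                                  # aggregate the two parallel streams
        return self.out(torch.cat([self.global_branch(x), self.local_branch(x)], dim=-1))

x = torch.randn(2, 256, 32)
print(SWHBlock(32)(x).shape)                               # torch.Size([2, 256, 32])
```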
Abstract: Mixture of Experts (MoE) models scale capacity but often suffer from representation collapse and gradient instability. We propose Dynamic Subspace Composition (DSC), a framework that approximates context-dependent weights via a state-dependent, sparse expansion of a shared basis bank. Formally, DSC models the weight update as a residual trajectory within a Star-Shaped Domain, employing a Magnitude-Gated Simplex Interpolation to ensure continuity at the identity. Unlike standard Mixture-of-LoRAs, which incurs $\mathcal{O}(Mrd)$ parameter complexity by retrieving independent rank-$r$ matrices, DSC constructs a compositional rank-$K$ approximation from decoupled unit-norm basis vectors. This reduces parameter complexity to $\mathcal{O}(Md)$ and memory traffic to $\mathcal{O}(Kd)$, while Frame-Theoretic regularization and spectral constraints provide rigorous worst-case bounds on the dynamic update. The code is available at https://github.com/VladimerKhasia/DSC
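A minimal sketch of the compositional rank-$K$ update, assuming a top-$K$ softmax router, a sigmoid magnitude gate, and two shared banks of unit-norm basis vectors; shapes and the gating function are illustrative stand-ins rather than the DSC release. Note that the banks hold $\mathcal{O}(Md)$ parameters and only $K$ basis vectors are gathered per token, matching the complexity claims above.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSCLinear(nn.Module):
    """Base linear layer plus a state-dependent rank-K residual from a shared basis bank."""
    def __init__(self, d, n_basis=64, top_k=4):
        super().__init__()
        self.W0 = nn.Linear(d, d, bias=False)              # base weight
        self.U = nn.Parameter(torch.randn(n_basis, d))     # shared input bases
        self.V = nn.Parameter(torch.randn(n_basis, d))     # shared output bases
        self.router = nn.Linear(d, n_basis)
        self.gate = nn.Linear(d, 1)
        self.top_k = top_k

    def forward(self, x):                                   # x: (B, d)
        U = F.normalize(self.U, dim=-1)                     # unit-norm basis vectors
        V = F.normalize(self.V, dim=-1)
        topv, topi = self.router(x).topk(self.top_k, dim=-1)
        alpha = F.softmax(topv, dim=-1)                     # simplex interpolation weights
        g = torch.sigmoid(self.gate(x))                     # magnitude gate in [0, 1]
        # Rank-K residual: delta(x) = g * sum_k alpha_k * V_k * (U_k . x)
        proj = torch.einsum('bd,bkd->bk', x, U[topi])       # (B, K)
        delta = torch.einsum('bk,bk,bkd->bd', alpha, proj, V[topi])
        return self.W0(x) + g * delta                       # g -> 0 recovers the base map

x = torch.randn(8, 32)
print(DSCLinear(32)(x).shape)                               # torch.Size([8, 32])
```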
Abstract: We present DeepVekua, a hybrid architecture that unifies geometric deep learning with spectral analysis to solve partial differential equations (PDEs) in sparse data regimes. By learning a diffeomorphic coordinate transformation that maps complex geometries to a latent harmonic space, our method outperforms state-of-the-art implicit representations on advection-diffusion systems. Unlike standard coordinate-based networks, which struggle with spectral bias, DeepVekua separates the learning of geometry from the learning of physics, solving for optimal spectral weights in closed form. We demonstrate a 100x improvement over spectral baselines. The code is available at https://github.com/VladimerKhasia/vekuanet.
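A minimal sketch of the geometry/physics split, assuming a small tanh warp network, plain Fourier features as the latent spectral basis, and a ridge-regression solve for the weights; these are illustrative stand-ins rather than the DeepVekua architecture.
```python
import torch
import torch.nn as nn

class WarpNet(nn.Module):
    """Near-identity coordinate transformation into a latent space (the 'geometry' part)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, xy):
        return xy + 0.1 * self.net(xy)

def spectral_features(z, n_freq=8):
    """Fourier features of the warped coordinates (stand-in for a harmonic basis)."""
    freqs = torch.arange(1.0, n_freq + 1, dtype=z.dtype)
    ang = z[..., None] * freqs                            # (N, dim, n_freq)
    return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(1)

def closed_form_weights(Phi, u, lam=1e-6):
    """Ridge-regression solve for the spectral weights (the 'physics' part)."""
    A = Phi.T @ Phi + lam * torch.eye(Phi.shape[1])
    return torch.linalg.solve(A, Phi.T @ u)

xy = torch.rand(1024, 2)                                  # scattered (sparse-regime) samples
u = torch.sin(3 * xy[:, :1]) * torch.cos(2 * xy[:, 1:])   # toy smooth field
Phi = spectral_features(WarpNet()(xy))
w = closed_form_weights(Phi, u)
print(float(((Phi @ w - u) ** 2).mean()))                 # reconstruction error
```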
Abstract: Coordinate-based neural networks have emerged as a powerful tool for representing continuous physical fields, yet they face two fundamental pathologies: spectral bias, which hinders the learning of high-frequency dynamics, and the curse of dimensionality, which causes parameter explosion in discrete feature grids. We propose the Adaptive Vekua Cascade (AVC), a hybrid architecture that bridges deep learning and classical approximation theory. AVC decouples manifold learning from function approximation by using a deep network to learn a diffeomorphic warping of the physical domain, projecting complex spatiotemporal dynamics onto a latent manifold where the solution is represented by a basis of generalized analytic functions. Crucially, we replace the standard gradient-descent output layer with a differentiable linear solver, allowing the network to optimally resolve spectral coefficients in closed form during the forward pass. We evaluate AVC on a suite of five rigorous physics benchmarks, including high-frequency Helmholtz wave propagation, sparse medical reconstruction, and unsteady 3D Navier-Stokes turbulence. Our results demonstrate that AVC achieves state-of-the-art accuracy while reducing parameter counts by orders of magnitude (e.g., 840 parameters vs. 4.2 million for 3D grids) and converging 2-3x faster than implicit neural representations. This work establishes a new paradigm for memory-efficient, spectrally accurate scientific machine learning. The code is available at https://github.com/VladimerKhasia/vecua.
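A minimal sketch of the differentiable closed-form output layer: the spectral coefficients are re-solved inside every forward pass and gradients flow through the solve into the warp network. The warp network, Fourier basis, regularization, and optimizer settings below are illustrative assumptions, not the AVC implementation.
```python
import torch
import torch.nn as nn

class AVCSketch(nn.Module):
    """Learned warp + fixed Fourier basis + closed-form (differentiable) coefficient solve."""
    def __init__(self, dim=2, hidden=64, n_freq=8, lam=1e-6):
        super().__init__()
        self.warp = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, dim))
        self.register_buffer("freqs", torch.arange(1.0, n_freq + 1))
        self.lam = lam

    def basis(self, xy):
        z = xy + 0.1 * self.warp(xy)                      # near-identity learned warping
        ang = z[..., None] * self.freqs
        return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(1)

    def forward(self, xy, u):
        Phi = self.basis(xy)
        A = Phi.T @ Phi + self.lam * torch.eye(Phi.shape[1])
        coef = torch.linalg.solve(A, Phi.T @ u)           # spectral coefficients, closed form
        return Phi @ coef

xy = torch.rand(2048, 2)
u = torch.sin(6 * xy[:, :1] + 2 * xy[:, 1:])              # toy high-frequency field
model = AVCSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                                      # gradients flow through the solve
    loss = ((model(xy, u) - u) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```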
Abstract: Implicit Neural Representations (INRs) have emerged as a powerful paradigm for parameterizing physical fields, yet they often suffer from spectral bias and the computational expense of non-convex optimization. We introduce the Vekua Layer (VL), a differentiable spectral method grounded in the classical theory of Generalized Analytic Functions. By restricting the hypothesis space to the kernel of the governing differential operator -- specifically utilizing Harmonic and Fourier-Bessel bases -- the VL transforms the learning task from iterative gradient descent to a strictly convex least-squares problem solved via linear projection. We evaluate the VL against Sinusoidal Representation Networks (SIRENs) on homogeneous elliptic Partial Differential Equations (PDEs). Our results demonstrate that the VL achieves machine precision ($\text{MSE} \approx 10^{-33}$) on exact reconstruction tasks and exhibits superior stability in the presence of incoherent sensor noise ($\text{MSE} \approx 0.03$), effectively acting as a physics-informed spectral filter. Furthermore, we show that the VL enables "holographic" extrapolation of global fields from partial boundary data via analytic continuation, a capability absent in standard coordinate-based approximations.
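A minimal sketch of the linear-projection view, assuming a harmonic-polynomial basis (real and imaginary parts of $z^n$), a least-squares fit from sparse boundary samples, and evaluation of the same expansion in the interior; the basis size and sampling are illustrative, and the Fourier-Bessel case is omitted.
```python
import math
import torch

def harmonic_basis(xy, degree=8):
    """Columns are Re(z^n), Im(z^n) for n = 0..degree; every column is harmonic."""
    z = torch.complex(xy[:, 0], xy[:, 1])
    cols = [torch.ones_like(xy[:, 0])]
    for n in range(1, degree + 1):
        zn = z ** n
        cols += [zn.real, zn.imag]
    return torch.stack(cols, dim=1)

def u_true(xy):
    """Ground-truth harmonic field u(x, y) = Re((x + iy)^3) = x^3 - 3 x y^2."""
    return xy[:, 0] ** 3 - 3 * xy[:, 0] * xy[:, 1] ** 2

# Fit the expansion from samples on one quarter of the unit circle only.
theta = torch.linspace(0.0, math.pi / 2, 64)
boundary = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)
coef = torch.linalg.lstsq(harmonic_basis(boundary),
                          u_true(boundary).unsqueeze(1)).solution

# Evaluate the same expansion on interior points it never saw ("continuation").
interior = 0.5 * torch.rand(256, 2) - 0.25
pred = harmonic_basis(interior) @ coef
print(float(((pred.squeeze(1) - u_true(interior)) ** 2).mean()))  # small error from boundary data alone
```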