Abstract: We study deep state-space models (Deep SSMs) that contain linear-quadratic-output (LQO) systems as internal blocks and present a compression method with a provable output error guarantee. We first derive an upper bound on the output error between two Deep SSMs and show that the bound can be expressed via the $h^2$-error norms between the layerwise LQO systems, thereby providing a theoretical justification for existing model order reduction (MOR)-based compression. Building on this bound, we formulate an optimization problem in terms of the $h^2$-error norm and develop a gradient-based MOR method. On the IMDb task from the Long Range Arena benchmark, we demonstrate that our compression method achieves strong performance. Moreover, unlike prior approaches, we remove roughly 80% of the trainable parameters without retraining, at the cost of only a 4-5% performance drop.
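To make the $h^2$-error norm concrete, the following minimal sketch evaluates it in a much simpler setting than the one studied in the abstract: two single-input single-output, discrete-time, purely linear systems with diagonal state matrices and no quadratic output term. The function names `h2_norm_sq` and `h2_error` and all numerical values are illustrative assumptions, not taken from the paper.

```python
import math

def h2_norm_sq(lam, b, c):
    """Squared h2 norm of x_i[k+1] = lam_i x_i[k] + b_i u[k], y[k] = sum_i c_i x_i[k].

    For a stable diagonal system the impulse response is
    h_k = sum_i r_i lam_i**(k-1) for k >= 1, with r_i = c_i b_i, so
    sum_k |h_k|^2 = sum_{i,j} r_i conj(r_j) / (1 - lam_i conj(lam_j)).
    """
    assert all(abs(l) < 1 for l in lam), "all poles must lie inside the unit disk"
    r = [ci * bi for ci, bi in zip(c, b)]
    total = 0j
    for ri, li in zip(r, lam):
        for rj, lj in zip(r, lam):
            total += ri * complex(rj).conjugate() / (1 - li * complex(lj).conjugate())
    return total.real

def h2_error(full, reduced):
    """h2 norm of the error system G - G_r, built as a parallel connection."""
    lam = list(full[0]) + list(reduced[0])
    b = list(full[1]) + list(reduced[1])
    c = list(full[2]) + [-x for x in reduced[2]]  # minus sign subtracts G_r
    return math.sqrt(max(h2_norm_sq(lam, b, c), 0.0))

# Hypothetical example: (poles, input weights, output weights).
full = ([0.9, 0.5, 0.1], [1.0, 1.0, 1.0], [1.0, 0.3, 0.05])
reduced = ([0.9, 0.5], [1.0, 1.0], [1.0, 0.3])  # drop the weakest mode
err = h2_error(full, reduced)
```

Because the reduced model here simply drops one mode, the error system is that single mode, which gives an easy sanity check on the closed form.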
Abstract: We propose a new system identification method, Violina (various-of-trajectories identification of linear time-invariant non-Markovian dynamics). In the Violina framework, we optimize the coefficient matrices of the state-space model and the memory kernel within a given space using a projected gradient descent method, so that the model predictions match a set of multiple observed trajectories. With Violina, we can identify a linear non-Markovian dynamical system under constraints that encode a priori knowledge of the model parameters and memory effects. Using synthetic data, we numerically demonstrate that the Markovian and non-Markovian state-space models identified by the proposed method generalize considerably better than models identified by an existing dynamic decomposition-based method.
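As a rough illustration of projected gradient descent fitted to several trajectories at once, the sketch below identifies a single scalar coefficient `a` in the Markovian model x[t+1] = a x[t]. It is not the Violina algorithm: there is no memory kernel, the constraint set is assumed to be a simple interval [lo, hi] standing in for prior knowledge, and all function names and values are hypothetical.

```python
def simulate(a, x0, steps):
    """Generate one trajectory of x[t+1] = a * x[t]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs

def loss_and_grad(a, trajectories):
    """Mean one-step-ahead squared error over all trajectories, and d(loss)/da."""
    loss, grad, n = 0.0, 0.0, 0
    for xs in trajectories:
        for x_now, x_next in zip(xs[:-1], xs[1:]):
            resid = a * x_now - x_next
            loss += resid ** 2
            grad += 2.0 * resid * x_now
            n += 1
    return loss / n, grad / n

def project(a, lo, hi):
    """Euclidean projection onto the interval [lo, hi] (assumed constraint set)."""
    return min(max(a, lo), hi)

def projected_gradient_descent(trajectories, lo, hi, a0=0.0, lr=0.1, iters=500):
    a = project(a0, lo, hi)
    for _ in range(iters):
        _, g = loss_and_grad(a, trajectories)
        a = project(a - lr * g, lo, hi)  # gradient step, then project back
    return a

# Three synthetic trajectories from the same system with different initial states.
data = [simulate(0.8, x0, 20) for x0 in (1.0, -2.0, 0.5)]
a_free = projected_gradient_descent(data, lo=-1.0, hi=1.0)    # constraint inactive
a_capped = projected_gradient_descent(data, lo=-1.0, hi=0.6)  # constraint active
```

When the true coefficient lies inside the interval, the iterate converges to it; when it lies outside, the iterate settles on the nearest boundary point, which is how the constraint enforces the prior knowledge.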




Abstract: We introduce a novel learning method for Structured State Space Sequence (S4) models incorporating Diagonal State Space (DSS) layers, tailored for processing long-sequence data in edge intelligence applications, including sensor data analysis and real-time analytics. The method applies balanced truncation, a prevalent model reduction technique in control theory, specifically to DSS layers to reduce computational costs during inference. By using parameters from the reduced model, we refine the initialization of S4 models and outperform the widely used Skew-HiPPO initialization. Numerical experiments demonstrate that our trained S4 models with DSS layers surpass conventionally trained models in both accuracy and efficiency. Furthermore, we observe a positive correlation: higher accuracy in the original model consistently leads to higher accuracy in models trained using our method, suggesting that our approach effectively leverages the strengths of the original model.
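For readers unfamiliar with balanced truncation, here is a minimal square-root implementation for a toy system, assuming NumPy is available. It makes several simplifying assumptions that differ from actual DSS layers: the system is discrete-time, single-input single-output, and has a real diagonal state matrix with distinct stable poles. The function names and numerical values are hypothetical.

```python
import numpy as np

def gramians_diagonal(lam, b, c):
    """Gramians of x[k+1] = diag(lam) x[k] + b u[k], y[k] = c x[k] (real, stable)."""
    denom = 1.0 - np.outer(lam, lam)
    P = np.outer(b, b) / denom  # solves P = A P A^T + b b^T
    Q = np.outer(c, c) / denom  # solves Q = A^T Q A + c^T c
    return P, Q

def balanced_truncation(lam, b, c, r):
    """Return a reduced model (Ar, Br, Cr) of order r and all Hankel singular values."""
    P, Q = gramians_diagonal(lam, b, c)
    Lp = np.linalg.cholesky(P)             # P = Lp Lp^T
    Lq = np.linalg.cholesky(Q)             # Q = Lq Lq^T
    U, hsv, Vt = np.linalg.svd(Lq.T @ Lp)  # hsv = Hankel singular values
    S = np.diag(hsv[:r] ** -0.5)
    T = Lp @ Vt[:r].T @ S                  # right projection matrix
    Tinv = S @ U[:, :r].T @ Lq.T           # left projection, Tinv @ T = I_r
    A = np.diag(lam)
    return Tinv @ A @ T, Tinv @ b, c @ T, hsv

# Hypothetical 4-state system reduced to 2 states.
lam = np.array([0.9, 0.5, -0.3, 0.1])
b = np.ones(4)
c = np.array([1.0, 0.5, 0.2, 0.1])
Ar, Br, Cr, hsv = balanced_truncation(lam, b, c, 2)
```

The discarded Hankel singular values `hsv[2:]` indicate how much input-output behavior is lost: twice their sum bounds the worst-case frequency-response error of the reduced model.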