Abstract:Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $δ$ and the spectral gap $(1-λ)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak--Łojasiewicz condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\fracγ{δ(1-λ)}+γ\right)$ under step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PŁ objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $κ\!\left(1+\frac{1}{δ(1-λ)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.




Abstract:In wearable sensing applications, data is inevitable to be irregularly sampled or partially missing, which pose challenges for any downstream application. An unique aspect of wearable data is that it is time-series data and each channel can be correlated to another one, such as x, y, z axis of accelerometer. We argue that traditional methods have rarely made use of both times-series dynamics of the data as well as the relatedness of the features from different sensors. We propose a model, termed as DynImp, to handle different time point's missingness with nearest neighbors along feature axis and then feeding the data into a LSTM-based denoising autoencoder which can reconstruct missingness along the time axis. We experiment the model on the extreme missingness scenario ($>50\%$ missing rate) which has not been widely tested in wearable data. Our experiments on activity recognition show that the method can exploit the multi-modality features from related sensors and also learn from history time-series dynamics to reconstruct the data under extreme missingness.