Network pruning is a popular approach for deriving a lightweight network from a heavy one by removing its redundancy. In this approach, a complex over-parameterized network is first trained, then pruned according to some criterion, and finally fine-tuned to achieve performance comparable to the original with far fewer parameters.
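To make the pipeline concrete, here is a minimal PyTorch sketch of the train-prune-fine-tune loop using magnitude (L1) pruning; the architecture, sparsity level, and elided training loops are illustrative placeholders rather than any specific method from the literature.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small over-parameterized network (sizes are illustrative).
model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))

# 1) Train the dense network (training loop elided).
# 2) Prune: zero out the 50% smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the pruning mask into the weights
# 3) Fine-tune the pruned network (training loop elided) to recover accuracy.
```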
Maritime object detection is critical for the safe navigation of unmanned surface vessels (USVs), requiring accurate recognition of obstacles from small buoys to large vessels. Real-time detection is challenging due to long distances, small object sizes, large-scale variations, edge computing limitations, and the high memory demands of high-resolution imagery. Existing solutions, such as downsampling or image splitting, often reduce accuracy or require additional processing, while memory-efficient models typically handle only limited resolutions. To overcome these limitations, we leverage Vision Mamba (ViM) backbones, which build on State Space Models (SSMs) to capture long-range dependencies while scaling linearly with sequence length. Images are tokenized into sequences for efficient high-resolution processing. For further computational efficiency, we design a tailored Feature Pyramid Network with successive downsampling and SSM layers, as well as token pruning to reduce unnecessary computation on background regions. Compared to state-of-the-art methods such as RT-DETR with a ResNet50 backbone, our approach achieves a better balance between performance and computational efficiency in maritime object detection.
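The abstract does not spell out how the token pruning is implemented; one common pattern, sketched below under the assumption of a per-token relevance score, is to keep only the top-scoring tokens before the expensive SSM layers.

```python
import torch

def prune_background_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens (likely foreground) and drop the rest.

    tokens: (B, N, D) image-token sequence; scores: (B, N) per-token relevance.
    """
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))
    idx = scores.topk(k, dim=1).indices                       # (B, k)
    return tokens.gather(1, idx.unsqueeze(-1).expand(B, k, D))

# Example: 196 tokens per image, keep half before the SSM layers.
tokens = torch.randn(2, 196, 192)
scores = torch.rand(2, 196)   # stand-in for a learned foreground score
pruned = prune_background_tokens(tokens, scores)   # shape (2, 98, 192)
```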
Efficient LLM inference research has largely focused on reducing the cost of each decoding step (e.g., using quantization, pruning, or sparse attention), typically applying a uniform computation budget to every generated token. In practice, token difficulty varies widely, so static compression can over-compute on easy steps and under-compute on hard ones. We study dynamic budget allocation for autoregressive decoding: learning how much computation to spend per token from within a single model. Self-Optimizing Language Models (SOL) pair a frozen LLM with a lightweight policy network that reads the LLM hidden state and selects a discrete efficiency action at each decode step. Actions can jointly control (i) token-level attention sparsity, (ii) structured activation pruning in the MLP, and (iii) activation quantization bit-width, while leaving the base model weights unchanged. We train the policy with group-relative policy optimization on teacher-forced episodes: the token sequence is fixed, while we sample multiple compute schedules (i.e., "counterfactual" schedules that vary only the efficiency actions for the same token path) and compare their likelihoods under the same supervision. Our reward trades off language-model quality against soft penalties that encourage episode-average budget usage to match a requested target. Across model variants and compute regimes, SOL improves quality at matched budget over static allocation and strong random schedule search, offering a complementary axis for inference-efficiency optimization. SOL discovers a better quality-efficiency Pareto front across all our experiments and improves MMLU accuracy by up to 7.3% over uniform budget allocation strategies.
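As a sketch of the policy component (the hidden size and action count below are assumptions; the abstract says only "lightweight policy network"), the per-step decision might look like this:

```python
import torch
import torch.nn as nn

class EfficiencyPolicy(nn.Module):
    """Tiny policy head mapping an LLM hidden state to a discrete efficiency action."""
    def __init__(self, hidden_size=4096, num_actions=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_size, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, hidden_state):
        logits = self.head(hidden_state)                 # (B, num_actions)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                           # one compute-schedule step
        return action, dist.log_prob(action)             # log-prob for the RL update

# At each decode step, the frozen LLM produces hidden_state and the policy picks
# an action indexing a (sparsity, pruning, bit-width) configuration.
policy = EfficiencyPolicy()
hidden_state = torch.randn(1, 4096)   # stand-in for the LLM hidden state
action, logp = policy(hidden_state)
```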
We identify a Selection Plateau phenomenon in one-shot neural network pruning: all rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form. We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity $\kappa(S)$ governs plateau escape, with $\kappa=0$ sufficient at low sparsity ($S<0.65$), $\kappa=1$ dominant at critical sparsity ($S\approx 0.7$), and $\kappa=2$ necessary at extreme sparsity ($S>0.75$). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at $S=0.7$, while only raw features with high-frequency wiggle escape at $S=0.8$ (+2.6%). A fake non-monotone scorer underperforms the gradient baseline, indicating the requirement is magnitude-independent non-monotonicity. A handcrafted Gaussian bump achieves only +0.006 escape vs. chaos-derived +0.046, indicating rank-alignment is necessary but insufficient. SICS provides a unifying explanation for the performance clustering of diverse pruning methods and suggests that future selection algorithms should adapt feature complexity to target sparsity.
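The Selection Plateau is easy to reproduce in miniature: any strictly rank-monotone transform of a base score induces exactly the same pruning mask, so only genuinely non-monotone features can change which weights survive. The NumPy sketch below illustrates this; the particular non-monotone feature is illustrative and not one of the paper's nine feature classes.

```python
import numpy as np

w = np.random.randn(10000)   # stand-in weights
sparsity = 0.7

def mask_from_scores(scores, sparsity):
    thresh = np.quantile(scores, sparsity)
    return scores > thresh   # keep the top (1 - sparsity) fraction

# Any rank-monotone transform of |w| yields the same mask (the Selection Plateau).
m1 = mask_from_scores(np.abs(w), sparsity)
m2 = mask_from_scores(np.abs(w) ** 3, sparsity)          # monotone in |w|
m3 = mask_from_scores(np.log1p(np.abs(w)), sparsity)     # monotone in |w|
assert (m1 == m2).all() and (m1 == m3).all()

# A non-monotone feature (a smooth wiggle, purely illustrative) can reorder
# weights and thus produce a different mask, which is what plateau escape needs.
m4 = mask_from_scores(np.abs(w) + 0.5 * np.sin(4 * np.abs(w)), sparsity)
print((m1 != m4).mean())     # fraction of weights whose fate changed
```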
Graph Neural Networks (GNNs) excel at relational reasoning but face two persistent challenges: the lack of interpretable attribution for heterogeneous node types, and the computational overhead of message passing over large, noisy graphs. We propose the Hierarchical Attention-based Heterogeneous GNN (HA-HeteroGNN), a framework that addresses both issues through a unified explainability-to-pruning pipeline. A two-tier attention mechanism separates sensor-level and context-level computation across 16 node types and 18 edge types, producing per-node relevance scores via an attention-based GNN Explainer without requiring gradient backpropagation. These relevance scores then serve as a principled pruning criterion: removing nodes identified as consistently uninformative yields a 27% reduction in graph edges while simultaneously improving classification accuracy by 2.4–6.1% across all model variants, challenging the conventional assumption that pruning necessarily trades accuracy for efficiency. Experiments on a 50,000-record synthetic dataset spanning 11 report categories demonstrate 97.5% cross-strategy explanation stability and domain-consistent sensor attribution, with training-time reductions of up to 43.9% and real-time inference latency of approximately 58–60 ms per sample.
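The explainability-to-pruning step reduces to a simple primitive: average per-node relevance scores, drop nodes below a threshold, and discard their incident edges. A minimal sketch (the threshold and scores are illustrative, not the paper's values):

```python
import numpy as np

def prune_by_relevance(node_relevance, edges, keep_threshold=0.1):
    """Drop nodes whose (e.g., attention-derived) relevance is consistently low,
    along with their incident edges.

    node_relevance: (N,) averaged relevance per node; edges: list of (u, v) pairs.
    """
    keep = {i for i, r in enumerate(node_relevance) if r >= keep_threshold}
    pruned_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, pruned_edges

relevance = np.array([0.8, 0.02, 0.5, 0.05, 0.9])   # stand-in relevance scores
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 4)]
kept_nodes, kept_edges = prune_by_relevance(relevance, edges)
print(kept_nodes, kept_edges)   # nodes {0, 2, 4}; edges [(0, 2), (2, 4)]
```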
Robots operate under significant uncertainty, from quantifiable noise to unquantifiable unknowns, and must account for strict operational constraints, such as limited resources. In this paper, we consider the problem of synthesizing robust strategies to guide a robot's actions in fulfilling a given task, while ensuring the system never exhausts its resources. To solve this problem, we first model the robotic system as a Consumption Markov Decision Process with Set-valued Transitions (CMDPST), a unified framework modelling nondeterministic actions, quantifiable and unquantifiable uncertainty, and resource consumption. Then, we combine the CMDPST with the task specification, expressed as a Linear Temporal Logic over finite traces (LTLf) formula. Lastly, we address the resource-constrained optimal robust strategy synthesis problem, which aims to synthesize a strategy that maximizes the probability of satisfying the LTLf objective without resource exhaustion. Our solution involves two techniques: a direct unrolling-based method and a more efficient, optimized approach that leverages state-space pruning for better performance. Experiments on a warehouse transportation network show the effectiveness of the proposed solutions.
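One plausible encoding of a CMDPST as a data structure, assuming transitions map each state-action pair to a distribution over successor *sets* (within which the actual successor is resolved nondeterministically) and that resources can be replenished at designated reload states; the paper's exact formalization may differ:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

State, Action = str, str

@dataclass
class CMDPST:
    """Sketch of a Consumption MDP with Set-valued Transitions.

    transitions[(s, a)] is a list of (probability, successor_set) pairs: a set
    is drawn with the given probability, and the successor within the set is
    chosen nondeterministically (the unquantifiable uncertainty).
    consumption[(s, a)] is the resource cost of taking action a in state s.
    """
    states: List[State]
    actions: List[Action]
    transitions: Dict[Tuple[State, Action], List[Tuple[float, FrozenSet[State]]]]
    consumption: Dict[Tuple[State, Action], int]
    capacity: int                      # resource budget that must never be exhausted
    reload_states: FrozenSet[State]    # states where the resource is replenished

m = CMDPST(
    states=["s0", "s1", "s2"],
    actions=["go"],
    transitions={("s0", "go"): [(0.7, frozenset({"s1"})),
                                (0.3, frozenset({"s1", "s2"}))]},
    consumption={("s0", "go"): 2},
    capacity=10,
    reload_states=frozenset({"s0"}),
)
```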
3D Gaussian splatting (3DGS) is a state-of-the-art representation for real-time photorealistic novel-view synthesis, yet a single high-fidelity scene typically occupies hundreds of megabytes to several gigabytes, exceeding the budgets of mobile, immersive, and volumetric video platforms. Existing 3DGS compression methods (e.g., HAC++, FlexGaussian, LP-3DGS) treat pruning, quantization, and entropy coding as separate stages and rely on hand-tuned heuristics (opacity thresholds, fixed bit-widths, SH truncation), limiting cross-scene generalization and preventing users from specifying a target rate or quality budget. We propose GETA-3DGS, to our knowledge the first end-to-end automatic joint structured pruning and quantization framework for 3DGS. Building on GETA for joint pruning-quantization of deep networks, we contribute: (i) a 3DGS-aware quantization-aware dependency graph (QADG) treating each Gaussian primitive as a group with five attribute sub-nodes and degree-aware SH sub-nodes; (ii) a render-aware saliency score fusing transmittance-weighted contribution, screen-space gradient, and pixel coverage into a Gaussian-level importance score; and (iii) a heterogeneous per-attribute mixed-precision scheme co-optimized with structural sparsity under a projected partial saliency-guided (PPSG) descent guarantee. On Mip-NeRF 360, Tanks and Temples, and Deep Blending, GETA-3DGS operates directly on raw Gaussian primitives rather than a post-hoc anchor representation, delivering ~5x storage reduction over Vanilla 3DGS with no per-scene thresholds. Bit-width policy is the dominant rate-distortion lever: a uniform 6-bit cap loses up to 6.74 dB on view-dependent scenes versus our heterogeneous allocation, matching an information-theoretic reverse-water-filling analysis we develop. GETA-3DGS is complementary to existing codecs: entropy coding (HAC++, CompGS) is downstream, so the two can be composed.
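A minimal sketch of the saliency-fusion idea, with min-max normalization and fusion weights chosen purely for illustration (the paper's actual fusion rule is not given here):

```python
import numpy as np

def gaussian_saliency(transmittance_contrib, screen_grad, pixel_coverage,
                      weights=(0.5, 0.3, 0.2)):
    """Fuse per-Gaussian statistics into a single importance score.

    All inputs are (N,) arrays accumulated over training views; the fusion
    weights here are illustrative assumptions, not the paper's values.
    """
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)
    a, b, c = weights
    return (a * norm(transmittance_contrib)
            + b * norm(screen_grad)
            + c * norm(pixel_coverage))

# Prune the lowest-saliency Gaussians (e.g., the bottom 50%).
N = 100_000
scores = gaussian_saliency(np.random.rand(N), np.random.rand(N), np.random.rand(N))
keep = scores >= np.quantile(scores, 0.5)
```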
This paper addresses the Dynamic UGV-UAV Cooperative Path Planning (DUCPP) problem involving one unmanned ground vehicle (UGV) assisted by one or more unmanned aerial vehicles (UAVs) operating on an uncertain road network with potentially impassable edges. DUCPP is particularly relevant for scenarios such as disaster response, emergency supply transport, and rescue operations, where a UGV must reach a specified destination in the presence of partially unknown road conditions. To enable the UGV to travel safely and efficiently to its destination, the UAV(s) dynamically inspect edges in the environment to identify and prune damaged or impassable edges from consideration. We present multiple strategies, including a bidirectional approach, to optimize UGV-UAV cooperation for finding a safe path in an uncertain road network. Furthermore, we explore the impact of using multiple UAVs on reducing the UGV's travel time, and evaluate the associated computation time. The proposed strategies are implemented and evaluated on 100 urban road networks. The results demonstrate that the bidirectional strategy achieves the best performance in most instances, and using multiple UAVs further reduces UGV travel time at the expense of increased computation time. This paper presents a robust framework for DUCPP to achieve efficient UGV-UAV cooperation for path planning and inspection, offering practical solutions for navigation in challenging and uncertain conditions.
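The core replanning primitive, independent of the specific cooperation strategy, is: remove an edge the UAV reports as impassable and recompute the UGV's shortest path. A self-contained sketch (graph, costs, and node names are illustrative):

```python
import heapq

def shortest_path_cost(adj, src, dst):
    """Dijkstra over an adjacency dict {u: {v: cost}}; returns inf if unreachable."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def prune_edge(adj, u, v):
    """Remove an edge the UAV has found impassable (undirected road network)."""
    adj.get(u, {}).pop(v, None)
    adj.get(v, {}).pop(u, None)

adj = {"a": {"b": 1.0, "c": 4.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"a": 4.0, "b": 1.0}}
prune_edge(adj, "a", "b")                  # UAV reports edge a-b is blocked
print(shortest_path_cost(adj, "a", "c"))   # UGV replans: 4.0 via the direct edge
```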
Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work has focused on importance criteria and search algorithms, often treating layer redundancy as an inherent structural property of pretrained networks. In contrast, we adopt a \emph{functional perspective}, where redundancy is jointly influenced by the model and the evaluation objective, suggesting that a universal ranking may not be sufficient. Through an empirical study across three LLM families, two calibration objectives, and seven search algorithms, we observe that different objectives yield qualitatively different redundant layers, and that perplexity and downstream accuracy rankings do not consistently align. Under a fixed objective, however, search algorithms tend to produce similar solutions. Overall, our results suggest that the calibration objective may play a more influential role than the choice of search algorithm, indicating that further attention to objective design could be beneficial.
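A minimal sketch of the one-layer-ablation scoring that underlies such studies: skip each residual block in turn and record the change in a calibration objective. The residual form and toy objective below are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def layer_redundancy_scores(blocks, x, objective):
    """Score each residual block by the objective change when it is skipped.

    blocks: nn.ModuleList of residual blocks applied as h = h + block(h);
    objective: maps final activations to a scalar (the calibration objective).
    Lower score = more redundant under this objective.
    """
    def run(skip=None):
        h = x
        for i, blk in enumerate(blocks):
            if i != skip:
                h = h + blk(h)   # residual form, so skipping a block = identity
        return objective(h).item()

    base = run()
    return [run(skip=i) - base for i in range(len(blocks))]

# Toy example: 6 residual MLP blocks; the objective is a stand-in for perplexity.
blocks = nn.ModuleList(nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
                       for _ in range(6))
x = torch.randn(8, 64)
scores = layer_redundancy_scores(blocks, x, objective=lambda h: h.pow(2).mean())
print(sorted(range(6), key=lambda i: scores[i]))   # most-redundant-first ranking
```

Swapping in a different `objective` (e.g., downstream accuracy instead of a perplexity proxy) can reorder this ranking, which is the paper's central observation.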
High-capacity associative memories based on Kernel Logistic Regression (KLR) are known for their exceptional performance but are hindered by high computational costs. This paper investigates the compressibility of KLR-trained Hopfield networks to understand the geometric principles of their robust encoding. We provide a comprehensive geometric theory based on spontaneous symmetry breaking and Walsh analysis, and validate it with compression experiments (quantization and pruning). Our experiments reveal a striking contrast: the network is extremely robust to low-precision quantization but highly sensitive to pruning. Our theory explains this via a ``sparse function, dense representation'' principle, where a sparse input mapping is implemented with a dense, bimodal parameterization. Our findings not only provide a practical path to hardware-efficient kernel memories but also offer new insights into the geometric principles of robust representation in neural systems.
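The quantization-versus-pruning contrast can be illustrated with a toy numerical experiment on a dense, bimodal weight matrix of the kind the theory predicts; this is a qualitative sketch, not the paper's experimental protocol.

```python
import numpy as np

def quantize(W, bits):
    """Uniform symmetric quantization of a weight matrix to the given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / levels
    return np.clip(np.round(W / scale), -levels, levels) * scale

def prune(W, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    thresh = np.quantile(np.abs(W), sparsity)
    return np.where(np.abs(W) >= thresh, W, 0.0)

# A dense, bimodal weight distribution barely moves under coarse quantization,
# but pruning removes a large share of its mass, mirroring the reported contrast.
W = np.random.choice([-1.0, 1.0], size=(256, 256)) + 0.1 * np.random.randn(256, 256)
for W_hat, name in [(quantize(W, 2), "2-bit quantized"), (prune(W, 0.5), "50% pruned")]:
    print(name, np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```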
We study the organization of channel-level importance in transformer feed-forward networks (FFNs). Using a Fisher-style loss proxy (LP) based on activation-gradient second moments, we show that loss sensitivity is concentrated in a small set of channels within each layer. In Llama-3.1-8B, the top 1% of channels per layer accounts for a median of 58.7% of LP mass, with a range of 33.0% to 86.1%. We call these loss-critical channels supernodes. Although FFN layers also contain strong activation outliers, LP-defined supernodes overlap only weakly with activation-defined outliers and are not explained by activation power or weight norms alone. Around this core, we find a weaker but consistent halo structure: some non-supernode channels share the supernodes' write support and show stronger redundancy with the protected core. We use one-shot structured FFN pruning as a diagnostic test of this organization. At 50% FFN sparsity, baselines that prune many supernodes degrade sharply, whereas our SCAR variants explicitly protect the supernode core; the strongest variant, SCAR-Prot, reaches perplexity 54.8 compared with 989.2 for Wanda-channel. The LP-concentration pattern appears across Mistral-7B, Llama-2-7B, and Qwen2-7B, remains visible in targeted Llama-3.1-70B experiments, and increases during OLMo-2-7B pretraining. These results suggest that LLM FFNs develop a small learned core of loss-critical channels, and that preserving this core is important for reliable structured pruning.
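One plausible instantiation of the channel-level loss proxy (the abstract says only "activation-gradient second moments", so the exact formula below is an assumption), together with the top-1% concentration check:

```python
import torch

def ffn_channel_lp(acts, grads):
    """Fisher-style loss proxy per FFN channel: second moment of activation*gradient.

    acts, grads: (tokens, channels) FFN hidden activations and their loss
    gradients, collected on a calibration set. High-LP channels are the
    supernode candidates.
    """
    return (acts * grads).pow(2).mean(dim=0)   # (channels,)

# Concentration check: what share of total LP mass sits in the top 1% of channels?
acts, grads = torch.randn(1024, 8192), torch.randn(1024, 8192)   # stand-in data
lp = ffn_channel_lp(acts, grads)
k = max(1, lp.numel() // 100)
top_share = (lp.topk(k).values.sum() / lp.sum()).item()
print(f"top-1% LP share: {top_share:.1%}")   # far higher in real FFNs, per the abstract
```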