Jack
Abstract:A microwave linear analog computer (MiLAC) is a tunable microwave network that performs linear operations directly on radio-frequency signals through wave propagation. Used as an antenna-array front end, it can map many antenna signals to a small number of active RF chains. While lossless reciprocal MiLACs have been shown to provide flexible or capacity-achieving beamforming for wireless communications, their sensing performance remains largely unexplored. We analyze direction-of-arrival estimation for $K$ far-field targets using a tunable receive-side lossless reciprocal MiLAC combiner. We show that the Fisher information matrix depends on the combiner only through the orthogonal projector onto its row space and never exceeds that of a fully digital receiver. Equality holds when the row space contains the $2K$-dimensional joint steering--derivative subspace, establishing a zero-gap threshold of two RF chains per target. A dimension-counting argument lower-bounds the number of tunable components required to achieve the digital Cramér--Rao bound for every target configuration. The stem-connected MiLAC attains this bound asymptotically, up to an antenna-count-independent additive overhead, while scaling linearly with the antenna and target counts. Unlike a phase-shifter front end with the same number of RF chains, MiLAC can exactly attain the fully digital bound. Numerical results validate the analysis.
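The statement that the Fisher information depends on the combiner only through the orthogonal projector onto its row space can be illustrated numerically. The following is a minimal NumPy sketch, not the paper's code: the sizes `M` and `N`, the random combiner `W`, and the helper `row_space_projector` are all hypothetical. It checks only that the projector $P = W^H (W W^H)^{-1} W$ is unchanged when `W` is left-multiplied by an invertible matrix; it does not compute the Fisher information itself.

```python
import numpy as np

def row_space_projector(W):
    """Orthogonal projector onto the row space of W (M x N, full row rank)."""
    # P = W^H (W W^H)^{-1} W, computed with a linear solve instead of an explicit inverse.
    gram = W @ W.conj().T
    return W.conj().T @ np.linalg.solve(gram, W)

rng = np.random.default_rng(0)
M, N = 4, 16  # hypothetical sizes: 4 RF chains, 16 antennas

# Random full-row-rank combiner (illustrative only, not a MiLAC-feasible design).
W = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

# A well-conditioned invertible mixing of the RF-chain outputs.
T = np.eye(M) + 0.1 * (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))

P1 = row_space_projector(W)
P2 = row_space_projector(T @ W)

same_projector = np.allclose(P1, P2)      # row space is unchanged by T
idempotent = np.allclose(P1 @ P1, P1)     # P is a projector
rank = np.trace(P1).real                  # trace of a projector equals its rank
```

Under this reading, any two combiners with the same row space are equivalent for estimation purposes, so the design question reduces to which subspaces the hardware can realize.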
Abstract:Web navigation requires agents to follow natural language goals, interact with web pages, and produce accurate answers. While recent advances leverage vision-language models and reinforcement learning, existing methods still suffer from single-step fragility due to reward misalignment and error propagation. To address reward entanglement, we design Dynamic Dual-Policy Optimization (DDPO), which dynamically switches between a navigation-first mode for exploration and an answer-first mode for question answering to mitigate reward conflict. To calibrate single-step errors, we propose Confidence-Guided Adaptive Navigation Reflection (CANR), a mechanism that estimates per-step confidence, triggers reflection only when necessary, and uses contrastive rewards to encourage self-correction. Combining these two components, we develop StepGuard, a new framework for guarding web navigation via single-step calibration. Experiments demonstrate that our approach significantly improves navigation and answer accuracy, setting new state-of-the-art performance on standard web navigation benchmarks.
Abstract:A microwave linear analog computer (MiLAC) is a tunable microwave network that performs computation through wave propagation in the analog domain. In beamforming, data streams pass through a reconfigurable admittance network and emerge as antenna signals. For communications, MiLACs are preferably lossless and reciprocal to avoid power dissipation and non-reciprocal components, but these constraints limit the analog beamformers they can realize. Fully-connected MiLACs offer broad flexibility at the cost of a quadratic number of tunable admittances in the antenna count. Stem-connected MiLACs reduce this scaling to linear and preserve point-to-point capacity, but their role in multiuser downlink beamforming and under bounded, discrete hardware constraints has remained open. This paper addresses both questions for the multiuser multiple-input single-output downlink. We show that a stem-connected MiLAC can realize every beamformer on the complex Stiefel manifold and prove that, when $N\ge 2K-1$, this Stiefel-restricted design achieves the same sum-rate as the fully-connected MiLAC, where $N$ and $K$ are the numbers of transmit antennas and users. We then develop a weighted minimum mean-square error solver with a Riemannian Stiefel update, together with a closed-form projection baseline and an alternating refinement for bounded, discrete susceptances. Simulations show that the stem-connected MiLAC matches fully-connected MiLAC performance, approaches the fully digital sum-rate upper bound without symbol-rate digital processing, and recovers most of the loss caused by direct hardware-grid quantization.
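As an illustration of what a projection onto the complex Stiefel manifold looks like, the sketch below uses the standard polar (SVD-based) projection, which returns the matrix with orthonormal columns closest to a given full-column-rank matrix in Frobenius norm. This is an assumption made for illustration: the abstract does not specify its closed-form projection baseline, and the function name `project_to_stiefel` and the sizes `N` and `K` are hypothetical.

```python
import numpy as np

def project_to_stiefel(A):
    """Polar projection of A (N x K, N >= K, full column rank) onto the
    complex Stiefel manifold {F : F^H F = I_K}."""
    # Thin SVD A = U S V^H; dropping the singular values gives the polar factor U V^H.
    U, _, Vh = np.linalg.svd(A, full_matrices=False)
    return U @ Vh

rng = np.random.default_rng(1)
N, K = 8, 3  # hypothetical sizes satisfying N >= 2K - 1

# An unconstrained beamformer, e.g. the output of some digital design step.
A = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

F = project_to_stiefel(A)
orthonormal = np.allclose(F.conj().T @ F, np.eye(K))
```

Projecting a matrix that already has orthonormal columns returns it unchanged, which is a convenient sanity check when such a step is used inside an iterative solver.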
Abstract:Industrial Internet systems face increasing threats from sophisticated industrial control system (ICS) attacks, resulting in critical safety incidents. However, existing tools exhibit limited effectiveness in real-time anomaly detection due to the complex dependencies among sensors and actuators. To tackle this, we present IstGPT, the first industrial anomaly detection tool based on LLMs and graph learning to provide real-time protection against a wide range of ICS attacks. IstGPT achieves fine-grained and precise modeling of spatial-temporal dependencies in industrial cyber-physical systems. It first leverages industrial multi-modal knowledge, including operational data, technical documents, and system diagrams, to extract sensor-actuator dependency graphs via multi-stage prompt engineering. Then, LLM-Optimation iteratively refines the graph based on node accuracy, edge consistency, and logical coherence. Finally, IstGPT integrates improved graph neural networks with an encoder-decoder architecture to detect anomalies via reconstruction errors. We evaluate IstGPT against 12 state-of-the-art baselines on 9 datasets, including 2 public, 6 simulated, and a real-world robotic arm dataset. IstGPT achieves the best F1-scores and eTaF1 (a newer time-aware metric) across all nine datasets. We further discuss the feasibility of deploying IstGPT in real-world industrial scenarios.
Abstract:Video Motion Magnification (VMM) reveals imperceptible dynamics but often suffers from structural inconsistencies under complex geometric transformations. Existing learning-based methods generally face a trade-off between the limited global context of CNNs and the high computational cost of Transformers. In addition, current training protocols, largely dominated by simple linear motion, fail to capture the geometric and imaging complexities encountered in real-world videos. To address these issues, we propose GeoMag, a geometric-aware VMM framework built upon State Space Models to achieve globally consistent motion amplification with linear complexity. We further construct Geo-200K, a large-scale synthetic dataset that introduces rich geometric transformations together with sensor-realistic degradations, improving the diversity and realism of training signals. Extensive experiments on synthetic and real-world benchmarks show that GeoMag consistently outperforms prior methods in visual fidelity and computational efficiency, while producing fewer artifacts and better structural consistency.
Abstract:Multimodal manipulation detection aims to simultaneously identify forged image--text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning, we reformulate this task as a reference-grounded verification problem, where authenticity is assessed by comparing a query against retrieved authentic evidence. We propose REVEAL (Reference-Enabled Verification for Evidence Analysis and Localization), a framework explicitly designed for this comparative paradigm. To support this paradigm, we construct a large-scale reference library comprising 170K authentic news image--text pairs featuring over 40K public figures. Technically, REVEAL employs a difference-aware fusion mechanism to capture fine-grained discrepancies between the query and retrieved evidence. Furthermore, we introduce a task-decoupled Mixture-of-Experts (MoE) architecture to jointly execute instance-level detection and fine-grained grounding, effectively mitigating optimization conflicts between these heterogeneous objectives. Extensive experiments demonstrate that REVEAL significantly outperforms state-of-the-art methods, and notably enables \emph{training-free domain adaptation} by simply updating the reference library, offering a robust and practical solution for detecting evolving misinformation. Code is available at https://anonymous.4open.science/r/REVEAL-Reference-A006.
Abstract:Tool-integrated reasoning (TIR) offers a direct way to extend thinking models beyond the limits of text-only reasoning. Paradoxically, we observe that tool-enabled evaluation can degrade reasoning performance even when the strong thinking models make almost no actual tool calls. In this paper, we investigate how to inject natural tool-use behavior into a strong thinking model without sacrificing its no-tool reasoning ability, and present a comprehensive TIR recipe. We highlight that (i) the effectiveness of TIR supervised fine-tuning (SFT) hinges on the learnability of teacher trajectories, which should prioritize problems inherently suited for tool-augmented solutions; (ii) controlling the proportion of tool-use trajectories could mitigate the catastrophic forgetting of text-only reasoning capacity; (iii) optimizing for pass@k and response length instead of training loss could maximize TIR SFT gains while preserving headroom for reinforcement learning (RL) exploration; (iv) a stable RL with verifiable rewards (RLVR) stage, built upon suitable SFT initialization and explicit safeguards against mode collapse, provides a simple yet remarkably effective solution. When applied to Qwen3 thinking models at 4B and 30B scales, our recipe yields models that achieve state-of-the-art performance in a wide range of benchmarks among open-source models, such as 96.7% and 99.2% on AIME 2025 for 4B and 30B, respectively.
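For readers unfamiliar with pass@k, the sketch below shows the commonly used unbiased estimator $1 - \binom{n-c}{k} / \binom{n}{k}$, where $n$ samples are drawn per problem and $c$ of them are correct. The abstract does not say how pass@k is computed in this work, so treating it as this estimator is an assumption, and the function name `pass_at_k` is hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of sampled responses, c: number of correct ones,
    k: attempt budget (1 <= k <= n).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 4 samples of which 1 is correct:
p1 = pass_at_k(4, 1, 1)  # 0.25
p2 = pass_at_k(4, 1, 2)  # 0.5
```

A benchmark-level score would average this quantity over problems. Selecting checkpoints on it rather than on training loss rewards models that keep at least one correct sample among several attempts.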
Abstract:Open-Set Object Detection (OSOD) is crucial for autonomous driving, where perception systems must recognize and localize both known and previously unseen objects in complex, dynamic environments. While recent approaches deliver promising results, they often require retraining the detector extensively to learn objectness, which describes the likelihood that a bounding box tightly encloses a valid object, regardless of whether its category was learned during training. Deviating from existing work, we hypothesize that standard off-the-shelf detectors may already contain helpful cues for objectness, owing to their training on numerous and diverse known categories. Building on this idea, we propose NAN-SPOT, a training-light framework that does not require retraining the base object detector and estimates objectness by leveraging a hidden-layer metric called Negative-Aware Norm (NAN), requiring only minutes of training on just hundreds of images. To support comprehensive evaluation, we introduce COCO-Open, an expanded version of the existing COCO-Mixed dataset that increases unknown object annotations from 433 to 1853, making it, to the best of our knowledge, the most exhaustively labeled dataset for OSOD. Experimental results demonstrate that NAN-SPOT achieves even better performance on unknown object detection than methods requiring heavy training, without compromising performance on known objects. This efficiency and robustness make NAN-SPOT a promising step towards open-world perception in autonomous driving.
Abstract:In large antenna arrays, hardware power consumption becomes a dominant design constraint, making energy efficiency (EE) a first-class objective alongside spectral efficiency (SE). Microwave linear analog computer (MiLAC)-aided beamforming, whose front end is a passive reciprocal stream-to-antenna network, addresses this tension by reducing the active radio-frequency chain count to the stream number, at a moderate SE cost. Despite this promise, no EE optimization framework has been established for MiLAC-aided beamforming that accounts for digital-to-analog converter quantization noise and post-quantized transmit power. We fill this gap for downlink multiuser multiple-input single-output (MU-MISO) systems by formulating quantization-aware EE maximization over the MiLAC-feasible beamformer and characterizing the resulting SE-EE tradeoff. Three contributions follow. First, we prove a row-space optimality property of the effective MiLAC-aided beamformer, yielding an equivalent reduced-dimension reformulation whose complexity scales with the stream number rather than the antenna number. Second, we develop a low-complexity Dinkelbach-weighted minimum mean-square error algorithm aided by projected gradient descent that is guaranteed to converge to a stationary point. Third, we cast the SE-EE tradeoff as a multi-objective problem and trace its Pareto boundary via a weighted-sum method that combines an alternative reduced-dimension coordinate with auxiliary-variable successive convex approximation, yielding convex per-iteration subproblems with guaranteed convergence. Numerical results on a DeepMIMO v4 deployment show MiLAC-aided beamforming substantially improves EE over digital and hybrid benchmarks at a moderate SE cost and significantly expands the achievable SE-EE operating region.
Abstract:Integrated Sensing and Communication (ISAC) systems require efficient beamforming architectures to jointly support communication and sensing functionalities. To reduce hardware overhead, Hybrid Beamforming (HBF) has been widely studied and shown to achieve performance close to fully digital beamforming under practical hardware constraints. As a promising evolution, Reconfigurable Antenna (RA) technologies have recently emerged to further enhance beamforming Degrees of Freedom (DoFs) by dynamically reconfiguring antenna Electromagnetic (EM) characteristics, yet their integration into ISAC systems remains largely unexplored. In this paper, we investigate an RA-assisted ISAC system and develop a decoupled Triple-Hybrid Beamforming (Tri-HBF) framework that alternately optimizes digital, analog, and EM beamformers to maximize the communication rate and sensing Signal-to-Clutter-plus-Noise Ratio (SCNR). For both Single-user Single-target (SUST) and Multiple-user Multiple-target (MUMT) scenarios, we first transform the original fractional objectives into fraction-free ones via methods tailored to their respective structures. The resulting problems are then solved via alternating optimization over different variable blocks. Closed-form updates are derived for all variables except the EM beamforming subproblem in the MUMT scenario. To further reduce the complexity introduced by Semidefinite Relaxation (SDR) in EM beamforming, we propose a low-complexity iterative approach across antennas with closed-form updates. Simulation results demonstrate that the proposed scheme significantly outperforms benchmark designs with conventional omnidirectional and directional antennas, achieving almost 100% improvement in spectrum efficiency and 62.5% reduction in antenna overhead, thereby unveiling the