Abstract:This paper argues that a systemic lack of Agency constrains the implicit reasoning capabilities of current Vision-Language Models (VLMs). Implicit reasoning refers to the ability to autonomously discover and utilize hidden visual evidence to bridge information gaps, rather than merely relying on explicitly specified targets. This capacity underlies human visual understanding and everyday reasoning. We argue that this limitation arises from a tendency to approach visual reasoning primarily as passive semantic retrieval, rather than as active, situated reasoning that depends on autonomous visual exploration. As a result, most existing benchmarks primarily assess Passive Capacity, leaving this aspect of reasoning largely unmeasured. To address this gap, we introduce the Visual Implicit Reasoning Diagnosing Benchmark (V-IRD), which targets this missing quadrant by requiring models to derive answers strictly through autonomous visual analysis. Our results show that, despite strong retrieval abilities, prominent VLMs struggle to utilize reference objects and to attend to visual evidence that requires self-directed inquiry. Simply put, strong semantic recognition does not equate to active visual exploration, revealing a critical gap in current VLMs. More information can be found at https://haoychen.github.io/Implicit-Reasoning/
Abstract:The sixth generation (6G) wireless systems are envisioned to enable the paradigm shift from "connected things" to "connected intelligence", featured by ultra high density, large-scale, dynamic heterogeneity, diversified functional requirements and machine learning capabilities, which leads to a growing need for highly efficient intelligent algorithms. The classic optimization-based algorithms usually require highly precise mathematical model of data links and suffer from poor performance with high computational cost in realistic 6G applications. Based on domain knowledge (e.g., optimization models and theoretical tools), machine learning (ML) stands out as a promising and viable methodology for many complex large-scale optimization problems in 6G, due to its superior performance, generalizability, computational efficiency and robustness. In this paper, we systematically review the most representative "learning to optimize" techniques in diverse domains of 6G wireless networks by identifying the inherent feature of the underlying optimization problem and investigating the specifically designed ML frameworks from the perspective of optimization. In particular, we will cover algorithm unrolling, learning to branch-and-bound, graph neural network for structured optimization, deep reinforcement learning for stochastic optimization, end-to-end learning for semantic optimization, as well as federated learning for distributed optimization, for solving challenging large-scale optimization problems arising from various important wireless applications. Through the in-depth discussion, we shed light on the excellent performance of ML-based optimization algorithms with respect to the classical methods, and provide insightful guidance to develop advanced ML techniques in 6G networks.




Abstract:Massive access is a critical design challenge of Internet of Things (IoT) networks. In this paper, we consider the grant-free uplink transmission of an IoT network with a multiple-antenna base station (BS) and a large number of single-antenna IoT devices. Taking into account the sporadic nature of IoT devices, we formulate the joint activity detection and channel estimation (JADCE) problem as a group-sparse matrix estimation problem. This problem can be solved by applying the existing compressed sensing techniques, which however either suffer from high computational complexities or lack of algorithm robustness. To this end, we propose a novel algorithm unrolling framework based on the deep neural network to simultaneously achieve low computational complexity and high robustness for solving the JADCE problem. Specifically, we map the original iterative shrinkage thresholding algorithm (ISTA) into an unrolled recurrent neural network (RNN), thereby improving the convergence rate and computational efficiency through end-to-end training. Moreover, the proposed algorithm unrolling approach inherits the structure and domain knowledge of the ISTA, thereby maintaining the algorithm robustness, which can handle non-Gaussian preamble sequence matrix in massive access. With rigorous theoretical analysis, we further simplify the unrolled network structure by reducing the redundant training parameters. Furthermore, we prove that the simplified unrolled deep neural network structures enjoy a linear convergence rate. Extensive simulations based on various preamble signatures show that the proposed unrolled networks outperform the existing methods in terms of the convergence rate, robustness and estimation accuracy.



Abstract:In this paper, we consider decentralized federated learning (FL) over wireless networks, where over-the-air computation (AirComp) is adopted to facilitate the local model consensus in a device-to-device (D2D) communication manner. However, the AirComp-based consensus phase brings the additive noise in each algorithm iterate and the consensus needs to be robust to wireless network topology changes, which introduce a coupled and novel challenge of establishing the convergence for wireless decentralized FL algorithm. To facilitate consensus phase, we propose an AirComp-based DSGD with gradient tracking and variance reduction (DSGT-VR) algorithm, where both precoding and decoding strategies are developed for D2D communication. Furthermore, we prove that the proposed algorithm converges linearly and establish the optimality gap for strongly convex and smooth loss functions, taking into account the channel fading and noise. The theoretical result shows that the additional error bound in the optimality gap depends on the number of devices. Extensive simulations verify the theoretical results and show that the proposed algorithm outperforms other benchmark decentralized FL algorithms over wireless networks.