Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success heavily relies on the scale of pre-trained cross-modal datasets. However, the lack of large-scale datasets and benchmarks in Chinese hinders the development of Chinese VLP models and broader multilingual applications. In this work, we release a large-scale Chinese cross-modal dataset named Wukong, containing 100 million Chinese image-text pairs collected from the web. Wukong aims to benchmark different multi-modal pre-training methods to facilitate VLP research and community development. Furthermore, we release a group of models pre-trained with various image encoders (ViT-B/ViT-L/SwinT) and also apply advanced pre-training techniques to VLP, such as locked-image text tuning, token-wise similarity in contrastive learning, and reduced-token interaction. Extensive experiments and a deep benchmarking of different downstream tasks are also provided. Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods. For the zero-shot image classification task on 10 datasets, our model achieves an average accuracy of 73.03%. For the image-text retrieval task, our model achieves a mean recall of 71.6% on AIC-ICC, which is 12.9% higher than the result of WenLan 2.0. More information can be found at https://wukong-dataset.github.io/wukong-dataset/.
The large-scale reflector array of programmable metasurfaces is capable of increasing the power efficiency of backscatter communications via passive beamforming and thus has the potential to revolutionize the low-data-rate nature of backscatter communications. In this paper, we propose to design the power-efficient higher-order constellation and reflection pattern under the amplitude constraint brought by backscatter communications. For constellation design, we adopt the amplitude and phase-shift keying (APSK) constellation and optimize the parameters of APSK such as ring number, ring radius, and inter-ring phase difference. Specifically, we derive closed-form solutions to the optimal ring radius and inter-ring phase difference for an arbitrary modulation order. For reflection pattern design, we propose to optimize the passive beamforming vector by solving a multi-objective optimization problem that maximizes reflection power and guarantees beam homogenization within the angle range of interest. To solve the problem, we propose a constant-modulus power iteration method, proven to monotonically increase the objective function at each iteration. Numerical results show that the proposed APSK constellation design and reflection pattern design outperform the existing modulation and beam pattern design in programmable metasurface-enabled backscatter communications.
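To make the APSK parameterization concrete, the following sketch builds a constellation from per-ring point counts, radii, and phase offsets. The specific values used below are arbitrary placeholders for illustration, not the closed-form optima derived in the paper.

```python
import numpy as np

def apsk_constellation(ring_sizes, ring_radii, ring_phases):
    """Build an APSK constellation as an array of complex points.

    ring_sizes : points per ring, e.g. [4, 12] for 16-APSK
    ring_radii : radius of each ring
    ring_phases: phase offset of each ring (sets the inter-ring
                 phase difference)
    """
    points = []
    for n, r, phi in zip(ring_sizes, ring_radii, ring_phases):
        # n points evenly spaced on a ring of radius r, rotated by phi
        angles = phi + 2 * np.pi * np.arange(n) / n
        points.append(r * np.exp(1j * angles))
    return np.concatenate(points)

# Illustrative 16-APSK: 4 inner + 12 outer points; the radii and the
# inter-ring phase offset here are arbitrary, not the paper's optima.
c = apsk_constellation([4, 12], [1.0, 2.5], [np.pi / 4, 0.0])
```

In the paper's setting, the ring radii would additionally be constrained by the reflection amplitude limit of the backscatter device.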
In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets restricted to either video-level annotation or small scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video durations. All the copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated starting and ending timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of copy overlapping segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, hoping to open up promising directions for future work. The VCSL dataset, metric and benchmark codes are all publicly available at https://github.com/alipay/VCSL.
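The idea of scoring predicted copied segments against ground-truth segments can be sketched with a simple temporal-overlap precision/recall computation. This is a deliberately simplified stand-in, not the actual VCSL protocol, which handles two-dimensional (query/reference time) segment matching.

```python
def overlap(a, b):
    """Length of the temporal intersection of two (start, end) segments."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def segment_precision_recall(preds, gts):
    """Simplified segment-level precision/recall by total overlap length.

    Illustrative only: assumes non-overlapping segments within each
    list; the real VCSL metric operates on 2-D copied segment pairs.
    """
    inter = sum(overlap(p, g) for p in preds for g in gts)
    pred_len = sum(e - s for s, e in preds)
    gt_len = sum(e - s for s, e in gts)
    precision = inter / pred_len if pred_len else 0.0
    recall = inter / gt_len if gt_len else 0.0
    return precision, recall
```

For example, a predicted segment (0 s, 10 s) against a ground-truth segment (5 s, 15 s) yields precision 0.5 and recall 0.5.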
People naturally produce spontaneous body motions to enhance their speech while giving talks. Body motion generation from speech is inherently difficult due to the non-deterministic mapping from speech to body motions. Most existing works map speech to motion in a deterministic way by conditioning on certain styles, leading to sub-optimal results. Motivated by studies in linguistics, we decompose the co-speech motion into two complementary parts: pose modes and rhythmic dynamics. Accordingly, we introduce a novel freeform motion generation model (FreeMo) equipped with a two-stream architecture, i.e., a pose mode branch for primary posture generation and a rhythmic motion branch for rhythmic dynamics synthesis. On one hand, diverse pose modes are generated by conditional sampling in a latent space, guided by speech semantics. On the other hand, rhythmic dynamics are synced with the speech prosody. Extensive experiments demonstrate superior performance over several baselines in terms of motion diversity, quality, and synchronization with speech. Code and pre-trained models will be publicly available through https://github.com/TheTempAccount/Co-Speech-Motion-Generation.
Pseudo-label-based semi-supervised learning (SSL) has achieved great success in utilizing raw data. However, its training procedure suffers from confirmation bias due to the noise contained in self-generated artificial labels. Moreover, the model's judgment becomes noisier in real-world applications with extensive out-of-distribution data. To address this issue, we propose a general method named Class-aware Contrastive Semi-Supervised Learning (CCSSL), which is a drop-in helper to improve the pseudo-label quality and enhance the model's robustness in the real-world setting. Rather than treating real-world data as a union set, our method separately handles reliable in-distribution data with class-wise clustering for blending into downstream tasks, and noisy out-of-distribution data with image-wise contrastive learning for better generalization. Furthermore, by applying target re-weighting, we successfully emphasize clean-label learning and simultaneously reduce noisy-label learning. Despite its simplicity, our proposed CCSSL yields significant performance improvements over the state-of-the-art SSL methods on the standard datasets CIFAR100 and STL10. On the real-world dataset Semi-iNat 2021, we improve FixMatch by 9.80% and CoMatch by 3.18%.
In recent years, knowledge graphs have been widely applied as a uniform way to organize data and have enhanced many tasks requiring knowledge. In the online shopping platform Taobao, we built a billion-scale e-commerce product knowledge graph. It organizes data uniformly and provides item knowledge services for various tasks such as item recommendation. Usually, such knowledge services are provided through triple data, but this implementation involves (1) tedious data selection on the product knowledge graph and (2) task-specific model design to infuse the triple knowledge. More importantly, the product knowledge graph is far from complete, resulting in error propagation to knowledge-enhanced tasks. To avoid these problems, we propose a Pre-trained Knowledge Graph Model (PKGM) for the billion-scale product knowledge graph. On the one hand, it provides item knowledge services in a uniform way, via service vectors, for embedding-based and item-knowledge-related task models without accessing triple data. On the other hand, its service is based on an implicitly completed product knowledge graph, overcoming the common incompleteness issue. We also propose two general ways to integrate the service vectors from PKGM into downstream task models. We test PKGM on five knowledge-related tasks: item classification, item resolution, item recommendation, scene detection, and sequential recommendation. Experimental results show that PKGM brings significant performance gains on these tasks, illustrating the usefulness of its service vectors.
Depression is increasingly impacting individuals both physically and psychologically worldwide. It has become a major global public health problem and attracts attention from various research fields. Traditionally, the diagnosis of depression is formulated through semi-structured interviews and supplementary questionnaires, which makes it heavily reliant on the physician's experience and subject to bias. Mental health monitoring and cloud-based remote diagnosis can be implemented through an automated depression diagnosis system. In this article, we propose an attention-based multimodality speech and text representation for depression prediction. Our model is trained to estimate the depression severity of participants using the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. For the audio modality, we use the collaborative voice analysis repository (COVAREP) features provided by the dataset and employ a Bidirectional Long Short-Term Memory Network (Bi-LSTM) followed by a Time-distributed Convolutional Neural Network (T-CNN). For the text modality, we use global vectors for word representation (GloVe) to perform word embeddings, and the embeddings are fed into the Bi-LSTM network. Results show that both audio and text models perform well on the depression severity estimation task, with a best sequence-level F1 score of 0.9870 and a patient-level F1 score of 0.9074 for the audio model over five classes (healthy, mild, moderate, moderately severe, and severe), as well as a sequence-level F1 score of 0.9709 and a patient-level F1 score of 0.9245 for the text model over five classes. Results are similar for the multimodality fused model, with the highest F1 score of 0.9580 on the patient-level depression detection task over five classes. Experiments show statistically significant improvements over previous works.
This paper studies capturability and push recovery for quadrupedal locomotion. Despite the rich literature on capturability analysis and push recovery control for legged robots, existing tools are developed mainly for bipeds or humanoids. Distinct quadrupedal features such as point contacts and multiple swinging legs prevent direct application of these methods. To address this gap, we propose a switched systems model for quadruped dynamics, and instantiate the abstract viability concept for quadrupedal locomotion with a time-based gait. Capturability is characterized through a novel specification of dynamically balanced states that addresses the time-varying nature of quadrupedal locomotion and balance. A linear inverted pendulum (LIP) model is adopted to demonstrate the theory and show how the newly developed quadrupedal capturability can be used in motion planning for quadrupedal push recovery. We formulate and solve an explicit model predictive control (EMPC) problem whose optimal solution fully characterizes quadrupedal capturability with the LIP. Given this analysis, an optimization-based planning scheme is devised for determining footsteps and center of mass references during push recovery. To validate the effectiveness of the overall framework, we conduct numerous simulation and hardware experiments. Simulation results illustrate the necessity of considering dynamic balance for quadrupedal capturability, and the significant improvement in disturbance rejection with the proposed strategy. Experimental validations on a replica of the Mini Cheetah quadruped demonstrate up to a 100% improvement compared with the state of the art.
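To make the LIP-based capturability idea concrete, here is a minimal one-dimensional sketch of a one-step capture-point check. The capture-point formula is the standard LIP result; the reachability test is a simplification I introduce for illustration, standing in for the paper's EMPC-based characterization of dynamically balanced states.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def capture_point(x, xdot, h):
    """Instantaneous capture point of a linear inverted pendulum (LIP)
    with center-of-mass height h: the ground point where a foot must be
    placed to bring the CoM asymptotically to rest."""
    omega = math.sqrt(G / h)  # LIP natural frequency
    return x + xdot / omega

def is_capturable(x, xdot, h, foot_x, reach):
    """Simplified 1-step capturability check: the capture point must
    lie within the leg's reachable region around the stance foot."""
    cp = capture_point(x, xdot, h)
    return abs(cp - foot_x) <= reach
```

For a quadruped, the analogous check must also account for point contacts, multiple swing legs, and the gait timing, which is what the paper's switched-systems formulation addresses.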
This paper presents the concept of "model-based neural network" (MNN), which is inspired by the classic artificial neural network (ANN) but serves different usages. Instead of being used as a data-driven classifier, an MNN serves as a modeling tool with artfully defined inputs, outputs, and activation functions which have explicit physical meanings. Owing to the same layered form as an ANN, an MNN can also be optimized using the back-propagation (BP) algorithm. As an interesting application, the classic problem of line spectral estimation can be modeled by an MNN. We propose to first initialize the MNN by the fast Fourier transform (FFT) based spectral estimation, and then optimize the MNN by the BP algorithm, which automatically yields the maximum likelihood (ML) parameter estimation of the frequency spectrum. We also design a method of merging and pruning the hidden-layer nodes of the MNN, which can be used for model-order selection, i.e., to estimate the number of sinusoids. Numerical simulations verify the effectiveness of the proposed method.
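The FFT-initialize-then-refine idea can be sketched for a single complex sinusoid as follows, with a generic numerical optimizer standing in for the MNN's back-propagation. In this noiseless single-tone setting, the refined periodogram peak coincides with the ML frequency estimate.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def periodogram(y, f):
    """Periodogram of samples y at normalized frequency f in [0, 1)."""
    n = np.arange(len(y))
    return np.abs(np.sum(y * np.exp(-2j * np.pi * f * n))) ** 2 / len(y)

def estimate_freq(y):
    """Single-sinusoid ML frequency estimate: coarse FFT peak picking,
    then numerical refinement of the periodogram peak (a stand-in for
    the paper's BP optimization of the MNN)."""
    N = len(y)
    k = int(np.argmax(np.abs(np.fft.fft(y))))  # coarse FFT bin
    # refine within +/- one FFT bin of the coarse estimate
    res = minimize_scalar(lambda f: -periodogram(y, f),
                          bounds=((k - 1) / N, (k + 1) / N),
                          method="bounded")
    return res.x
```

The multi-sinusoid case, and the merging/pruning of hidden-layer nodes for model-order selection, are what the MNN structure in the paper is designed to handle beyond this one-tone sketch.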