Zheng Zhang

Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University, China

Dense Cross-Connected Ensemble Convolutional Neural Networks for Enhanced Model Robustness

Dec 09, 2024

Longwei Wang, Xueqian Li, Zheng Zhang

Abstract: The resilience of convolutional neural networks against input variations and adversarial attacks remains a significant challenge in image recognition tasks. Motivated by the need for more robust and reliable image recognition systems, we propose the Dense Cross-Connected Ensemble Convolutional Neural Network (DCC-ECNN). This novel architecture integrates the dense connectivity principle of DenseNet with the ensemble learning strategy, incorporating intermediate cross-connections between different DenseNet paths to facilitate extensive feature sharing and integration. The DCC-ECNN architecture leverages DenseNet's efficient parameter usage and depth while benefiting from the robustness of ensemble learning, ensuring a richer and more resilient feature representation.

* 6 pages, 1 figure

Beyond Idle Channels: Unlocking Idle Space with Signal Alignment in Massive MIMO Cognitive Radio Networks

Dec 09, 2024

Weidong Zhu, Xueqian Li, Longwei Wang, Zheng Zhang

Abstract: Cognitive radio networks (CRNs) have traditionally focused on utilizing idle channels to enhance spectrum efficiency. However, as wireless networks grow denser, channel-centric strategies face increasing limitations. This paper introduces a paradigm shift by exploring the underutilized potential of idle spatial dimensions, termed idle space, in co-channel transmissions. By integrating massive multiple-input multiple-output (MIMO) systems with signal alignment techniques, we enable secondary users to transmit without causing interference to primary users by aligning their signals within the null spaces of primary receivers. We propose a comprehensive framework that synergizes spatial spectrum sensing, signal alignment, and resource allocation, specifically designed for secondary users in CRNs. Theoretical analyses and extensive simulations validate the framework, demonstrating substantial gains in spectrum efficiency, throughput, and interference mitigation. The results show that the proposed approach not only ensures interference-free coexistence with primary users but also unlocks untapped spatial resources for secondary transmissions.
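
The null-space idea in the abstract can be illustrated with a toy projection. This is a minimal sketch and not the paper's framework: it assumes a single primary receiver with one antenna, a secondary transmitter with four antennas, and a perfectly known channel. The channel values and the function name `null_space_project` are hypothetical.

```python
def null_space_project(h, w):
    """Project beamforming vector w onto the null space of the row channel h.

    h: channel from the secondary transmitter's antennas to one primary
       receiver antenna (assumed known; values below are made up).
    w: the beamforming vector the secondary user would like to use.
    """
    gain = sum(hi * wi for hi, wi in zip(h, w))   # h . w, the leakage toward the primary receiver
    power = sum(abs(hi) ** 2 for hi in h)         # ||h||^2
    # Remove the component of w that lies along conj(h).
    return [wi - hi.conjugate() * gain / power for hi, wi in zip(h, w)]


h = [1 + 1j, 0.5 - 2j, -1 + 0.3j, 2 + 0j]   # hypothetical channel coefficients
w = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]        # desired (unconstrained) beam

w_null = null_space_project(h, w)
leakage = abs(sum(hi * wi for hi, wi in zip(h, w_null)))
print(f"interference seen by the primary receiver: {leakage:.2e}")
```

With more transmit antennas than primary receive antennas, the projected vector is generally nonzero, so the secondary user can still transmit while the primary receiver sees (numerically) zero interference.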

* 9 pages, 7 figures

PoTable: Programming Standardly on Table-based Reasoning Like a Human Analyst

Dec 05, 2024

Qingyang Mao, Qi Liu, Zhi Li, Mingyue Cheng, Zheng Zhang, Rui Li

Abstract: Table-based reasoning has garnered substantial research interest, particularly in its integration with Large Language Models (LLMs), which have revolutionized the general reasoning paradigm. Numerous LLM-based studies introduce symbolic tools (e.g., databases, Python) as assistants to extend human-like abilities in structured table understanding and complex arithmetic computations. However, these studies could better simulate human cognitive behavior when using symbolic tools, as they still suffer from limitations of non-standard logical splits and constrained operation pools. In this study, we propose PoTable as a novel table-based reasoning method that simulates a human tabular analyst, which integrates a Python interpreter as the real-time executor accompanied by an LLM-based operation planner and code generator. Specifically, PoTable follows a human-like logical stage split and extends the operation pool into an open-world space without any constraints. Through planning and executing in each distinct stage, PoTable completes the entire reasoning process in a standardized way and produces superior reasoning results along with highly accurate, step-by-step commented, and completely executable programs. Accordingly, the effectiveness and explainability of PoTable are fully demonstrated. Extensive experiments over three evaluation datasets from two public benchmarks on two backbones show the outstanding performance of our approach. In particular, GPT-based PoTable achieves over 4% higher absolute accuracy than the runners-up on all evaluation datasets.

* 12 pages, 4 figures

Factorized Visual Tokenization and Generation

Nov 25, 2024

Zechen Bai, Jianxiong Gao, Ziteng Gao, Pichao Wang, Zheng Zhang, Tong He, Mike Zheng Shou

Abstract: Visual tokenizers are fundamental to image generation. They convert visual data into discrete tokens, enabling transformer-based models to excel at image generation. Despite their success, VQ-based tokenizers like VQGAN face significant limitations due to constrained vocabulary sizes. Simply expanding the codebook often leads to training instability and diminishing performance gains, making scalability a critical challenge. In this work, we introduce Factorized Quantization (FQ), a novel approach that revitalizes VQ-based tokenizers by decomposing a large codebook into multiple independent sub-codebooks. This factorization reduces the lookup complexity of large codebooks, enabling more efficient and scalable visual tokenization. To ensure each sub-codebook captures distinct and complementary information, we propose a disentanglement regularization that explicitly reduces redundancy, promoting diversity across the sub-codebooks. Furthermore, we integrate representation learning into the training process, leveraging pretrained vision models like CLIP and DINO to infuse semantic richness into the learned representations. This design ensures our tokenizer captures diverse semantic levels, leading to more expressive and disentangled representations. Experiments show that the proposed FQGAN model substantially improves the reconstruction quality of visual tokenizers, achieving state-of-the-art performance. We further demonstrate that this tokenizer can be effectively adapted into auto-regressive image generation. https://showlab.github.io/FQGAN
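
To make the lookup-cost argument concrete, here is a rough sketch of quantizing with several small sub-codebooks instead of one large codebook. It is not the FQGAN implementation: splitting the feature vector into equal chunks is an assumption made for illustration, the codebook values are invented, and the disentanglement regularization and representation learning described in the abstract are omitted.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    dists = [sum((c - v) ** 2 for c, v in zip(code, vec)) for code in codebook]
    return dists.index(min(dists))


def factorized_quantize(feature, sub_codebooks):
    """Quantize each chunk of `feature` with its own sub-codebook.

    Assumption for this sketch: the feature is split into equal-sized
    chunks, one per sub-codebook.
    """
    chunk = len(feature) // len(sub_codebooks)
    indices = []
    for k, codebook in enumerate(sub_codebooks):
        part = feature[k * chunk:(k + 1) * chunk]
        indices.append(nearest(codebook, part))
    return indices


# Two hypothetical sub-codebooks with 3 entries each.
sub_codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.5, -0.5], [2.0, 0.0], [0.0, 2.0]],
]
feature = [0.9, 1.2, 0.1, 1.7]

indices = factorized_quantize(feature, sub_codebooks)
print(indices)  # → [1, 2]
```

Here two sub-codebooks of 3 entries can express 3 × 3 = 9 distinct token combinations while only 3 + 3 = 6 distance computations are needed per feature, which is the kind of saving the abstract refers to.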

DRL-Based Optimization for AoI and Energy Consumption in C-V2X Enabled IoV

Nov 20, 2024

Zheng Zhang, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

Abstract: To address communication latency issues, the Third Generation Partnership Project (3GPP) has defined Cellular-Vehicle to Everything (C-V2X) technology, which includes Vehicle-to-Vehicle (V2V) communication for direct links between vehicles. However, this method requires vehicles to autonomously select communication resources based on the Semi-Persistent Scheduling (SPS) protocol, which may lead to collisions due to different vehicles sharing the same communication resources, thereby affecting communication effectiveness. Non-Orthogonal Multiple Access (NOMA) is considered a potential solution for handling large-scale vehicle communication, as it can enhance the Signal-to-Interference-plus-Noise Ratio (SINR) by employing Successive Interference Cancellation (SIC), thereby reducing the negative impact of communication collisions. When evaluating vehicle communication performance, traditional metrics such as reliability and transmission delay present certain contradictions. Introducing the new metric Age of Information (AoI) provides a more comprehensive evaluation of the communication system. Additionally, to ensure service quality, user terminals need to possess high computational capabilities, which may lead to increased energy consumption, necessitating a trade-off between communication energy consumption and effectiveness. Given the complexity and dynamics of communication systems, Deep Reinforcement Learning (DRL) serves as an intelligent learning method capable of learning optimal strategies in dynamic environments. Therefore, this paper analyzes the effects of multi-priority queues and NOMA on AoI in the C-V2X vehicular communication system and proposes an energy consumption and AoI optimization method based on DRL. Finally, through comparative simulations with baseline methods, the proposed approach demonstrates its advantages in terms of energy consumption and AoI.
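
For readers unfamiliar with the AoI metric mentioned above, the following sketch computes an average age over discrete time slots. It uses the common textbook definition (age grows by one per slot and resets to the delivered update's delay) rather than the paper's system model; the slot structure, the starting age of 0, and the delivery schedule are assumptions for illustration.

```python
def average_aoi(deliveries, horizon):
    """Average Age of Information over slots 0..horizon-1.

    deliveries: maps a delivery slot to the generation slot of the update
                delivered in it (hypothetical schedule).
    Assumption: the age is 0 before slot 0.
    """
    age = 0
    total = 0
    for t in range(horizon):
        if t in deliveries:
            age = t - deliveries[t]   # reset to the delay of the fresh update
        else:
            age += 1                  # no new update: information gets older
        total += age
    return total / horizon


# An update generated in slot 1 arrives in slot 3; another from slot 6 arrives in slot 7.
avg = average_aoi({3: 1, 7: 6}, horizon=10)
print(avg)  # → 2.6
```

The example shows why AoI differs from delay alone: the long gap between the two deliveries raises the average age even though each individual delay is short.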

* This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/DRL-Based-Optimization-for-Information-of-Age-and-Energy-Consumption-in-C-V2X-Enabled-IoV

Coverage-Constrained Human-AI Cooperation with Multiple Experts

Nov 18, 2024

Zheng Zhang, Cuong Nguyen, Kevin Wells, Thanh-Toan Do, Gustavo Carneiro

Abstract: Human-AI cooperative classification (HAI-CC) approaches aim to develop hybrid intelligent systems that enhance decision-making in various high-stakes real-world scenarios by leveraging both human expertise and AI capabilities. Current HAI-CC methods primarily focus on learning-to-defer (L2D), where decisions are deferred to human experts, and learning-to-complement (L2C), where AI and human experts make predictions cooperatively. However, a notable research gap remains in effectively exploring both L2D and L2C under diverse expert knowledge to improve decision-making, particularly when constrained by the cooperation cost required to achieve a target probability for AI-only selection (i.e., coverage). In this paper, we address this research gap by proposing the Coverage-constrained Learning to Defer and Complement with Specific Experts (CL2DC) method. CL2DC makes final decisions through either AI prediction alone or by deferring to or complementing a specific expert, depending on the input data. Furthermore, we propose a coverage-constrained optimisation to control the cooperation cost, ensuring it approximates a target probability for AI-only selection. This approach enables an effective assessment of system performance within a specified budget. Also, CL2DC is designed to address scenarios where training sets contain multiple noisy-label annotations without any clean-label references. Comprehensive evaluations on both synthetic and real-world datasets demonstrate that CL2DC achieves superior performance compared to state-of-the-art HAI-CC methods.

Poor Man's Training on MCUs: A Memory-Efficient Quantized Back-Propagation-Free Approach

Nov 07, 2024

Yequan Zhao, Hai Li, Ian Young, Zheng Zhang

Abstract: Back propagation (BP) is the default solution for gradient computation in neural network training. However, implementing BP-based training on various edge devices such as FPGAs, microcontrollers (MCUs), and analog computing platforms faces multiple major challenges, such as the lack of hardware resources, long time-to-market, and dramatic errors in a low-precision setting. This paper presents a simple BP-free training scheme on an MCU, which makes edge training hardware design as easy as inference hardware design. We adopt a quantized zeroth-order method to estimate the gradients of quantized model parameters, which can overcome the error of a straight-through estimator in a low-precision BP scheme. We further employ a few dimension reduction methods (e.g., node perturbation, sparse training) to improve the convergence of zeroth-order training. Experimental results show that our BP-free training achieves performance comparable to BP-based training on adapting a pre-trained image classifier to various corrupted data on resource-constrained edge devices (e.g., an MCU with 1024-KB SRAM for dense full-model training, or an MCU with 256-KB SRAM for sparse training). This method is most suitable for application scenarios where memory cost and time-to-market are the major concerns, but longer latency can be tolerated.
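
The core of BP-free training is that gradients are estimated from forward evaluations only. The sketch below shows a generic two-point zeroth-order estimator on a toy quadratic loss. It is a simplified, full-precision illustration and not the paper's method: quantization, node perturbation, and sparse training are all left out, and the loss function, step size, and perturbation scale are assumptions.

```python
import random


def loss(theta):
    """Toy objective standing in for a network's training loss."""
    return sum(x * x for x in theta)


def zo_gradient(f, theta, mu, rng):
    """Two-point zeroth-order gradient estimate using only forward evaluations.

    A random +/-1 direction u is drawn, and the directional finite difference
    (f(theta + mu*u) - f(theta - mu*u)) / (2*mu) is spread back along u.
    """
    u = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = f([t + mu * ui for t, ui in zip(theta, u)])
    minus = f([t - mu * ui for t, ui in zip(theta, u)])
    scale = (plus - minus) / (2 * mu)
    return [scale * ui for ui in u]


rng = random.Random(0)            # fixed seed for reproducibility
theta = [1.0, -2.0, 0.5, 3.0]     # hypothetical starting parameters
initial = loss(theta)

for _ in range(200):
    g = zo_gradient(loss, theta, mu=1e-3, rng=rng)
    theta = [t - 0.1 * gi for t, gi in zip(theta, g)]

final = loss(theta)
print(f"loss before: {initial:.3f}, after: {final:.3e}")
```

Each step needs two forward passes and no stored activations, which reflects the memory-for-latency trade-off described at the end of the abstract.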

Collaborative Cognitive Diagnosis with Disentangled Representation Learning for Learner Modeling

Nov 04, 2024

Weibo Gao, Qi Liu, Linan Yue, Fangzhou Yao, Hao Wang, Yin Gu, Zheng Zhang

Abstract: Learners sharing similar implicit cognitive states often display comparable observable problem-solving performances. Leveraging collaborative connections among such similar learners proves valuable in comprehending human learning. Motivated by the success of collaborative modeling in various domains, such as recommender systems, we aim to investigate how collaborative signals among learners contribute to the diagnosis of human cognitive states (i.e., knowledge proficiency) in the context of intelligent education. The primary challenges lie in identifying implicit collaborative connections and disentangling the entangled cognitive factors of learners for improved explainability and controllability in learner Cognitive Diagnosis (CD). However, there has been no work on CD capable of simultaneously modeling collaborative and disentangled cognitive states. To address this gap, we present Coral, a Collaborative cognitive diagnosis model with disentangled representation learning. Specifically, Coral first introduces a disentangled state encoder to achieve the initial disentanglement of learners' states. Subsequently, a meticulously designed collaborative representation learning procedure captures collaborative signals. It dynamically constructs a collaborative graph of learners by iteratively searching for optimal neighbors in a context-aware manner. Using the constructed graph, collaborative information is extracted through node representation learning. Finally, a decoding process aligns the initial cognitive states and collaborative states, achieving co-disentanglement with practice performance reconstructions. Extensive experiments demonstrate the superior performance of Coral, showcasing significant improvements over state-of-the-art methods across several real-world datasets. Our code is available at https://github.com/bigdata-ustc/Coral.

* Accepted by NeurIPS 2024

Can Language Models Learn to Skip Steps?

Nov 04, 2024

Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang

Abstract: Trained on vast corpora of human language, language models demonstrate emergent human-like reasoning abilities. Yet they are still far from true intelligence, which opens up intriguing opportunities to explore the parallels between human and model behaviors. In this work, we study the ability to skip steps in reasoning - a hallmark of human expertise developed through practice. Unlike humans, who may skip steps to enhance efficiency or to reduce cognitive load, models do not inherently possess such motivations to minimize reasoning steps. To address this, we introduce a controlled framework that stimulates step-skipping behavior by iteratively refining models to generate shorter and accurate reasoning paths. Empirical results indicate that models can develop the step-skipping ability under our guidance. Moreover, after fine-tuning on expanded datasets that include both complete and skipped reasoning sequences, the models can not only resolve tasks with increased efficiency without sacrificing accuracy, but also exhibit comparable and even enhanced generalization capabilities in out-of-domain scenarios. Our work presents the first exploration into human-like step-skipping ability and provides fresh perspectives on how such cognitive abilities can benefit AI models.

* Accepted by NeurIPS 2024

TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models

Oct 20, 2024

Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao

Abstract: Representation learning of Text-Attributed Graphs (TAGs) has garnered significant attention due to its applications in various domains, including recommendation systems and social networks. Despite advancements in TAG learning methodologies, challenges remain in explainability due to the black-box nature of existing TAG representation learning models. This paper presents TAGExplainer, the first method designed to generate natural language explanations for TAG learning. TAGExplainer employs a generative language model that maps input-output pairs to explanations reflecting the model's decision-making process. To address the lack of annotated ground truth explanations in real-world scenarios, we propose first generating pseudo-labels that capture the model's decisions from saliency-based explanations. The pseudo-label generator is then iteratively trained via Expert Iteration on three training objectives focusing on faithfulness and brevity, to improve the quality of the generated pseudo-labels. The high-quality pseudo-labels are finally utilized to train an end-to-end explanation generator model. Extensive experiments are conducted to demonstrate the effectiveness of TAGExplainer in producing faithful and concise natural language explanations.
