Yan Chen

Clustering ensemble algorithm with high-order consistency learning

Oct 31, 2024

Jianwen Gan, Yan Chen, Peng Zhou, Liang Du

Abstract: Most research on clustering ensembles focuses on designing practical consistency learning algorithms. The quality of base clusterings varies, and low-quality base clusterings degrade the performance of the clustering ensemble. To address this, the intrinsic connections among data were mined from the base clusterings, and a high-order information fusion algorithm, Clustering Ensemble with High-order Consensus learning (HCLCE), was proposed to represent the connections between data from different dimensions. First, each type of high-order information was fused into a new structured consistency matrix. Then, the resulting consistency matrices were fused together. Finally, the multiple sources of information were fused into a consistent result. Experimental results show that, compared with the second-best algorithm, Locally Weighted Evidence Accumulation (LWEA), HCLCE improves clustering accuracy by an average of 7.22% and Normalized Mutual Information (NMI) by an average of 9.19%. The proposed algorithm therefore obtains better clustering results than both existing clustering ensemble algorithms and the use of any single type of information alone.

* Journal of Computer Applications, 2023, 43(9), 2665-2672
* In Chinese
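
As a rough illustration of what a consistency matrix over base clusterings can look like, the sketch below builds the classic co-association matrix used by evidence-accumulation methods: entry (i, j) is the fraction of base clusterings that place points i and j in the same cluster. This is not the HCLCE algorithm, whose high-order fusion steps the abstract does not spell out; the function name `co_association` and the toy labelings are assumptions made for the example.

```python
def co_association(base_labelings):
    """Return the n x n co-association matrix of a list of base clusterings.

    Each base clustering is a list of cluster labels, one per data point.
    Entry [i][j] is the fraction of base clusterings in which points i and j
    share a label (label values need not match across clusterings).
    """
    m = len(base_labelings)
    n = len(base_labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


# Three hypothetical base clusterings of four points.
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
C = co_association(base)
print(C[0][1])  # 1.0: every base clustering groups points 0 and 1
print(C[0][3])  # 0.0: no base clustering groups points 0 and 3
```

Pairs on which the base clusterings disagree receive intermediate values, which is where weighting or fusing several such matrices, as the abstract describes at a high level, would come into play.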

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Oct 24, 2024

Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong Liu, Feng Tian, Guang Dai, Jingdong Wang, Qianying Wang

Abstract: Text-guided diffusion models have significantly advanced image editing, enabling high-quality and diverse modifications driven by text prompts. However, effective editing requires inverting the source image into a latent space, a process often hindered by prediction errors inherent in DDIM inversion. These errors accumulate during the diffusion process, resulting in inferior content preservation and edit fidelity, especially with conditional inputs. We address these challenges by investigating the primary contributors to error accumulation in DDIM inversion and identifying the singularity problem in traditional noise schedules as a key issue. To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing. This schedule reduces noise prediction errors, enabling more faithful editing that preserves the original content of the source image. Our approach requires no additional retraining and is compatible with various existing editing methods. Experiments across eight editing tasks demonstrate the Logistic Schedule's superior performance in content preservation and edit fidelity compared to traditional noise schedules, highlighting its adaptability and effectiveness.

* Accepted in NeurIPS 2024

Optimal Downsampling for Imbalanced Classification with Generalized Linear Models

Oct 11, 2024

Yan Chen, Jose Blanchet, Krzysztof Dembczynski, Laura Fee Nern, Aaron Flores

Abstract: Downsampling, or under-sampling, is a technique used with large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We propose a pseudo maximum likelihood estimator and study its asymptotic normality in the context of increasingly imbalanced populations relative to an increasingly large sample size. We provide theoretical guarantees for the introduced estimator. Additionally, we compute the optimal downsampling rate using a criterion that balances statistical accuracy and computational efficiency. Our numerical experiments, conducted on both synthetic and empirical data, further validate our theoretical results and demonstrate that the introduced estimator outperforms commonly available alternatives.
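
For readers unfamiliar with downsampling, the sketch below shows the standard logistic-regression case rather than the paper's pseudo maximum likelihood estimator, which the abstract does not define. If every positive is kept and each negative is kept with probability `rate`, the odds in the downsampled data are inflated by a factor of 1/`rate`, so the slope is unchanged and the intercept shifts by -log(`rate`). All function names and the parameter values `a`, `b`, `rate`, and `x` are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def downsample_negatives(rows, rate, rng):
    """Keep every positive (y == 1); keep each negative with probability `rate`."""
    return [(x, y) for x, y in rows if y == 1 or rng.random() < rate]


def positive_prob_after_downsampling(p, rate):
    """P(y = 1 | x, row kept) when negatives survive with probability `rate`."""
    return p / (p + rate * (1.0 - p))


def correct_intercept(intercept_sub, rate):
    """Shift an intercept fitted on downsampled data back to the full population."""
    return intercept_sub + math.log(rate)


# Hypothetical full-population model: P(y = 1 | x) = sigmoid(a + b * x).
a, b, rate, x = -6.0, 1.5, 0.05, 0.7
p_full = sigmoid(a + b * x)
p_sub = positive_prob_after_downsampling(p_full, rate)

# On the downsampled data the slope stays b and the intercept becomes a - log(rate).
intercept_sub = logit(p_sub) - b * x
recovered = correct_intercept(intercept_sub, rate)

# Downsample a synthetic, highly imbalanced sample (about 1% positives).
rng = random.Random(0)
rows = [(rng.gauss(0.0, 1.0), int(rng.random() < 0.01)) for _ in range(10_000)]
kept = downsample_negatives(rows, rate, rng)
print(len(rows), len(kept), round(recovered, 6))
```

The choice of `rate` trades computation against statistical accuracy, which is the trade-off the abstract says the paper optimizes; this sketch does not attempt that optimization.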

Remaining Useful Life Prediction: A Study on Multidimensional Industrial Signal Processing and Efficient Transfer Learning Based on Large Language Models

Oct 04, 2024

Yan Chen, Cheng Liu

Abstract: Remaining useful life (RUL) prediction is crucial for maintaining modern industrial systems, where equipment reliability and operational safety are paramount. Traditional methods, based on small-scale deep learning or physical/statistical models, often struggle with complex, multidimensional sensor data and varying operating conditions, limiting their generalization capabilities. To address these challenges, this paper introduces an innovative regression framework utilizing large language models (LLMs) for RUL prediction. By leveraging the modeling power of LLMs pre-trained on corpus data, the proposed model can effectively capture complex temporal dependencies and improve prediction accuracy. Extensive experiments on the Turbofan engine's RUL prediction task show that the proposed model surpasses state-of-the-art (SOTA) methods on the challenging FD002 and FD004 subsets and achieves near-SOTA results on the other subsets. Notably, different from previous research, our framework uses the same sliding window length and all sensor signals for all subsets, demonstrating strong consistency and generalization. Moreover, transfer learning experiments reveal that with minimal target domain data for fine-tuning, the model outperforms SOTA methods trained on full target domain data. This research highlights the significant potential of LLMs in industrial signal processing and RUL prediction, offering a forward-looking solution for health management in future intelligent industrial systems.

Generative AI Application for Building Industry

Oct 01, 2024

Hanlong Wan, Jian Zhang, Yan Chen, Weili Xu, Fan Feng

Abstract: This paper investigates the transformative potential of generative AI technologies, particularly large language models (LLMs), within the building industry. By leveraging these advanced AI tools, the study explores their application across key areas such as energy code compliance, building design optimization, and workforce training. The research highlights how LLMs can automate labor-intensive processes, significantly improving efficiency, accuracy, and safety in building practices. The paper also addresses the challenges associated with interpreting complex visual and textual data in architectural plans and regulatory codes, proposing innovative solutions to enhance AI-driven compliance checking and design processes. Additionally, the study considers the broader implications of AI integration, including the development of AI-powered tools for comprehensive code compliance across various regulatory domains and the potential for AI to revolutionize workforce training through realistic simulations. This paper provides a comprehensive analysis of the current capabilities of generative AI in the building industry while outlining future directions for research and development, aiming to pave the way for smarter, more sustainable, and responsive construction practices.

* 28 pages, 11 figures, 4 tables

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery

Sep 29, 2024

Haonan Lin, Wenbin An, Jiahao Wang, Yan Chen, Feng Tian, Mengmeng Wang, Guang Dai, Qianying Wang, Jingdong Wang

Abstract: Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD). Typically, this involves a teacher-student framework in which the teacher imparts knowledge to the student to classify categories, even in the absence of explicit labels. Nevertheless, GCD presents unique challenges, particularly the absence of priors for new classes, which can lead to the teacher's misguidance and unsynchronized learning with the student, culminating in suboptimal outcomes. In our work, we delve into why traditional teacher-student designs falter in open-world generalized category discovery as compared to their success in closed-world semi-supervised learning. We identify inconsistent pattern learning across attention layers as the crux of this issue and introduce FlipClass, a method that dynamically updates the teacher to align with the student's attention, instead of maintaining a static teacher reference. Our teacher-student attention alignment strategy refines the teacher's focus based on student feedback from an energy perspective, promoting consistent pattern recognition and synchronized learning across old and new classes. Extensive experiments on a spectrum of benchmarks affirm that FlipClass significantly surpasses contemporary GCD methods, establishing new standards for the field.

Training-Free Point Cloud Recognition Based on Geometric and Semantic Information Fusion

Sep 11, 2024

Yan Chen, Di Huang, Zhichao Liao, Xi Cheng, Xinghui Li, Lone Zeng

Abstract: Training-free methods for point cloud recognition are becoming increasingly popular because they significantly reduce computational resources and time costs. However, existing approaches are limited as they typically extract either geometric or semantic features. To address this limitation, we propose the first training-free method that integrates both geometric and semantic features. For the geometric branch, we adopt a non-parametric strategy to extract geometric features. In the semantic branch, we leverage a model aligned with text features to obtain semantic features. Additionally, we introduce the GFE module to complement the geometric information of point clouds and the MFF module to improve performance in few-shot settings. Experimental results demonstrate that our method outperforms existing state-of-the-art training-free approaches on mainstream benchmark datasets, including ModelNet and ScanObjectNN.

A Primer on Near-Field Communications for Next-Generation Multiple Access

Aug 05, 2024

Chongjun Ouyang, Zhaolin Wang, Yan Chen, Xidong Mu, Peiying Zhu

Abstract: Multiple-antenna technologies are advancing toward the development of extremely large aperture arrays and the utilization of extremely high frequencies, driving the progress of next-generation multiple access (NGMA). This evolution is accompanied by the emergence of near-field communications (NFC), characterized by spherical-wave propagation, which introduces additional range dimensions to the channel and enhances system throughput. In this context, a tutorial-based primer on NFC is presented, emphasizing its applications in multiuser communications and multiple access (MA). The following areas are investigated: i) The commonly used near-field channel models are reviewed along with their simplifications under various near-field conditions. ii) Building upon these models, the information-theoretic capacity limits of NFC-MA are analyzed, including the derivation of sum-rate capacity and capacity region, and their upper limits for both downlink and uplink scenarios. iii) A detailed investigation of near-field multiuser beamforming design is presented, offering low-complexity and effective NFC-MA design methodologies in both the spatial and wavenumber (angular) domains. Throughout these investigations, near-field MA is compared with its far-field counterpart to highlight its superiority and flexibility in terms of interference management, thereby laying the groundwork for achieving NGMA.

* 34 pages
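
To make the spherical-wave point concrete, the sketch below compares the exact (spherical-wave) path length to each element of a uniform linear array with its planar-wave approximation, and evaluates the gap at multiples of the classical Rayleigh distance 2D²/λ. This is a textbook calculation, not a model taken from the paper; the 30 GHz carrier, the 64-element half-wavelength array, and all function names are assumptions for illustration.

```python
import math


def rayleigh_distance(aperture, wavelength):
    """Classical near-field/far-field boundary 2 * D^2 / wavelength."""
    return 2.0 * aperture ** 2 / wavelength


def element_positions(num_elements, spacing):
    """Positions of a uniform linear array centred at the origin."""
    centre = (num_elements - 1) / 2.0
    return [(n - centre) * spacing for n in range(num_elements)]


def max_planar_phase_error(positions, distance, angle, wavelength):
    """Largest phase gap (radians) between spherical- and planar-wave models.

    The source sits at range `distance` and angle `angle` (radians from
    broadside), both measured from the array centre.
    """
    worst = 0.0
    for x in positions:
        exact = math.sqrt(
            distance ** 2 + x ** 2 - 2.0 * distance * x * math.sin(angle)
        )
        planar = distance - x * math.sin(angle)
        worst = max(worst, abs(2.0 * math.pi / wavelength * (exact - planar)))
    return worst


wavelength = 0.01  # metres; corresponds to an assumed 30 GHz carrier
positions = element_positions(64, wavelength / 2.0)
aperture = positions[-1] - positions[0]
d_rayleigh = rayleigh_distance(aperture, wavelength)

for scale in (0.1, 1.0, 10.0):
    err = max_planar_phase_error(positions, scale * d_rayleigh, 0.0, wavelength)
    print(f"{scale:>4} x Rayleigh distance: max phase error = {err:.4f} rad")
```

At the Rayleigh distance the broadside phase error is approximately π/8, the conventional threshold; well inside that range the planar model breaks down and the channel depends on range as well as angle.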

Efficient Channel Estimation for Millimeter Wave and Terahertz Systems Enabled by Integrated Super-resolution Sensing and Communication

Jul 30, 2024

Jingran Xu, Huizhi Wang, Yong Zeng, Xiaoli Xu, Qingqing Wu, Fei Yang, Yan Chen, Abbas Jamalipour

Abstract: Integrated super-resolution sensing and communication (ISSAC) has emerged as a promising technology to achieve extremely high-precision sensing of key parameters, such as the angles of the sensing targets. In this paper, we propose an efficient channel estimation scheme enabled by ISSAC for millimeter wave (mmWave) and terahertz (THz) systems with a hybrid analog/digital beamforming architecture, where both the pilot overhead and the cost of radio frequency (RF) chains are significantly reduced. The key idea is to exploit the fact that subspace-based super-resolution algorithms such as multiple signal classification (MUSIC) can estimate channel parameters accurately without requiring dedicated a priori known pilots. In particular, the proposed method consists of two stages. First, the angles of the multi-path channel components are estimated in a pilot-free manner during the transmission of data symbols. Second, the multi-path channel coefficients are estimated with very few pilots. Compared to conventional channel estimation schemes that rely solely on channel training, our approach requires the estimation of far fewer parameters in the second stage. Furthermore, with channel multi-path angles obtained, the beamforming gain can be achieved when pilots are sent to estimate the channel path gains. To comprehensively investigate the performance of the proposed scheme, we consider both the basic line-of-sight (LoS) channels and more general multi-path channels. We compare the minimum mean square error (MMSE) of channel estimation and the resulting beamforming gains of our proposed scheme with those of the traditional scheme that relies exclusively on channel training. It is demonstrated that our proposed method significantly outperforms the benchmarking scheme. Simulation results are presented to validate our theoretical findings.

* 13 pages, 8 figures

Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

Jul 22, 2024

Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

Abstract: Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acquiring different kinds of knowledge in a coupled manner may confuse models and hinder them from retrieving precise knowledge. Furthermore, the "forward-only" answering process fails to explicitly capture the knowledge needs of LLMs, which can further hurt answering quality. To cope with the above limitations, we propose DKA: Disentangled Knowledge Acquisition from LLM feedback, a training-free framework that disentangles knowledge acquisition to avoid confusion and uses LLM feedback to specify the required knowledge. Specifically, DKA requires LLMs to specify what knowledge they need to answer the question and decompose the original complex question into two simple sub-questions: an image-based sub-question and a knowledge-based sub-question. Then we use the two sub-questions to retrieve knowledge from the image and knowledge base, respectively. In this way, the two knowledge acquisition models can focus on the content that corresponds to them and avoid interference from irrelevant elements in the original complex question, which can help to provide more precise knowledge and better align with the knowledge needs of LLMs to yield correct answers. Experiments on benchmark datasets show that DKA significantly outperforms SOTA models. To facilitate future research, our data and code are available at https://github.com/Lackel/DKA.

* Pre-print
