Senior member, IEEE
Abstract:Efficient and lightweight single-image super-resolution (SISR) has achieved remarkable performance in recent years. One effective approach is the use of large kernel designs, which have been shown to improve the performance of SISR models while reducing their computational requirements. However, current state-of-the-art (SOTA) models still face problems such as high computational costs. To address these issues, we propose the Large Kernel Distillation Network (LKDN) in this paper. Our approach simplifies the model structure and introduces more efficient attention modules to reduce computational costs while also improving performance. Specifically, we employ the reparameterization technique to enhance model performance without adding extra cost. We also introduce a new optimizer from other tasks to SISR, which improves training speed and performance. Our experimental results demonstrate that LKDN outperforms existing lightweight SR methods and achieves SOTA performance.
Abstract:Personalized Federated Continual Learning (PFCL) is a new practical scenario that poses greater challenges in sharing and personalizing knowledge. PFCL not only relies on knowledge fusion for server aggregation at the global spatial-temporal perspective but also needs model improvement for each client according to the local requirements. Existing methods, whether in Personalized Federated Learning (PFL) or Federated Continual Learning (FCL), have overlooked the multi-granularity representation of knowledge, which can be utilized to overcome Spatial-Temporal Catastrophic Forgetting (STCF) and adopt generalized knowledge to itself by coarse-to-fine human cognitive mechanisms. Moreover, it allows more effectively to personalized shared knowledge, thus serving its own purpose. To this end, we propose a novel concept called multi-granularity prompt, i.e., coarse-grained global prompt acquired through the common model learning process, and fine-grained local prompt used to personalize the generalized representation. The former focuses on efficiently transferring shared global knowledge without spatial forgetting, and the latter emphasizes specific learning of personalized local knowledge to overcome temporal forgetting. In addition, we design a selective prompt fusion mechanism for aggregating knowledge of global prompts distilled from different clients. By the exclusive fusion of coarse-grained knowledge, we achieve the transmission and refinement of common knowledge among clients, further enhancing the performance of personalization. Extensive experiments demonstrate the effectiveness of the proposed method in addressing STCF as well as improving personalized performance. Our code now is available at https://github.com/SkyOfBeginning/FedMGP.
Abstract:Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers within the original feature space can undermine the reliability of the resulting sample similarity graph. It also fails to precisely depict the label correlation due to the existence of unknown labels. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet irrelevant features and maintain consistency between predicted and ground-truth labels in labeled data, simultaneously. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.
Abstract:Learning temporal dependencies among targets (TDT) benefits better time series forecasting, where targets refer to the predicted sequence. Although autoregressive methods model TDT recursively, they suffer from inefficient inference and error accumulation. We argue that integrating TDT learning into non-autoregressive methods is essential for pursuing effective and efficient time series forecasting. In this study, we introduce the differencing approach to represent TDT and propose a parameter-free and plug-and-play solution through an optimization objective, namely TDT Loss. It leverages the proportion of inconsistent signs between predicted and ground truth TDT as an adaptive weight, dynamically balancing target prediction and fine-grained TDT fitting. Importantly, TDT Loss incurs negligible additional cost, with only $\mathcal{O}(n)$ increased computation and $\mathcal{O}(1)$ memory requirements, while significantly enhancing the predictive performance of non-autoregressive models. To assess the effectiveness of TDT loss, we conduct extensive experiments on 7 widely used datasets. The experimental results of plugging TDT loss into 6 state-of-the-art methods show that out of the 168 experiments, 75.00\% and 94.05\% exhibit improvements in terms of MSE and MAE with the maximum 24.56\% and 16.31\%, respectively.
Abstract:In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (ViT)-based approaches showcasing superior performance in generality and efficiency. This survey presents a timely overview of ViT-based deepfake detection models, categorized into standalone, sequential, and parallel architectures. Furthermore, it succinctly delineates the structure and characteristics of each model. By analyzing existing research and addressing future directions, this survey aims to equip researchers with a nuanced understanding of ViT's pivotal role in deepfake detection, serving as a valuable reference for both academic and practical pursuits in this domain.
Abstract:Federated Class-Incremental Learning (FCIL) focuses on continually transferring the previous knowledge to learn new classes in dynamic Federated Learning (FL). However, existing methods do not consider the trustworthiness of FCIL, i.e., improving continual utility, privacy, and efficiency simultaneously, which is greatly influenced by catastrophic forgetting and data heterogeneity among clients. To address this issue, we propose FedProK (Federated Prototypical Feature Knowledge Transfer), leveraging prototypical feature as a novel representation of knowledge to perform spatial-temporal knowledge transfer. Specifically, FedProK consists of two components: (1) feature translation procedure on the client side by temporal knowledge transfer from the learned classes and (2) prototypical knowledge fusion on the server side by spatial knowledge transfer among clients. Extensive experiments conducted in both synchronous and asynchronous settings demonstrate that our FedProK outperforms the other state-of-the-art methods in three perspectives of trustworthiness, validating its effectiveness in selectively transferring spatial-temporal knowledge.
Abstract:Generative Flow Networks (GFlowNets) are probabilistic models predicated on Markov flows, employing specific amortization algorithms to learn stochastic policies that generate compositional substances including biomolecules, chemical materials, and more. Demonstrating formidable prowess in generating high-performance biochemical molecules, GFlowNets accelerate the discovery of scientific substances, effectively circumventing the time-consuming, labor-intensive, and costly shortcomings intrinsic to conventional material discovery. However, previous work often struggles to accumulate exploratory experience and is prone to becoming disoriented within expansive sampling spaces. Attempts to address this issue, such as LS-GFN, are limited to local greedy searches and lack broader global adjustments. This paper introduces a novel GFlowNets variant, the Dynamic Backtracking GFN (DB-GFN), which enhances the adaptability of decision-making steps through a reward-based dynamic backtracking mechanism. DB-GFN permits backtracking during the network construction process according to the current state's reward value, thus correcting disadvantageous decisions and exploring alternative pathways during the exploration process. Applied to generative tasks of biochemical molecules and genetic material sequences, DB-GFN surpasses existing GFlowNets models and traditional reinforcement learning methods in terms of sample quality, exploration sample quantity, and training convergence speed. Furthermore, the orthogonal nature of DB-GFN suggests its potential as a powerful tool for future improvements in GFlowNets, with the promise of integrating with other strategies to achieve more efficient search performance.
Abstract:With the digitization of modern cities, large data volumes and powerful computational resources facilitate the rapid update of intelligent models deployed in smart cities. Continual learning (CL) is a novel machine learning paradigm that constantly updates models to adapt to changing environments, where the learning tasks, data, and distributions can vary over time. Our survey provides a comprehensive review of continual learning methods that are widely used in smart city development. The content consists of three parts: 1) Methodology-wise. We categorize a large number of basic CL methods and advanced CL frameworks in combination with other learning paradigms including graph learning, spatial-temporal learning, multi-modal learning, and federated learning. 2) Application-wise. We present numerous CL applications covering transportation, environment, public health, safety, networks, and associated datasets related to urban computing. 3) Challenges. We discuss current problems and challenges and envision several promising research directions. We believe this survey can help relevant researchers quickly familiarize themselves with the current state of continual learning research used in smart city development and direct them to future research trends.
Abstract:This paper presents a novel framework for continual feature selection (CFS) in data preprocessing, particularly in the context of an open and dynamic environment where unknown classes may emerge. CFS encounters two primary challenges: the discovery of unknown knowledge and the transfer of known knowledge. To this end, the proposed CFS method combines the strengths of continual learning (CL) with granular-ball computing (GBC), which focuses on constructing a granular-ball knowledge base to detect unknown classes and facilitate the transfer of previously learned knowledge for further feature selection. CFS consists of two stages: initial learning and open learning. The former aims to establish an initial knowledge base through multi-granularity representation using granular-balls. The latter utilizes prior granular-ball knowledge to identify unknowns, updates the knowledge base for granular-ball knowledge transfer, reinforces old knowledge, and integrates new knowledge. Subsequently, we devise an optimal feature subset mechanism that incorporates minimal new features into the existing optimal subset, often yielding superior results during each period. Extensive experimental results on public benchmark datasets demonstrate our method's superiority in terms of both effectiveness and efficiency compared to state-of-the-art feature selection methods.
Abstract:Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text in images and answer questions related to the text content. Most existing methods heavily rely on the accuracy of Optical Character Recognition (OCR) systems, and aggressive fine-tuning based on limited spatial location information and erroneous OCR text information often leads to inevitable overfitting. In this paper, we propose a multimodal adversarial training architecture with spatial awareness capabilities. Specifically, we introduce an Adversarial OCR Enhancement (AOE) module, which leverages adversarial training in the embedding space of OCR modality to enhance fault-tolerant representation of OCR texts, thereby reducing noise caused by OCR errors. Simultaneously, We add a Spatial-Aware Self-Attention (SASA) mechanism to help the model better capture the spatial relationships among OCR tokens. Various experiments demonstrate that our method achieves significant performance improvements on both the ST-VQA and TextVQA datasets and provides a novel paradigm for multimodal adversarial training.