De-Chuan Zhan

Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

May 21, 2024

Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan

Abstract: Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work suggests an additional asymmetry of the valley beyond the flat and sharp ones, yet without thoroughly examining its causes or implications. Our study methodically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise for 1D visualization. Our major observation shows that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical insights from the aspects of ReLU activation and softmax function could explain the interesting phenomenon. Our discovery propels novel understanding and applications in the scenario of Model Fusion: (1) the efficacy of interpolating separate models significantly correlates with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as an innovative approach for model parameter alignment.
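
The sign consistency idea can be made concrete with a small sketch. The abstract does not give a formula, so the definition below (the fraction of coordinates whose signs agree) and the names `sign_consistency_ratio` and `interpolate` are illustrative assumptions, not the authors' implementation.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_consistency_ratio(a, b):
    """Fraction of coordinates where a and b share the same sign.

    Assumed definition for illustration; a zero only matches another zero.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if sign(x) == sign(y))
    return matches / len(a)


def interpolate(a, b, lam):
    """Linear interpolation (1 - lam) * a + lam * b of two flat parameter vectors."""
    return [(1 - lam) * x + lam * y for x, y in zip(a, b)]


# Toy flattened parameter vectors (hypothetical values).
theta_a = [0.5, -1.2, 0.3, -0.7]
theta_b = [0.4, -0.9, -0.1, -0.2]

ratio = sign_consistency_ratio(theta_a, theta_b)  # 3 of 4 signs agree
midpoint = interpolate(theta_a, theta_b, 0.5)
```

In this toy case the ratio is 0.75. In practice one would flatten the real model weights and compare the ratio against the accuracy of the interpolated model.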

Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

May 11, 2024

Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan

Abstract: Open-set Semi-supervised Learning (OSSL) holds a realistic setting that unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

Apr 27, 2024

Chao Yi, Lu Ren, De-Chuan Zhan, Han-Jia Ye

Abstract: CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment between its pre-training objectives and feature extraction methods. This inconsistency can diminish the quality of the image's feature representation, adversely affecting CLIP's effectiveness in target tasks. In this paper, we view text features as precise neighbors of image features in CLIP's space and present a novel CrOss-moDal nEighbor Representation (CODER) based on the distance structure between images and their neighbor texts. This feature extraction method aligns better with CLIP's pre-training objectives, thereby fully leveraging CLIP's robust cross-modal capabilities. The key to constructing a high-quality CODER lies in how to create a vast amount of high-quality and diverse texts to match with images. We introduce the Auto Text Generator (ATG) to automatically generate the required texts in a data-free and training-free manner. We apply CODER to CLIP's zero-shot and few-shot image classification tasks. Experimental results across various datasets and models confirm CODER's effectiveness. Code is available at: https://github.com/YCaigogogo/CVPR24-CODER.
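
As a rough illustration of a neighbor-based representation, the sketch below describes an image by its similarities to a set of text embeddings rather than by its raw image feature. The abstract does not state which distance is used, so cosine similarity, the function names, and the toy 2-D embeddings are all assumptions; the linked repository holds the actual method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbor_representation(image_emb, text_embs):
    """Represent an image by its similarity to each neighbor text (assumed form)."""
    return [cosine(image_emb, t) for t in text_embs]


# Hypothetical embeddings; real ones would come from CLIP's encoders.
image_emb = [1.0, 0.0]
text_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rep = neighbor_representation(image_emb, text_embs)
```

The resulting vector has one entry per text, so a larger and more diverse text set gives a richer representation, which matches the abstract's emphasis on generating many high-quality texts.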

SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion

Apr 22, 2024

Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan

Abstract: Multivariate time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare. Recent studies have highlighted the advantages of channel independence to resist distribution drift but neglect channel correlations, limiting further enhancements. Several methods utilize mechanisms like attention or mixer to address this by capturing channel correlations, but they either introduce excessive complexity or rely too heavily on the correlation to achieve satisfactory results under distribution drifts, particularly with a large number of channels. Addressing this gap, this paper presents an efficient MLP-based model, the Series-cOre Fused Time Series forecaster (SOFTS), which incorporates a novel STar Aggregate-Dispatch (STAD) module. Unlike traditional approaches that manage channel interactions through distributed structures, e.g., attention, STAD employs a centralized strategy. It aggregates all series to form a global core representation, which is then dispatched and fused with individual series representations to facilitate channel interactions effectively. SOFTS achieves superior performance over existing state-of-the-art methods with only linear complexity. The broad applicability of the STAD module across different forecasting models is also demonstrated empirically. For further research and development, we have made our code publicly available at https://github.com/Secilia-Cxy/SOFTS.
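
The aggregate-then-dispatch pattern described above can be sketched in a few lines. The abstract does not specify the pooling or fusion operators, so mean pooling and concatenation are assumptions made here for illustration, and `star_aggregate_dispatch` is a hypothetical name; the linked repository holds the actual module.

```python
def star_aggregate_dispatch(series_reprs):
    """Fuse each series representation with a shared global core.

    series_reprs: list of C vectors (one per channel), each of length d.
    Returns C vectors of length 2 * d. Cost grows linearly with C because
    every series interacts only with the core, not with every other series.
    """
    if not series_reprs:
        raise ValueError("need at least one series representation")
    num_series = len(series_reprs)
    dim = len(series_reprs[0])
    # Aggregate: mean over channels (assumed pooling choice).
    core = [sum(s[j] for s in series_reprs) / num_series for j in range(dim)]
    # Dispatch and fuse: concatenate the core onto each series (assumed fusion).
    return [list(s) + core for s in series_reprs]


# Two toy channels with 2-dimensional representations.
fused = star_aggregate_dispatch([[1.0, 2.0], [3.0, 4.0]])
```

A pairwise mechanism such as attention compares every channel with every other one, which is quadratic in the number of channels; routing everything through one core is what keeps this pattern linear.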

TV100: A TV Series Dataset that Pre-Trained CLIP Has Not Seen

Apr 16, 2024

Da-Wei Zhou, Zhi-Hong Qi, Han-Jia Ye, De-Chuan Zhan

Abstract: The era of pre-trained models has ushered in a wealth of new insights for the machine learning community. Among the myriad of questions that arise, one of paramount importance is: 'Do pre-trained models possess comprehensive knowledge?' This paper seeks to address this crucial inquiry. In line with our objective, we have made publicly available a novel dataset comprised of images from TV series released post-2021. This dataset holds significant potential for use in various research areas, including the evaluation of incremental learning, novel class discovery, and long-tailed learning, among others. Project page: https://tv-100.github.io/

MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes

Apr 14, 2024

Xin-Chun Li, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, Yang Yang, De-Chuan Zhan

Abstract: In some real-world applications, data samples are usually distributed on local devices, where federated learning (FL) techniques are proposed to coordinate decentralized clients without directly sharing users' private data. FL commonly follows the parameter server architecture and contains multiple personalization and aggregation procedures. The natural data heterogeneity across clients, i.e., Non-I.I.D. data, challenges both the aggregation and personalization goals in FL. In this paper, we focus on a special kind of Non-I.I.D. scene where clients own incomplete classes, i.e., each client can only access a partial set of the whole class set. The server aims to aggregate a complete classification model that could generalize to all classes, while the clients are inclined to improve the performance of distinguishing their observed classes. For better model aggregation, we point out that the standard softmax will encounter several problems caused by missing classes and propose "restricted softmax" as an alternative. For better model personalization, we point out that the hard-won personalized models are not well exploited and propose "inherited private model" to store the personalization experience. Our proposed algorithm, named MAP, can simultaneously achieve the aggregation and personalization goals in FL. Extensive experiments verify the advantages of our algorithm.
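
The abstract names "restricted softmax" without defining it. Purely as an illustrative assumption, the sketch below scales the logits of classes a client has not observed by a factor `alpha` in [0, 1] before applying a standard softmax, which limits how much probability mass those classes can absorb. The function name, the `observed` mask, and `alpha` are hypothetical and may differ from the paper's formulation.

```python
import math


def restricted_softmax(logits, observed, alpha=0.5):
    """Softmax with the logits of unobserved classes scaled by alpha (assumed form).

    logits:   list of floats, one per class.
    observed: list of bools, True where the client has data for that class.
    alpha:    scaling factor in [0, 1]; alpha = 1 recovers the standard softmax.
    """
    if len(logits) != len(observed):
        raise ValueError("logits and observed must have the same length")
    scaled = [z if seen else alpha * z for z, seen in zip(logits, observed)]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 3.0]
observed = [True, True, False]  # the client never sees class 2

standard = restricted_softmax(logits, observed, alpha=1.0)
restricted = restricted_softmax(logits, observed, alpha=0.5)
```

With these toy values the unobserved class receives less probability under the restricted variant than under the standard softmax, while both outputs still sum to one.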

* Accepted by TKDE (11-Apr-2024)

SENSOR: Imitate Third-Person Expert's Behaviors via Active Sensoring

Apr 04, 2024

Kaichen Huang, Minghao Shao, Shenghua Wan, Hai-Hang Sun, Shuai Feng, Le Gan, De-Chuan Zhan

Abstract: In many real-world visual Imitation Learning (IL) scenarios, there is a misalignment between the agent's and the expert's perspectives, which might lead to the failure of imitation. Previous methods have generally solved this problem by domain alignment, which incurs extra computation and storage costs, and these methods fail to handle the hard cases where the viewpoint gap is too large. To alleviate the above problems, we introduce active sensoring in the visual IL setting and propose a model-based SENSory imitatOR (SENSOR) to automatically change the agent's perspective to match the expert's. SENSOR jointly learns a world model to capture the dynamics of latent states, a sensor policy to control the camera, and a motor policy to control the agent. Experiments on visual locomotion tasks show that SENSOR can efficiently simulate the expert's perspective and strategy, and outperforms most baseline methods.

DIDA: Denoised Imitation Learning based on Domain Adaptation

Apr 04, 2024

Kaichen Huang, Hai-Hang Sun, Shenghua Wan, Minghao Shao, Shuai Feng, Le Gan, De-Chuan Zhan

Abstract: Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise that often occurs during the processes of data collection or transmission. Previous IL methods improve the robustness of learned policies by injecting an adversarially learned Gaussian noise into pure expert data or utilizing additional ranking information, but they may fail in the LND setting. To alleviate the above problems, we propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data, facilitating a feature encoder to learn task-related but domain-agnostic representations. Experimental results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.

Bridge the Modality and Capacity Gaps in Vision-Language Model Selection

Mar 20, 2024

Chao Yi, De-Chuan Zhan, Han-Jia Ye

Abstract: Vision Language Models (VLMs) excel in zero-shot image classification by pairing images with textual category names. The expanding variety of Pre-Trained VLMs enhances the likelihood of identifying a suitable VLM for specific tasks. Thus, a promising zero-shot image classification strategy is selecting the most appropriate Pre-Trained VLM from the VLM Zoo, relying solely on the text data of the target dataset without access to the dataset's images. In this paper, we analyze two inherent challenges in assessing the ability of a VLM in this Language-Only VLM selection: the "Modality Gap" -- the disparity in the VLM's embeddings across two different modalities, making text a less reliable substitute for images; and the "Capability Gap" -- the discrepancy between the VLM's overall ranking and its ranking for the target dataset, hindering direct prediction of a model's dataset-specific performance from its general performance. We propose VLM Selection With gAp Bridging (SWAB) to mitigate the negative impact of these two gaps. SWAB first adopts optimal transport to capture the relevance between open-source datasets and the target dataset with a transportation matrix. It then uses this matrix to transfer useful statistics of VLMs from open-source datasets to the target dataset for bridging those two gaps and enhancing the VLM's capacity estimation for VLM selection. Experiments across various VLMs and image classification datasets validate SWAB's effectiveness.
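
The transfer step can be pictured as a weighted average driven by the transport matrix. The sketch below assumes the matrix has already been computed (no optimal transport solver is included) and that each source class carries one scalar statistic, such as a model's per-class accuracy. The function name and the toy numbers are hypothetical, and the paper's actual statistics and normalization may differ.

```python
def transfer_statistics(transport, source_stats):
    """Estimate one statistic per target class from source-class statistics.

    transport:    matrix T where T[i][j] is the mass moved from source class i
                  to target class j (assumed to be given).
    source_stats: one scalar per source class.
    Each target estimate is the transport-weighted average of the source values.
    """
    num_src = len(transport)
    if num_src == 0 or num_src != len(source_stats):
        raise ValueError("transport rows must match source_stats")
    num_tgt = len(transport[0])
    estimates = []
    for j in range(num_tgt):
        mass = sum(transport[i][j] for i in range(num_src))
        if mass <= 0:
            raise ValueError("target class %d receives no mass" % j)
        weighted = sum(transport[i][j] * source_stats[i] for i in range(num_src))
        estimates.append(weighted / mass)
    return estimates


# Toy example: two source classes, two target classes.
transport = [[0.4, 0.1],
             [0.1, 0.4]]
source_stats = [0.9, 0.5]  # hypothetical per-class accuracies of one VLM

target_estimates = transfer_statistics(transport, source_stats)
```

Here each target class leans toward the source class it is most strongly coupled with, giving estimates of roughly 0.82 and 0.58.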

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning

Mar 18, 2024

Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye, De-Chuan Zhan

Abstract: Class-Incremental Learning (CIL) requires a learning system to continually learn new classes without forgetting. Despite the strong performance of Pre-Trained Models (PTMs) in CIL, a critical issue persists: learning new classes often results in the overwriting of old ones. Excessive modification of the network causes forgetting, while minimal adjustments lead to an inadequate fit for new classes. An efficient way to update the model without harming former knowledge is therefore desirable. In this paper, we propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL. To enable model updating without conflict, we train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces. These adapters span a high-dimensional feature space, enabling joint decision-making across multiple subspaces. As data evolves, the expanding subspaces render the old class classifiers incompatible with new-stage spaces. Correspondingly, we design a semantic-guided prototype complement strategy that synthesizes old classes' new features without using any old class instance. Extensive experiments on seven benchmark datasets verify EASE's state-of-the-art performance. Code is available at: https://github.com/sun-hailong/CVPR24-Ease

* Accepted to CVPR 2024.
