Ang Li

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

Oct 08, 2023

Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang

Abstract:Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we build upon large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance, while utilizing substantially less fine-tuning data compared to other methods. Code is released at https://github.com/wengzejia1/Open-VCLIP.

* arXiv admin note: substantial text overlap with arXiv:2302.00624
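The weight interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest form, a linear blend between the original CLIP weights and the video fine-tuned weights with one coefficient `lam`; the function name, the dictionary layout, and the toy values are hypothetical and are not taken from the released code.

```python
def interpolate_weights(pretrained, finetuned, lam):
    """Blend two weight sets: (1 - lam) * pretrained + lam * finetuned.

    Both arguments map parameter names to flat lists of floats.
    lam = 0 keeps the pretrained weights, lam = 1 keeps the fine-tuned ones.
    """
    if pretrained.keys() != finetuned.keys():
        raise ValueError("parameter names differ between the two weight sets")
    return {
        name: [
            (1.0 - lam) * p + lam * f
            for p, f in zip(pretrained[name], finetuned[name])
        ]
        for name in pretrained
    }


# Toy weights standing in for the image model and its video fine-tuned copy.
clip_weights = {"proj": [1.0, 2.0], "bias": [0.0]}
video_weights = {"proj": [3.0, 6.0], "bias": [1.0]}
merged = interpolate_weights(clip_weights, video_weights, lam=0.5)
```

Intermediate values of `lam` trade specialization on the fine-tuning data against the generality of the original model, which is the tension the abstract describes.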

FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent

Oct 06, 2023

Ziyao Wang, Jianyu Wang, Ang Li

Abstract:The theoretical landscape of federated learning (FL) is evolving rapidly, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of the most critical. Among the hyperparameters, the learning rate is a crucial component whose adaptation holds the promise of significantly enhancing the efficacy of FL systems. In response to this need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper is robust to a wide range of initial learning rate configurations and significantly alleviates the need for laborious empirical learning rate tuning. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FedHyper consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper improves accuracy by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
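As a rough illustration of hypergradient-based learning rate adaptation, the sketch below applies generic hypergradient descent to plain single-machine gradient descent: the learning rate grows when successive gradients point the same way and shrinks when they disagree. This is not FedHyper's global or local scheduler; the function name, the hyper step size `beta`, and the quadratic objective are assumptions chosen for the example.

```python
def hypergradient_sgd(grad_fn, x0, lr0, beta, steps):
    """Gradient descent on a scalar whose learning rate adapts itself.

    For the update x <- x - lr * g, the derivative of the loss with respect
    to lr is -g_t * g_{t-1}, so descending on it adds beta * g_t * g_{t-1}.
    """
    x, lr = x0, lr0
    prev_grad = 0.0  # no previous gradient on the first step
    for _ in range(steps):
        grad = grad_fn(x)
        lr = lr + beta * grad * prev_grad  # hypergradient step on lr
        x = x - lr * grad                  # ordinary descent step on x
        prev_grad = grad
    return x, lr


# Minimize f(x) = x^2 (gradient 2x) from a deliberately small initial rate.
final_x, final_lr = hypergradient_sgd(
    lambda x: 2.0 * x, x0=5.0, lr0=0.01, beta=0.001, steps=50
)
```

Starting from a learning rate that is too small, the rate rises during the first few steps because consecutive gradients agree, which mirrors the robustness to poor initial rates that the abstract claims.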

FedNAR: Federated Optimization with Normalized Annealing Regularization

Oct 04, 2023

Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric P. Xing, Hongyi Wang

Abstract:Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop Federated optimization with Normalized Annealing Regularization (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithm. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our code is released at https://github.com/ljb121002/fednar.

* Thirty-seventh Conference on Neural Information Processing Systems
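One way to read "co-clipping of the gradient and weight decay" is that the gradient and the weight decay term are summed first and the norm of the sum is then clipped, rather than clipping the gradient alone. The sketch below follows that reading for a single local step. The function name, the `max_norm` parameter, and the example values are assumptions; the released FedNAR code may differ in detail.

```python
import math


def co_clipped_step(weights, grads, lr, weight_decay, max_norm):
    """One descent step where gradient plus weight decay are clipped together."""
    # Combined update direction: gradient plus the weight decay term.
    update = [g + weight_decay * w for g, w in zip(grads, weights)]
    norm = math.sqrt(sum(u * u for u in update))
    # Rescale only when the combined norm exceeds the allowed maximum.
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [w - lr * scale * u for w, u in zip(weights, update)]


# With a zero gradient the update is pure weight decay of norm 5,
# which is scaled down to norm 1 before being applied.
stepped = co_clipped_step(
    [3.0, 4.0], [0.0, 0.0], lr=1.0, weight_decay=1.0, max_norm=1.0
)
```

Because the decay term is inside the clipped quantity, an overly large weight decay cannot dominate the step, which is consistent with the self-adjusting behaviour the abstract reports.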

AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

Aug 31, 2023

Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu(+3 more)

Abstract:Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Thirdly, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.

Block-Level Interference Exploitation Precoding for MU-MISO: An ADMM Approach

Aug 30, 2023

Yiran Wang, Yunsi Wen, Ang Li, Xiaoyan Hu, Christos Masouros

Abstract:We study constructive interference based block-level precoding (CI-BLP) in the downlink of multi-user multiple-input single-output (MU-MISO) systems. Specifically, our aim is to extend the analysis on CI-BLP to the case where the considered number of symbol slots is smaller than that of the users. To this end, we mathematically prove the feasibility of using the pseudo-inverse to obtain the optimal CI-BLP precoding matrix in a closed form. Similar to the case when the number of users is small, we show that a quadratic programming (QP) optimization on simplex can be constructed. We also design a low-complexity algorithm based on the alternating direction method of multipliers (ADMM) framework, which can efficiently solve large-scale QP problems. We further analyze the convergence and complexity of the proposed algorithm. Numerical results validate our analysis and the optimality of the proposed algorithm, and further show that the proposed algorithm offers a flexible performance-complexity tradeoff by limiting the maximum number of iterations, which motivates the use of CI-BLP in practical wireless systems.
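To make the phrase "QP on simplex solved with ADMM" concrete, here is a generic sketch rather than the paper's CI-BLP algorithm. It assumes a diagonal quadratic term so that the x-update has an elementwise closed form, and it uses the standard sort-based Euclidean projection onto the probability simplex. All names, the penalty `rho`, and the toy problem are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    ordered = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0.0:
            theta = candidate  # keep the threshold of the last valid index
    return [max(vi - theta, 0.0) for vi in v]


def admm_simplex_qp(q_diag, c, rho=1.0, iters=200):
    """Minimize 0.5 * sum(q_i * x_i^2) + sum(c_i * x_i) over the simplex.

    Assumes a diagonal, positive quadratic term (q_diag), so the x-update
    is an elementwise closed form instead of a linear solve.
    """
    n = len(c)
    z = [1.0 / n] * n  # simplex-feasible copy of the variable
    u = [0.0] * n      # scaled dual variable
    for _ in range(iters):
        # x-update: unconstrained minimizer of objective plus penalty term.
        x = [
            (rho * (zi - ui) - ci) / (qi + rho)
            for qi, ci, zi, ui in zip(q_diag, c, z, u)
        ]
        # z-update: project back onto the simplex.
        z = project_to_simplex([xi + ui for xi, ui in zip(x, u)])
        # Dual update: accumulate the remaining disagreement between x and z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


# Toy problem: minimize x1^2 + 0.5 * x2^2 subject to x1 + x2 = 1, x >= 0.
# Setting the derivative to zero along the constraint gives x = [1/3, 2/3].
solution = admm_simplex_qp([2.0, 1.0], [0.0, 0.0])
```

Capping `iters` at a smaller value returns a feasible but less accurate point, which is the kind of performance-complexity tradeoff the abstract refers to.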

AutoReP: Automatic ReLU Replacement for Fast Private Network Inference

Aug 20, 2023

Hongwu Peng, Shaoyi Huang, Tong Zhou, Yukui Luo, Chenghong Wang, Zigeng Wang, Jiahui Zhao, Xi Xie, Ang Li, Tony Geng(+4 more)

Abstract:The growth of the Machine-Learning-As-A-Service (MLaaS) market has highlighted clients' data privacy and security issues. Private inference (PI) techniques using cryptographic primitives offer a solution but often have high computation and communication costs, particularly with non-linear operators like ReLU. Many attempts to reduce ReLU operations exist, but they may need heuristic threshold selection or cause substantial accuracy loss. This work introduces AutoReP, a gradient-based approach to lessen non-linear operators and alleviate these issues. It automates the selection of ReLU and polynomial functions to speed up PI applications and introduces distribution-aware polynomial approximation (DaPa) to maintain model expressivity while accurately approximating ReLUs. Our experimental results demonstrate significant accuracy improvements of 6.12% (94.31%, 12.9K ReLU budget, CIFAR-10), 8.39% (74.92%, 12.9K ReLU budget, CIFAR-100), and 9.45% (63.69%, 55K ReLU budget, Tiny-ImageNet) over current state-of-the-art methods, e.g., SNL. Moreover, AutoReP is applied to EfficientNet-B2 on the ImageNet dataset and achieves 75.55% accuracy with a 176.1-fold ReLU budget reduction.

* ICCV 2023 accepted publication

Symbol-Level Precoding for MU-MIMO System with RIRC Receiver

Jul 27, 2023

Xiao Tong, Ang Li, Lei Lei, Fan Liu, Fuwang Dong

Abstract:Consider a multiuser multiple-input multiple-output (MU-MIMO) downlink system in which the base station (BS) sends multiple data streams to multi-antenna users via symbol-level precoding (SLP), where the optimization of receive combining matrix becomes crucial, unlike in the single-antenna user scenario. We begin by introducing a joint optimization problem on the symbol-level transmit precoder and receive combiner. The problem is solved using the alternating optimization (AO) method, and the optimal solution structures for transmit precoding and receive combining matrices are derived by using Lagrangian and Karush-Kuhn-Tucker (KKT) conditions, based on which, the original problem is transformed into an equivalent quadratic programming problem, enabling more efficient solutions. To address the challenge that the above joint design is difficult to implement, we propose a more practical scheme where the receive combining optimization is replaced by the interference rejection combiner (IRC), which is however difficult to directly use because of the rank-one transmit precoding matrix. Therefore, we introduce a new regularized IRC (RIRC) receiver to circumvent the above issue. Numerical results demonstrate that the practical SLP-RIRC method enjoys only a slight communication performance loss compared to the joint transmit precoding and receive combining design, both offering substantial performance gains over the conventional BD-based approaches.

* 13 pages, 10 figures

A Novel Spatial-Temporal Variational Quantum Circuit to Enable Deep Learning on NISQ Devices

Jul 19, 2023

Jinyang Li, Zhepeng Wang, Zhirui Hu, Prasanna Date, Ang Li, Weiwen Jiang

Abstract:Quantum computing presents a promising approach for machine learning with its capability for extremely parallel computation in high dimensions through superposition and entanglement. Despite its potential, existing quantum learning algorithms, such as Variational Quantum Circuits (VQCs), face challenges in handling more complex datasets, particularly those that are not linearly separable. They also face a deployability issue: the learning models suffer a drastic accuracy drop after being deployed to actual quantum devices. To overcome these limitations, this paper proposes a novel spatial-temporal design, namely ST-VQC, to integrate non-linearity in quantum learning and improve the robustness of the learning model to noise. Specifically, ST-VQC can extract spatial features via a novel block-based encoding quantum sub-circuit coupled with a layer-wise computation quantum sub-circuit to enable temporal-wise deep learning. Additionally, a SWAP-Free physical circuit design is devised to improve robustness. These designs introduce a number of hyperparameters. After a systematic analysis of the design space for each design component, an automated optimization framework is proposed to generate the ST-VQC quantum circuit. The proposed ST-VQC has been evaluated on two IBM quantum processors, ibm_cairo with 27 qubits and ibmq_lima with 7 qubits, to assess its effectiveness. The results of the evaluation on the standard dataset for binary classification show that ST-VQC can achieve over 30% accuracy improvement compared with existing VQCs on actual quantum computers. Moreover, on a non-linear synthetic dataset, the ST-VQC outperforms a linear classifier by 27.9%, while the linear classifier using classical computing outperforms the existing VQC by 15.58%.

Faster-Than-Nyquist Symbol-Level Precoding for Wideband Integrated Sensing and Communications

Jun 26, 2023

Zihan Liao, Fan Liu, Ang Li, Christos Masouros

Abstract:In this paper, we present an innovative symbol-level precoding (SLP) approach for a wideband multi-user multi-input multi-output (MU-MIMO) downlink Integrated Sensing and Communications (ISAC) system employing faster-than-Nyquist (FTN) signaling. Our proposed technique minimizes the minimum mean squared error (MMSE) for the sensed parameter estimation while ensuring the communication per-user quality-of-service through the utilization of constructive interference (CI) methodologies. While the formulated problem is non-convex in general, we tackle this issue using proficient minorization and successive convex approximation (SCA) strategies. Numerical results substantiate that our FTN-ISAC-SLP framework significantly enhances communication throughput while preserving satisfactory sensing performance.

Style Transfer Enabled Sim2Real Framework for Efficient Learning of Robotic Ultrasound Image Analysis Using Simulated Data

May 16, 2023

Keyu Li, Xinyu Mao, Chengwei Ye, Ang Li, Yangxin Xu, Max Q. -H. Meng

Abstract:Robotic ultrasound (US) systems have shown great potential to make US examinations easier and more accurate. Recently, various machine learning techniques have been proposed to realize automatic US image interpretation for robotic US acquisition tasks. However, obtaining large amounts of real US imaging data for training is usually expensive or even unfeasible in some clinical applications. An alternative is to build a simulator to generate synthetic US data for training, but the differences between simulated and real US images may result in poor model performance. This work presents a Sim2Real framework to efficiently learn robotic US image analysis tasks based only on simulated data for real-world deployment. A style transfer module is proposed based on unsupervised contrastive learning and used as a preprocessing step to convert the real US images into the simulation style. Thereafter, a task-relevant model is designed to combine CNNs with vision transformers to generate the task-dependent prediction with improved generalization ability. We demonstrate the effectiveness of our method in an image regression task to predict the probe position based on US images in robotic transesophageal echocardiography (TEE). Our results show that using only simulated US data and a small amount of unlabelled real data for training, our method can achieve comparable performance to semi-supervised and fully supervised learning methods. Moreover, the effectiveness of our previously proposed CT-based US image simulation method is also indirectly confirmed.
