Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kun Yuan

Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

May 25, 2023
Yutong He, Xinmeng Huang, Kun Yuan

Figure 1 for Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Figure 2 for Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Figure 3 for Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Figure 4 for Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Communication compression is a common technique in distributed optimization that can alleviate communication overhead by transmitting compressed gradients and model parameters. However, compression can introduce information distortion, which slows down convergence and incurs more communication rounds to achieve desired solutions. Given the trade-off between lower per-round communication costs and additional rounds of communication, it is unclear whether communication compression reduces the total communication cost. This paper explores the conditions under which unbiased compression, a widely used form of compression, can reduce the total communication cost, as well as the extent to which it can do so. To this end, we present the first theoretical formulation for characterizing the total communication cost in distributed optimization with communication compression. We demonstrate that unbiased compression alone does not necessarily save the total communication cost, but this outcome can be achieved if the compressors used by all workers are further assumed independent. We establish lower bounds on the communication rounds required by algorithms using independent unbiased compressors to minimize smooth convex functions, and show that these lower bounds are tight by refining the analysis for ADIANA. Our results reveal that using independent unbiased compression can reduce the total communication cost by a factor of up to $\Theta(\sqrt{\min\{n, \kappa\}})$, where $n$ is the number of workers and $\kappa$ is the condition number of the functions being minimized. These theoretical findings are supported by experimental results.

Via

Access Paper or Ask Questions

Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

May 12, 2023
Yutong He, Xinmeng Huang, Yiming Chen, Wotao Yin, Kun Yuan

Figure 1 for Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

Figure 2 for Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

Figure 3 for Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

Figure 4 for Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

Communication compression is an essential strategy for alleviating communication overhead by reducing the volume of information exchanged between computing nodes in large-scale distributed stochastic optimization. Although numerous algorithms with convergence guarantees have been obtained, the optimal performance limit under communication compression remains unclear. In this paper, we investigate the performance limit of distributed stochastic optimization algorithms employing communication compression. We focus on two main types of compressors, unbiased and contractive, and address the best-possible convergence rates one can obtain with these compressors. We establish the lower bounds for the convergence rates of distributed stochastic optimization in six different settings, combining strongly-convex, generally-convex, or non-convex functions with unbiased or contractive compressor types. To bridge the gap between lower bounds and existing algorithms' rates, we propose NEOLITHIC, a nearly optimal algorithm with compression that achieves the established lower bounds up to logarithmic factors under mild conditions. Extensive experimental results support our theoretical findings. This work provides insights into the theoretical limitations of existing compressors and motivates further research into fundamentally new compressor properties.

Via

Access Paper or Ask Questions

AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Apr 25, 2023
Yi-Fan Zhang, Xue Wang, Kexin Jin, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Figure 1 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Figure 2 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Figure 3 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Figure 4 for AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

Many recent machine learning tasks focus to develop models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several literatures show that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptive (TTA) methods are proposed. Existing TTA methods require offline target data or extra sophisticated optimization procedures during the inference stage. In this work, we adopt Non-Parametric Classifier to perform the test-time Adaptation (AdaNPC). In particular, we construct a memory that contains the feature and label pairs from training domains. During inference, given a test instance, AdaNPC first recalls K closed samples from the memory to vote for the prediction, and then the test feature and predicted label are added to the memory. In this way, the sample distribution in the memory can be gradually changed from the training distribution towards the test distribution with very little extra computation cost. We theoretically justify the rationality behind the proposed method. Besides, we test our model on extensive numerical experiments. AdaNPC significantly outperforms competitive baselines on various DG benchmarks. In particular, when the adaptation target is a series of domains, the adaptation accuracy of AdaNPC is 50% higher than advanced TTA methods. The code is available at https://github.com/yfzhang114/AdaNPC.

* The Fortieth International Conference on Machine Learning, ICML, 2023
* 30 pages, 12 figures

Via

Access Paper or Ask Questions

Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Apr 13, 2023
Kai Zhao, Kun Yuan, Ming Sun, Xing Wen

Figure 1 for Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Figure 2 for Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Figure 3 for Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Figure 4 for Zoom-VQA: Patches, Frames and Clips Integration for Video Quality Assessment

Video quality assessment (VQA) aims to simulate the human perception of video quality, which is influenced by factors ranging from low-level color and texture details to high-level semantic content. To effectively model these complicated quality-related factors, in this paper, we decompose video into three levels (\ie, patch level, frame level, and clip level), and propose a novel Zoom-VQA architecture to perceive spatio-temporal features at different levels. It integrates three components: patch attention module, frame pyramid alignment, and clip ensemble strategy, respectively for capturing region-of-interest in the spatial dimension, multi-level information at different feature levels, and distortions distributed over the temporal dimension. Owing to the comprehensive design, Zoom-VQA obtains state-of-the-art results on four VQA benchmarks and achieves 2nd place in the NTIRE 2023 VQA challenge. Notably, Zoom-VQA has outperformed the previous best results on two subsets of LSVQ, achieving 0.8860 (+1.0%) and 0.7985 (+1.9%) of SRCC on the respective subsets. Adequate ablation studies further verify the effectiveness of each component. Codes and models are released in https://github.com/k-zha14/Zoom-VQA.

* Accepted by CVPR 2023 Workshop

Via

Access Paper or Ask Questions

BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

Mar 15, 2023
Lei Yang, Kaicheng Yu, Tao Tang, Jun Li, Kun Yuan, Li Wang, Xinyu Zhang, Peng Chen

Figure 1 for BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

Figure 2 for BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

Figure 3 for BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

Figure 4 for BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection

While most recent autonomous driving system focuses on developing perception methods on ego-vehicle sensors, people tend to overlook an alternative approach to leverage intelligent roadside cameras to extend the perception ability beyond the visual range. We discover that the state-of-the-art vision-centric bird's eye view detection methods have inferior performances on roadside cameras. This is because these methods mainly focus on recovering the depth regarding the camera center, where the depth difference between the car and the ground quickly shrinks while the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight, to address this issue. In essence, instead of predicting the pixel-wise depth, we regress the height to the ground to achieve a distance-agnostic formulation to ease the optimization process of camera-only perception methods. On popular 3D detection benchmarks of roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. The code is available at {\url{https://github.com/ADLab-AutoDrive/BEVHeight}}.

Via

Access Paper or Ask Questions

Quality-aware Pre-trained Models for Blind Image Quality Assessment

Mar 01, 2023
Kai Zhao, Kun Yuan, Ming Sun, Mading Li, Xing Wen

Figure 1 for Quality-aware Pre-trained Models for Blind Image Quality Assessment

Figure 2 for Quality-aware Pre-trained Models for Blind Image Quality Assessment

Figure 3 for Quality-aware Pre-trained Models for Blind Image Quality Assessment

Figure 4 for Quality-aware Pre-trained Models for Blind Image Quality Assessment

Blind image quality assessment (BIQA) aims to automatically evaluate the perceived quality of a single image, whose performance has been improved by deep learning-based methods in recent years. However, the paucity of labeled data somewhat restrains deep learning-based BIQA methods from unleashing their full potential. In this paper, we propose to solve the problem by a pretext task customized for BIQA in a self-supervised learning manner, which enables learning representations from orders of magnitude more data. To constrain the learning process, we propose a quality-aware contrastive loss based on a simple assumption: the quality of patches from a distorted image should be similar, but vary from patches from the same image with different degradations and patches from different images. Further, we improve the existing degradation process and form a degradation space with the size of roughly $2\times10^7$. After pre-trained on ImageNet using our method, models are more sensitive to image quality and perform significantly better on downstream BIQA tasks. Experimental results show that our method obtains remarkable improvements on popular BIQA datasets.

* Accepted by CVPR 2023

Via

Access Paper or Ask Questions

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Feb 13, 2023
Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai, Ziheng Wang, Guo Rui, Melanie Schellenberg, João L. Vilaça, Tobias Czempiel, Zhenkun Wang, Debdoot Sheet, Shrawan Kumar Thapa, Max Berniker, Patrick Godau, Pedro Morais, Sudarshan Regmi, Thuy Nuong Tran, Jaime Fonseca, Jan-Hinrich Nölke, Estevão Lima, Eduard Vazquez, Lena Maier-Hein, Nassir Navab, Pietro Mascagni, Barbara Seeliger, Cristians Gonzalez, Didier Mutter, Nicolas Padoy

Figure 1 for CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Figure 2 for CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Figure 3 for CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Figure 4 for CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results, their significance, and useful insights for future research directions and applications in surgery.

* MICCAI EndoVis CholecTriplet2022 challenge report. Submitted to journal of Medical Image Analysis. 22 pages, 14 figures, 6 tables

Via

Access Paper or Ask Questions

Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks

Nov 01, 2022
Xinmeng Huang, Kun Yuan

Figure 1 for Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks

Figure 2 for Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks

Figure 3 for Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks

Decentralized optimization with time-varying networks is an emerging paradigm in machine learning. It saves remarkable communication overhead in large-scale deep training and is more robust in wireless scenarios especially when nodes are moving. Federated learning can also be regarded as decentralized optimization with time-varying communication patterns alternating between global averaging and local updates. While numerous studies exist to clarify its theoretical limits and develop efficient algorithms, it remains unclear what the optimal complexity is for non-convex decentralized stochastic optimization over time-varying networks. The main difficulties lie in how to gauge the effectiveness when transmitting messages between two nodes via time-varying communications, and how to establish the lower bound when the network size is fixed (which is a prerequisite in stochastic optimization). This paper resolves these challenges and establish the first lower bound complexity. We also develop a new decentralized algorithm to nearly attain the lower bound, showing the tightness of the lower bound and the optimality of our algorithm.

* Accepted by 14th Annual Workshop on Optimization for Machine Learning. arXiv admin note: text overlap with arXiv:2210.07863

Via

Access Paper or Ask Questions

Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

Oct 14, 2022
Zhuoqing Song, Weijian Li, Kexin Jin, Lei Shi, Ming Yan, Wotao Yin, Kun Yuan

Figure 1 for Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

Figure 2 for Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

Figure 3 for Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

Figure 4 for Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate

Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without the central server. Since communication tends to be slower than computation, when each agent communicates with only a few neighboring agents per iteration, they can complete iterations faster than with more agents or a central server. However, the total number of iterations to reach a network-wide solution is affected by the speed at which the agents' information is ``mixed'' by communication. We found that popular communication topologies either have large maximum degrees (such as stars and complete graphs) or are ineffective at mixing information (such as rings and grids). To address this problem, we propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate that is used to measure the mixing efficiency. In the proposed family, EquiStatic has a degree of $\Theta(\ln(n))$, where $n$ is the network size, and a series of time-dependent one-peer topologies, EquiDyn, has a constant degree of 1. We generate EquiDyn through a certain random sampling procedure. Both of them achieve an $n$-independent consensus rate. We apply them to decentralized SGD and decentralized gradient tracking and obtain faster communication and better convergence, theoretically and empirically. Our code is implemented through BlueFog and available at \url{https://github.com/kexinjinnn/EquiTopo}

* NeurIPS 2022

Via

Access Paper or Ask Questions

Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Oct 14, 2022
Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, Pan Pan

Figure 1 for Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Figure 2 for Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Figure 3 for Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Figure 4 for Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

Decentralized optimization is effective to save communication in large-scale machine learning. Although numerous algorithms have been proposed with theoretical guarantees and empirical successes, the performance limits in decentralized optimization, especially the influence of network topology and its associated weight matrix on the optimal convergence rate, have not been fully understood. While (Lu and Sa, 2021) have recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear. This paper revisits non-convex stochastic decentralized optimization and establishes an optimal convergence rate with general weight matrices. In addition, we also establish the optimal rate when non-convex loss functions further satisfy the Polyak-Lojasiewicz (PL) condition. Following existing lines of analysis in literature cannot achieve these results. Instead, we leverage the Ring-Lattice graph to admit general weight matrices while maintaining the optimal relation between the graph diameter and weight matrix connectivity. Lastly, we develop a new decentralized algorithm to nearly attain the above two optimal rates under additional mild conditions.

Via

Access Paper or Ask Questions