Honghai Liu

A Simple Asymmetric Momentum Make SGD Greatest Again

Sep 05, 2023
Gongyue Zhang, Dinghuang Zhang, Shuwen Zhao, Donghan Liu, Carrie M. Toptan, Honghai Liu

We propose Loss-Controlled Asymmetric Momentum (LCAM), arguably the simplest enhancement to SGD yet, aimed directly at the saddle-point problem. Compared to traditional SGD with momentum, it adds no computational cost, yet it outperforms current optimizers. We use the concepts of weight conjugation and the traction effect to explain this behavior. We designed experiments that rapidly reduce the learning rate at specified epochs so that parameters are more easily trapped at saddle points. We selected WRN28-10 as the test network and CIFAR-10 and CIFAR-100 as test datasets, the same configuration used in the original WRN paper and in Cosine Annealing Scheduling (CAS). We compared the ability of asymmetric momentum with different priorities to bypass saddle points. Finally, using WRN28-10 on CIFAR-100, we achieved a peak average test accuracy of 80.78% at around epoch 120. For comparison, the original WRN paper reported 80.75% and CAS reported 80.42%, both at 200 epochs. In other words, while potentially increasing accuracy, our method needs roughly half the convergence time. Our demonstration code is available at https://github.com/hakumaicc/Asymmetric-Momentum-LCAM
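
The abstract does not spell out the update rule, but the core idea of a loss-controlled, asymmetric momentum can be sketched as a standard SGD-with-momentum update whose momentum coefficient switches between two values depending on whether the running loss improved. The coefficients beta_low/beta_high and the switching rule below are illustrative assumptions, not the authors' exact scheme.

```python
import torch
from torch.optim import Optimizer

class LCAMSketch(Optimizer):
    """Loss-controlled asymmetric momentum, sketched (details assumed)."""

    def __init__(self, params, lr=0.1, beta_low=0.5, beta_high=0.9):
        super().__init__(params, dict(lr=lr, beta_low=beta_low, beta_high=beta_high))
        self.prev_loss = None  # running record used by the controller

    @torch.no_grad()
    def step(self, loss):
        loss = float(loss)
        # Assumed asymmetry: heavier momentum while the loss is improving,
        # lighter momentum otherwise -- not the authors' exact rule.
        improving = self.prev_loss is None or loss < self.prev_loss
        self.prev_loss = loss
        for group in self.param_groups:
            beta = group["beta_high"] if improving else group["beta_low"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                buf = self.state[p].setdefault("momentum", torch.zeros_like(p))
                buf.mul_(beta).add_(p.grad)      # v <- beta * v + g
                p.add_(buf, alpha=-group["lr"])  # w <- w - lr * v

# Usage: opt = LCAMSketch(model.parameters()); ...; opt.step(loss.item())
```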

Full Resolution Repetition Counting

May 24, 2023
Jianing Li, Bowen Chen, Zhiyong Wang, Honghai Liu

Given an untrimmed video, repetitive action counting aims to estimate the number of repetitions of class-agnostic actions. To handle videos and repetitive actions of varying lengths, as well as the optimization challenges of end-to-end video model training, recent state-of-the-art methods commonly rely on down-sampling, which causes some repetitions to be missed. In this paper, we attempt to understand repetitive actions at full temporal resolution by combining offline feature extraction with temporal convolution networks. The former step enables us to train the repetition counting network without down-sampling, preserving all repetitions regardless of video length and action frequency, while the latter network models all frames within a flexible, dynamically expanding temporal receptive field to retrieve all repetitions from a global view. We experimentally demonstrate that our method achieves better or comparable performance on three public datasets, i.e., TransRAC, UCFRep and QUVA. We hope this work will encourage the community to consider the importance of full temporal resolution.
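
As a rough illustration of the second stage, a stack of 1D temporal convolutions with exponentially growing dilation can cover an entire un-downsampled video. The sketch below (with assumed layer sizes and a density-map read-out summed into a count) shows the general shape of such a network, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DilatedTemporalStack(nn.Module):
    """Temporal convolutions over pre-extracted frame features (sketch)."""

    def __init__(self, feat_dim=2048, hidden=256, num_layers=8):
        super().__init__()
        layers = [nn.Conv1d(feat_dim, hidden, kernel_size=1)]
        for i in range(num_layers):
            d = 2 ** i  # dilation doubles per layer: receptive field grows fast
            layers += [nn.Conv1d(hidden, hidden, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
        # Per-frame density read-out; its sum over time is the repetition count.
        layers.append(nn.Conv1d(hidden, 1, kernel_size=1))
        self.net = nn.Sequential(*layers)

    def forward(self, feats):                       # feats: (batch, T, feat_dim)
        density = self.net(feats.transpose(1, 2))   # (batch, 1, T)
        return density.squeeze(1).sum(dim=1)        # predicted counts, (batch,)

# Features from a frozen backbone at full temporal resolution (no down-sampling).
feats = torch.randn(2, 3000, 2048)                  # 3000 frames per video
print(DilatedTemporalStack()(feats))                # two predicted counts
```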

* 12 pages, 4 figures, 17 conferences 

Lifelong-MonoDepth: Lifelong Learning for Multi-Domain Monocular Metric Depth Estimation

Mar 09, 2023
Junjie Hu, Chenyou Fan, Liguang Zhou, Qing Gao, Honghai Liu, Tin Lun Lam

In recent years, monocular depth estimation (MDE) has made significant progress in a data-driven learning fashion. Previous methods can infer depth maps for specific domains based on the paradigm of single-domain or joint-domain training with mixed data, but they suffer from low scalability to new domains. In reality, target domains often change or grow dynamically, raising the requirement of incremental multi-domain/task learning. In this paper, we seek to enable lifelong learning for MDE, which performs cross-domain depth learning sequentially, to achieve high plasticity on new domains while maintaining good stability on original domains. To overcome significant domain gaps and enable scale-aware depth prediction, we design a lightweight multi-head framework consisting of a domain-shared encoder for feature extraction and domain-specific predictors for metric depth estimation. Moreover, given an input image, we propose an efficient predictor selection approach that automatically identifies the corresponding predictor for depth inference. Through extensive numerical studies, we show that the proposed method achieves good efficiency, stability, and plasticity, leading the benchmarks by 8% to 15%.
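
A minimal sketch of such a multi-head design is given below, assuming the predictor is selected by comparing a pooled encoder feature against stored per-domain feature means; the selection rule and head shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MultiHeadDepth(nn.Module):
    """Domain-shared encoder with per-domain metric-depth heads (sketch)."""

    def __init__(self, encoder, feat_dim, domains):
        super().__init__()
        self.encoder = encoder  # shared across all domains
        self.heads = nn.ModuleDict({
            d: nn.Conv2d(feat_dim, 1, kernel_size=3, padding=1) for d in domains
        })
        # Per-domain mean features, assumed to be filled in during training.
        self.domain_means = {d: torch.zeros(feat_dim) for d in domains}

    def select_domain(self, feat):
        # Assumed rule: pick the domain whose stored mean feature is nearest.
        pooled = feat.mean(dim=(2, 3)).squeeze(0)
        dists = {d: torch.norm(pooled - m) for d, m in self.domain_means.items()}
        return min(dists, key=dists.get)

    def forward(self, image, domain=None):
        feat = self.encoder(image)                  # (B, C, H, W)
        domain = domain or self.select_domain(feat)
        return self.heads[domain](feat)             # metric depth map

enc = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
model = MultiHeadDepth(enc, feat_dim=64, domains=["indoor", "outdoor"])
depth = model(torch.rand(1, 3, 120, 160))           # head chosen automatically
```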

CountingMOT: Joint Counting, Detection and Re-Identification for Multiple Object Tracking

Dec 12, 2022
Weihong Ren, Bowen Chen, Yuhang Shi, Weibo Jiang, Honghai Liu

The recent trend in multiple object tracking (MOT) is to solve detection and tracking jointly, learning object detection and appearance (or motion) features simultaneously. Despite competitive performance, in crowded scenes joint detection and tracking usually fail to find accurate object associations due to missed or false detections. In this paper, we jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, CountingMOT tries to find a balance between object detection and crowd-density map estimation, which helps it recover missed detections and reject false ones. Our approach is an attempt to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods that either ignore crowd density, and are thus prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed tracker runs online and in real time, and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 77.6%), MOT17 (MOTA of 78.0%) and MOT20 (MOTA of 70.2%).
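
One plausible form of such a mutual count constraint is a consistency loss that pushes the integral of the predicted density map toward a soft count of confident detections. The sketch below is an assumed formulation for illustration, not CountingMOT's actual loss.

```python
import torch

def count_consistency_loss(density_map, det_scores, det_thresh=0.5):
    """Assumed mutual-count constraint: density integral vs. detection count."""
    count_from_density = density_map.sum(dim=(1, 2, 3))        # (B,)
    # Soft, differentiable count of confident detections.
    soft_hits = torch.sigmoid((det_scores - det_thresh) * 10.0)
    count_from_dets = soft_hits.sum(dim=1)                     # (B,)
    return torch.mean((count_from_density - count_from_dets) ** 2)

# Dummy predictions for a batch of two frames, 300 proposals each.
density = torch.rand(2, 1, 96, 96, requires_grad=True) * 1e-3
scores = torch.rand(2, 300, requires_grad=True)
count_consistency_loss(density, scores).backward()
```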

Deep Depth Completion: A Survey

May 17, 2022
Junjie Hu, Chenyu Bao, Mete Ozay, Chenyou Fan, Qing Gao, Honghai Liu, Tin Lun Lam

Depth completion aims at predicting dense pixel-wise depth from a sparse map captured by a depth sensor. It plays an essential role in applications such as autonomous driving, 3D reconstruction, augmented reality, and robot navigation. Recent successes on the task have been demonstrated and dominated by deep-learning-based solutions. In this article, for the first time, we provide a comprehensive literature review that helps readers grasp the research trends and clearly understand the current advances. We investigate the related studies from the design aspects of network architectures, loss functions, benchmark datasets, and learning strategies, and propose a novel taxonomy that categorizes existing methods. In addition, we present a quantitative comparison of model performance on two widely used benchmark datasets, one indoor and one outdoor. Finally, we discuss the challenges of prior work and provide readers with insights into future research directions.
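
To make the task setup concrete, many of the surveyed methods start from an input that stacks the RGB image, the sparse depth map, and a validity mask; the helper below sketches this common construction (the exact encoding varies by method).

```python
import torch

def make_completion_input(rgb, sparse_depth):
    """Stack RGB, sparse depth, and a validity mask into one network input."""
    valid = (sparse_depth > 0).float()   # 1 where the sensor returned a depth
    return torch.cat([rgb, sparse_depth, valid], dim=1)        # (B, 5, H, W)

rgb = torch.rand(1, 3, 240, 320)
mask = (torch.rand(1, 1, 240, 320) < 0.05).float()  # ~5% of pixels measured
sparse = torch.rand(1, 1, 240, 320) * mask
x = make_completion_input(rgb, sparse)   # fed to a completion network
```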

Dense 3D Facial Reconstruction from a Single Depth Image in Unconstrained Environment

Apr 24, 2017
Shu Zhang, Hui Yu, Ting Wang, Junyu Dong, Honghai Liu

With the increasing demands of virtual reality applications such as 3D films, virtual human-machine interaction and virtual agents, the analysis of 3D human faces is considered an increasingly important fundamental step for these tasks. Owing to the information provided by the additional dimension, 3D facial reconstruction enables such tasks to be achieved with higher accuracy than 2D facial analysis. The denser the 3D facial model, the more information it can provide. However, most existing dense 3D facial reconstruction methods require complicated processing and high system cost. To this end, this paper presents a novel method that simplifies the process of dense 3D facial reconstruction by employing only one frame of depth data obtained with an off-the-shelf RGB-D sensor. Experiments on real-world data showed competitive results.

Ship Detection and Segmentation using Image Correlation

Oct 21, 2013
Alexander Kadyrov, Hui Yu, Honghai Liu

There has been intensive research interest in ship detection and segmentation over the last two decades, driven by high demand across a wide range of civil applications. However, existing approaches, which are mainly based on statistical properties of images, fail to detect smaller ships and boats. Specifically, known techniques are not robust enough to the inevitable small geometric and photometric changes in images containing ships. In this paper, a novel approach to ship detection is proposed based on the correlation of maritime images. The idea comes from the observation that the fine pattern of the sea surface changes considerably from frame to frame, whereas a ship's appearance remains essentially unchanged. We examine whether the images share a common unaltered part, in this case a ship. To this end, we developed a method, Focused Correlation (FC), that achieves robustness to geometric distortions of the image content. Various experiments have been conducted to evaluate the effectiveness of the proposed approach.
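
Focused Correlation itself is designed to be robust to geometric distortions, but the underlying intuition can be sketched with plain windowed normalized cross-correlation between two frames: sea-surface windows decorrelate while ship windows stay highly correlated. The code below is an illustrative baseline of that idea, not the FC method.

```python
import numpy as np

def windowed_ncc(frame_a, frame_b, win=15):
    """Local normalized cross-correlation of two frames (plain baseline)."""
    pad = win // 2
    a = np.pad(frame_a.astype(float), pad, mode="reflect")
    b = np.pad(frame_b.astype(float), pad, mode="reflect")
    h, w = frame_a.shape
    ncc = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            pa = a[i:i + win, j:j + win].ravel()
            pb = b[i:i + win, j:j + win].ravel()
            pa, pb = pa - pa.mean(), pb - pb.mean()
            denom = np.linalg.norm(pa) * np.linalg.norm(pb)
            ncc[i, j] = pa @ pb / denom if denom > 0 else 0.0
    return ncc  # threshold (e.g. ncc > 0.8) to segment the stable ship region
```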

* 8 pages, to appear in Proc. IEEE SMC 2013 