Ce Zhang

MotionTrack: End-to-End Transformer-based Multi-Object Tracing with LiDAR-Camera Fusion

Jun 29, 2023

Ce Zhang, Chengjie Zhang, Yiluan Guo, Lingji Chen, Michael Happold

Abstract:Multiple Object Tracking (MOT) is crucial to autonomous vehicle perception. End-to-end transformer-based algorithms, which detect and track objects simultaneously, show great potential for the MOT task. However, most existing methods focus on image-based tracking with a single object category. In this paper, we propose an end-to-end transformer-based MOT algorithm (MotionTrack) with multi-modality sensor inputs to track objects with multiple classes. Our objective is to establish a transformer baseline for the MOT in an autonomous driving environment. The proposed algorithm consists of a transformer-based data association (DA) module and a transformer-based query enhancement module to achieve MOT and Multiple Object Detection (MOD) simultaneously. The MotionTrack and its variations achieve better results (AMOTA score at 0.55) on the nuScenes dataset compared with other classical baseline models, such as the AB3DMOT, the CenterTrack, and the probabilistic 3D Kalman filter. In addition, we prove that a modified attention mechanism can be utilized for DA to accomplish the MOT, and aggregate history features to enhance the MOD performance.

* This paper is accepted by CVPR WAD 2023

SalienDet: A Saliency-based Feature Enhancement Algorithm for Object Detection for Autonomous Driving

May 11, 2023

Ning Ding, Ce Zhang, Azim Eskandarian

Abstract:Object detection (OD) is crucial to autonomous driving. Unknown objects are one of the reasons that hinder autonomous vehicles from driving beyond the operational domain. We propose a saliency-based OD algorithm (SalienDet) to detect objects that do not appear in the training sample set. SalienDet utilizes a saliency-based algorithm to enhance image features for object proposal generation. Then, we design a dataset relabeling approach to differentiate the unknown objects from all objects to achieve open-world detection. We evaluate SalienDet on KITTI, NuScenes, and BDD datasets, and the result indicates that it outperforms existing algorithms for unknown object detection. Additionally, SalienDet can be easily adapted for incremental learning in open-world detection tasks.

* Paper submitted to IEEE Transactions on Intelligent Vehicles

OpenBox: A Python Toolkit for Generalized Black-box Optimization

Apr 26, 2023

Huaijun Jiang, Yu Shen, Yang Li, Wentao Zhang, Ce Zhang, Bin Cui

Abstract:Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges when applying BBO methods to their problems at hand with existing software packages in terms of applicability, performance, and efficiency. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.

Critical Sampling for Robust Evolution Operator Learning of Unknown Dynamical Systems

Apr 15, 2023

Ce Zhang, Kailiang Wu, Zhihai He

Abstract:Given an unknown dynamical system, what is the minimum number of samples needed for effective learning of its governing laws and accurate prediction of its future evolution behavior, and how to select these critical samples? In this work, we propose to explore this problem based on a design approach. Starting from a small initial set of samples, we adaptively discover critical samples to achieve increasingly accurate learning of the system evolution. One central challenge here is that we do not know the network modeling error since the ground-truth system state is unknown, which is however needed for critical sampling. To address this challenge, we introduce a multi-step reciprocal prediction network where forward and backward evolution networks are designed to learn the temporal evolution behavior in the forward and backward time directions, respectively. Very interestingly, we find that the desired network modeling error is highly correlated with the multi-step reciprocal prediction error, which can be directly computed from the current system state. This allows us to perform a dynamic selection of critical samples from regions with high network modeling errors for dynamical systems. Additionally, a joint spatial-temporal evolution network is introduced which incorporates spatial dynamics modeling into the temporal evolution prediction for robust learning of the system evolution operator with few samples. Our extensive experimental results demonstrate that our proposed method is able to dramatically reduce the number of samples needed for effective learning and accurate prediction of evolution behaviors of unknown dynamical systems by up to hundreds of times.
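
The selection idea in this abstract can be illustrated with a minimal sketch. It assumes the reciprocal prediction error is the distance between a state and the result of rolling it k steps forward and then k steps backward; the abstract does not give the exact formula. The two "models" below are hypothetical hand-written stand-ins for the learned forward and backward networks, and all names are illustrative only.

```python
def rollout(step, x, k):
    """Apply a one-step model k times in a row."""
    for _ in range(k):
        x = step(x)
    return x


def reciprocal_error(forward, backward, x, k=3):
    """Distance between x and its k-step forward, k-step backward round trip.

    Assumed form of the error; it needs only the current state x,
    not the ground-truth future state.
    """
    return abs(x - rollout(backward, rollout(forward, x, k), k))


# Hypothetical stand-ins for learned networks. True dynamics: x -> 0.9 * x.
def forward_model(x):
    # Accurate for |x| <= 1, increasingly wrong outside that region.
    return 0.9 * x + 0.05 * x * max(0.0, abs(x) - 1.0)


def backward_model(x):
    # Exact inverse of the true dynamics.
    return x / 0.9


candidates = [-2.0, -0.5, 0.0, 0.5, 1.5, 2.0]
errors = {x: reciprocal_error(forward_model, backward_model, x) for x in candidates}

# Pick the two candidates with the largest round-trip error as "critical" samples.
critical = sorted(candidates, key=lambda x: errors[x], reverse=True)[:2]
print(critical)
```

In this toy setup the round-trip error is large exactly where the forward stand-in is inaccurate, which is the correlation the abstract reports for the learned networks.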

Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

Mar 25, 2023

Zhehan Kan, Shuoshuo Chen, Ce Zhang, Yushun Tang, Zhihai He

Abstract:A central challenge in human pose estimation, as well as in many other machine learning and prediction tasks, is the generalization problem. The learned network does not have the capability to characterize the prediction error, generate feedback information from the test sample, and correct the prediction error on the fly for each individual test sample, which results in degraded performance in generalization. In this work, we introduce a self-correctable and adaptable inference (SCAI) method to address the generalization challenge of network prediction and use human pose estimation as an example to demonstrate its effectiveness and performance. We learn a correction network to correct the prediction result conditioned on a fitness feedback error. This feedback error is generated by a learned fitness feedback network which maps the prediction result to the original input domain and compares it against the original input. Interestingly, we find that this self-referential feedback error is highly correlated with the actual prediction error. This strong correlation suggests that we can use this error as feedback to guide the correction process. It can also be used as a loss function to quickly adapt and optimize the correction network during the inference process. Our extensive experimental results on human pose estimation demonstrate that the proposed SCAI method is able to significantly improve the generalization capability and performance of human pose estimation.

* Accepted by CVPR 2023

High-throughput Generative Inference of Large Language Models with a Single GPU

Mar 13, 2023

Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez (+4 more)

Abstract:The high computational and memory requirements of large language model (LLM) inference traditionally make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generation engine for running LLMs with limited GPU memory. FlexGen can be flexibly configured under various hardware resource constraints by aggregating memory and computation from the GPU, CPU, and disk. Through a linear programming optimizer, it searches for efficient patterns to store and access tensors. FlexGen further compresses these weights and the attention cache to 4 bits with negligible accuracy loss. These techniques enable FlexGen to have a larger space of batch size choices and thus significantly increase maximum throughput. As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144. On the HELM benchmark, FlexGen can benchmark a 30B model with a 16GB GPU on 7 representative sub-scenarios in 21 hours. The code is available at https://github.com/FMInference/FlexGen
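
To make the 4-bit compression step concrete, here is a minimal sketch of min-max quantization of one small group of values to 4-bit codes, with two codes packed per byte. This is a common generic scheme and is an assumption for illustration; it is not taken from the FlexGen code base, and the function names are hypothetical.

```python
def quantize_group(values, bits=4):
    """Min-max quantize a group of floats to unsigned integer codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1  # 15 distinct steps for 4 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_group(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def pack_nibbles(codes):
    """Pack pairs of 4-bit codes into single bytes (pads odd lengths with 0)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


weights = [-0.8, -0.1, 0.0, 0.25, 0.5, 1.0, -0.45, 0.7]
codes, lo, scale = quantize_group(weights)
restored = dequantize_group(codes, lo, scale)
packed = pack_nibbles(codes)

# The reconstruction error of each value is at most half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(len(packed), max_err <= scale / 2 + 1e-9)
```

Storing eight values in four bytes plus one offset and one scale per group is what shrinks the memory footprint; smaller groups reduce the error at the cost of more per-group metadata.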

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

Mar 10, 2023

Yushun Tang, Ce Zhang, Heng Xu, Shuoshuo Chen, Jie Cheng, Luziwei Leng, Qinghai Guo, Zhihai He

Abstract:Fully test-time adaptation aims to adapt the network model based on sequential analysis of input samples during the inference stage to address the cross-domain performance degradation problem of deep neural networks. We take inspiration from biologically plausible learning, where the neuron responses are tuned based on a local synapse-change procedure and activated by competitive lateral inhibition rules. Based on these feed-forward learning rules, we design a soft Hebbian learning process which provides an unsupervised and effective mechanism for online adaptation. We observe that the performance of this feed-forward Hebbian learning for fully test-time adaptation can be significantly improved by incorporating a feedback neuro-modulation layer. It is able to fine-tune the neuron responses based on the external feedback generated by the error back-propagation from the top inference layers. This leads to our proposed neuro-modulated Hebbian learning (NHL) method for fully test-time adaptation. With the unsupervised feed-forward soft Hebbian learning being combined with a learned neuro-modulator to capture feedback from external responses, the source model can be effectively adapted during the testing process. Experimental results on benchmark datasets demonstrate that our proposed method can significantly improve the adaptation performance of network models and outperforms existing state-of-the-art methods.

* Accepted by CVPR 2023

Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning

Feb 01, 2023

Susie Xi Rao, Peter H. Egger, Ce Zhang

Abstract:The scholarly publication space is growing steadily not just in numbers but also in complexity due to collaboration between individuals from within and across fields of research. This paper presents a hierarchical classification system that automatically categorizes a scholarly publication using its abstract into a three-tier hierarchical label set of fields (discipline-field-subfield). This system enables a holistic view of the interdependence of research activities in the mentioned hierarchical tiers in terms of knowledge production through articles and impact through citations. The classification system (44 disciplines - 738 fields - 1,501 subfields) utilizes and is able to cope with 160 million abstract snippets in Microsoft Academic Graph (Version 2018-05-17) using batch training in a modularized and distributed fashion to address and assess interdisciplinarity and inter-field classifications. In addition, we have explored multi-class classifications in both the single-label and multi-label settings. In total, we have conducted 3,140 experiments. Across all models (Convolutional Neural Networks, Recurrent Neural Networks, Transformers), the classification accuracy is > 90% in 77.84% and 78.83% of the single-label and multi-label classifications, respectively. We examine the advantages of our classification by its ability to better align research texts and output with disciplines, to adequately classify them in an automated way, as well as to capture the degree of interdisciplinarity in a publication which enables downstream analytics such as field interdisciplinarity. This system (a set of pretrained models) can serve as a backbone to an interactive system of indexing scientific publications.

* Under review

Convolution-enhanced Evolving Attention Networks

Dec 16, 2022

Yujing Wang, Yaming Yang, Zhuo Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

Abstract:Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations, wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention

* Extension of the previous work (arXiv:2102.12895). arXiv admin note: text overlap with arXiv:2102.12895

Holistic Evaluation of Language Models

Nov 16, 2022

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar (+40 more)

Abstract:Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what's missing or underrepresented (e.g. question answering for neglected English dialects, metrics for trustworthiness). Second, we adopt a multi-metric approach: We measure 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency) for each of 16 core scenarios when possible (87.5% of the time). This ensures metrics beyond accuracy don't fall to the wayside, and that trade-offs are clearly exposed. We also perform 7 targeted evaluations, based on 26 targeted scenarios, to analyze specific aspects (e.g. reasoning, disinformation). Third, we conduct a large-scale evaluation of 30 prominent language models (spanning open, limited-access, and closed models) on all 42 scenarios, 21 of which were not previously used in mainstream LM evaluation. Prior to HELM, models on average were evaluated on just 17.9% of the core HELM scenarios, with some prominent models not sharing a single scenario in common. We improve this to 96.0%: now all 30 models have been densely benchmarked on the same core scenarios and metrics under standardized conditions. Our evaluation surfaces 25 top-level findings. For full transparency, we release all raw model prompts and completions publicly for further analysis, as well as a general modular toolkit. We intend for HELM to be a living benchmark for the community, continuously updated with new scenarios, metrics, and models.
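
The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that "87.5% of the time" refers to the share of (core scenario, metric) pairs that were actually measured; the abstract does not spell out the denominator, so that reading is an assumption.

```python
core_scenarios = 16
targeted_scenarios = 26
metrics = 7

# 16 core + 26 targeted scenarios should give the 42 scenarios mentioned.
total_scenarios = core_scenarios + targeted_scenarios

# Full grid of core scenarios by metrics, and the assumed measured share of it.
grid_cells = core_scenarios * metrics
measured_cells = round(0.875 * grid_cells)

print(total_scenarios, grid_cells, measured_cells)
```

Under that reading, the scenario total matches the stated 42, and the measured share corresponds to a whole number of scenario-metric pairs.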

* Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0
