Yuting Yang

Towards Efficient Verification of Quantized Neural Networks

Dec 27, 2023

Pei Huang, Haoze Wu, Yuting Yang, Ieva Daukantas, Min Wu, Yedi Zhang, Clark Barrett

Abstract: Quantization replaces floating-point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying properties of quantized neural networks. Our baseline technique is based on integer linear programming, which guarantees both soundness and completeness. We then show how efficiency can be improved by utilizing gradient-based heuristic search methods and bound-propagation techniques. We evaluate our approach on perception networks quantized with PyTorch. Our results show that we can verify quantized networks with better scalability and efficiency than the previous state of the art.

* This paper has been accepted by AAAI 2024
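
The bound-propagation idea mentioned in the abstract can be illustrated with a minimal sketch of interval bound propagation through one integer linear layer followed by ReLU. This is a generic textbook form, not the paper's implementation; the function names `linear_bounds` and `relu_bounds` and the example weights are assumptions made for illustration.

```python
def linear_bounds(weights, bias, lower, upper):
    """Propagate elementwise input bounds through y = W x + b.

    For each weight, a non-negative entry takes the lower input bound for
    the output's lower bound, and a negative entry takes the upper one.
    """
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0, l) for l in lower], [max(0, u) for u in upper]


# Hypothetical integer weights and an input box 0 <= x0 <= 3, 0 <= x1 <= 2.
W = [[2, -1], [-3, 4]]
b = [1, 0]
lo, hi = linear_bounds(W, b, [0, 0], [3, 2])
print(lo, hi)                # [-1, -9] [7, 8]
print(relu_bounds(lo, hi))   # ([0, 0], [7, 8])
```

Bounds of this kind are sound but not tight: every reachable output lies inside the interval, yet the interval may also contain unreachable values, which is why a complete method such as integer linear programming is still needed for exact answers.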

Cascade: A Platform for Delay-Sensitive Edge Intelligence

Nov 29, 2023

Weijia Song, Thiago Garrett, Yuting Yang, Mingzhao Liu, Edward Tremel, Lorenzo Rosa, Andrea Merlina, Roman Vitenberg, Ken Birman

Abstract: Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management. Yet many intelligent applications run on AI/ML platforms that optimize for high throughput even at the cost of high tail latency. Cascade is a new AI/ML hosting platform intended to untangle this puzzle. Innovations include a legacy-friendly storage layer that moves data with minimal copying and a "fast path" that collocates data and computation to maximize responsiveness. Our evaluation shows that Cascade reduces latency by orders of magnitude with no loss of throughput.

* 14 pages, 12 figures

LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR

Oct 07, 2023

Guodong Ma, Wenxuan Wang, Yuke Li, Yuting Yang, Binbin Du, Haoran Fu

Abstract: Recently, to mitigate the confusion between different languages in code-switching (CS) automatic speech recognition (ASR), conditionally factorized models, such as the language-aware encoder (LAE), explicitly disregard the contextual information between different languages. However, this information may be helpful for ASR modeling. To alleviate this issue, we propose the LAE-ST-MoE framework. It incorporates speech translation (ST) tasks into LAE and utilizes ST to learn the contextual information between different languages. It introduces task-based mixture-of-experts modules, employing separate feed-forward networks for the ASR and ST tasks. Experimental results on the ASRU 2019 Mandarin-English CS challenge dataset demonstrate that, compared to the LAE-based CTC, the LAE-ST-MoE model achieves a 9.26% mix error reduction on the CS test set with the same decoding parameters. Moreover, the well-trained LAE-ST-MoE model can perform ST tasks from CS speech to Mandarin or English text.

* Accepted to IEEE ASRU 2023

The Robust Semantic Segmentation UNCV2023 Challenge Results

Sep 27, 2023

Xuanlong Yu, Yi Zuo, Zitao Wang, Xiaowen Zhang, Jiaxuan Zhao, Yuting Yang, Licheng Jiao, Rui Peng, Xinyi Wang, Junpei Zhang(+27 more)

Abstract: This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent computer vision and machine learning conferences and in journals over the past few years. Within this document, the challenge is introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions. Moreover, the document aims to provide a comprehensive overview of the diverse solutions deployed by all participants. By doing so, it seeks to offer readers a deeper insight into the array of strategies that can be leveraged to effectively handle the inherent uncertainties associated with autonomous driving and semantic segmentation, especially within urban environments.

* 11 pages, 4 figures, accepted at ICCV 2023 UNCV workshop

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning

Jun 01, 2023

Yuting Yang, Yuke Li, Binbin Du

Abstract: The unified streaming and non-streaming speech recognition model has achieved great success due to its comprehensive capabilities. In this paper, we propose to improve the accuracy of the unified model by bridging the inherent representation gap between the streaming and non-streaming modes with a contrastive objective. Specifically, the top-layer hidden representations at the same frame of the streaming and non-streaming modes are regarded as a positive pair, encouraging the representation of the streaming mode to be close to its non-streaming counterpart. Multiple negative samples are randomly selected from the remaining frames of the same sample under the non-streaming mode. Experimental results demonstrate that the proposed method achieves consistent improvements over the unified model in both streaming and non-streaming modes. Our method achieves a CER of 4.66% in the streaming mode and a CER of 4.31% in the non-streaming mode, which sets a new state of the art on the AISHELL-1 benchmark.

* Accepted by INTERSPEECH 2023
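
The positive/negative pairing described in the abstract can be sketched with a generic InfoNCE-style loss for a single frame. The abstract does not state the similarity function or temperature, so the cosine similarity, the default `temperature=0.1`, and the names `cosine` and `info_nce` are assumptions for illustration, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss for one streaming-mode frame (the anchor).

    positive:  non-streaming representation at the same frame
    negatives: non-streaming representations at other frames of the
               same utterance
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-dimensional "hidden representations".
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(aligned < misaligned)  # True
```

The loss is small when the streaming frame matches its non-streaming counterpart and large when it resembles a different frame, which is the behavior the contrastive objective is meant to encourage.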

FreConv: Frequency Branch-and-Integration Convolutional Networks

Apr 10, 2023

Zhaowen Li, Xu Zhao, Peigeng Ding, Zongxin Gao, Yuting Yang, Ming Tang, Jinqiao Wang

Abstract: Recent research indicates that utilizing the frequency information of input data can enhance the performance of networks. However, the existing popular convolutional structure is not designed specifically for utilizing the frequency information contained in datasets. In this paper, we propose a novel and effective module, named FreConv (frequency branch-and-integration convolution), to replace the vanilla convolution. FreConv adopts a dual-branch architecture to extract and integrate high- and low-frequency information. In the high-frequency branch, a derivative-filter-like architecture is designed to extract the high-frequency information, while a light extractor is employed in the low-frequency branch because the low-frequency information is usually redundant. FreConv is able to exploit the frequency information of input data in a more reasonable way to enhance feature representation ability and reduce the memory and computational cost significantly. Without any bells and whistles, experimental results on various tasks demonstrate that FreConv-equipped networks consistently outperform state-of-the-art baselines.

* Accepted by ICME 2023

HUSP-SP: Faster Utility Mining on Sequence Data

Dec 29, 2022

Chunkai Zhang, Yuting Yang, Zilin Du, Wensheng Gan, Philip S. Yu

Abstract: High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to solve this problem more efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU), and two search-space pruning strategies are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.

* ACM TKDD, 7 figures, 2 tables

Improving CTC-based ASR Models with Gated Interlayer Collaboration

May 25, 2022

Yuting Yang, Yuke Li, Binbin Du

Abstract: For Automatic Speech Recognition (ASR), CTC-based methods have become a dominant paradigm due to their simple architecture and efficient non-autoregressive inference. However, without external language models, these methods usually lack the capacity to model conditional dependencies and textual interaction. In this work, we present a Gated Interlayer Collaboration (GIC) mechanism which introduces contextual information into the models and relaxes the conditional independence assumption of CTC-based models. Specifically, we train the model with intermediate CTC losses calculated from the interlayer outputs of the model, in which the probability distributions of the intermediate layers naturally serve as soft label sequences. The GIC block consists of an embedding layer to obtain the textual embedding of the soft label at each position, and a gate unit to fuse the textual embedding and the acoustic features. Experiments on AISHELL-1 and AIDATATANG benchmarks show that the proposed method outperforms the recently published CTC-based ASR models. Specifically, our method achieves a CER of 4.0%/4.4% on AISHELL-1 dev/test sets and a CER of 3.8%/4.4% on AIDATATANG dev/test sets using CTC greedy search decoding without external language models.

* Submitted to INTERSPEECH 2022
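
The two components of the GIC block named in the abstract (an embedding of the soft label and a gate that fuses it with acoustic features) can be sketched as follows. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the soft-label embedding is taken as a probability-weighted sum of token embeddings, the gate is a per-dimension sigmoid with hypothetical scalar parameters `w_a`, `w_t`, and `bias`, and the function names are invented.

```python
import math


def soft_label_embedding(probs, table):
    """Probability-weighted sum of token embeddings (assumed form).

    probs: intermediate-layer CTC distribution over the vocabulary at one frame
    table: one embedding vector per vocabulary entry
    """
    dim = len(table[0])
    return [sum(p * row[d] for p, row in zip(probs, table)) for d in range(dim)]


def gated_fuse(acoustic, textual, w_a=0.0, w_t=0.0, bias=0.0):
    """Blend acoustic and textual features with a sigmoid gate (assumed form)."""
    fused = []
    for a, t in zip(acoustic, textual):
        gate = 1.0 / (1.0 + math.exp(-(w_a * a + w_t * t + bias)))
        fused.append(gate * a + (1.0 - gate) * t)
    return fused


# Toy vocabulary of two tokens with 2-dimensional embeddings.
textual = soft_label_embedding([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
print(textual)                               # [0.5, 0.5]
print(gated_fuse([2.0, 4.0], [0.0, 0.0]))    # [1.0, 2.0]
```

With all gate parameters at zero the gate is 0.5 and the output is a plain average; in a trained model the gate parameters would be learned so that each position can weight textual context and acoustic evidence differently.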

Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

May 25, 2022

Yuting Yang, Binbin Du, Yuke Li

Abstract: The choice of modeling units affects the performance of acoustic modeling and plays an important role in automatic speech recognition (ASR). In Mandarin scenarios, Chinese characters represent meaning but are not directly related to pronunciation. Thus, considering only the written form of Chinese characters as modeling units is insufficient to capture speech features. In this paper, we present a novel method involving multi-level modeling units, which integrates multi-level information for Mandarin speech recognition. Specifically, the encoder block considers syllables as modeling units, and the decoder block deals with character modeling units. During inference, the input feature sequences are converted into syllable sequences by the encoder block and then converted into Chinese characters by the decoder block. This process is conducted by a unified end-to-end model without introducing additional conversion models. By introducing the InterCE auxiliary task, our method achieves competitive results with CERs of 4.1%/4.6% and 4.6%/5.2% on the widely used AISHELL-1 benchmark without a language model, using the Conformer and the Transformer backbones respectively.

* Submitted to INTERSPEECH 2022

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Mar 24, 2022

Yuting Yang, Licheng Jiao, Xu Liu, Fang Liu, Shuyuan Yang, Zhixi Feng, Xu Tang

Abstract: A dynamic attention mechanism and global modeling ability give the Transformer strong feature learning ability. In recent years, the Transformer has become comparable to CNN-based methods in computer vision. This review mainly investigates the current research progress of the Transformer in image and video applications and provides a comprehensive overview of the Transformer in visual learning understanding. First, the attention mechanism, which plays an essential part in the Transformer, is reviewed. Second, the visual Transformer model and the principle of each module are introduced. Third, the existing Transformer-based models are investigated, and their performance is compared in visual learning understanding applications. Three image tasks and two video tasks of computer vision are investigated. The former mainly include image classification, object detection, and image segmentation. The latter comprise object tracking and video classification. The models' performance on these tasks is compared on several public benchmark datasets. Finally, ten general problems are summarized, and the development prospects of the visual Transformer are given in this review.

* arXiv admin note: text overlap with arXiv:2010.11929, arXiv:1706.03762 by other authors
