
Zhiqiang Ma

Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

Jun 27, 2023
Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, Ming'en Zhao

The Transducer is one of the mainstream frameworks for streaming speech recognition. There is a performance gap between streaming and non-streaming Transducer models because the streaming model sees only limited context. An effective way to reduce this gap is to make their hidden and output distributions consistent, which can be achieved by hierarchical knowledge distillation. However, it is difficult to enforce both kinds of consistency simultaneously, because learning the output distribution depends on the hidden one. In this paper, we propose an adaptive two-stage knowledge distillation method consisting of hidden layer learning and output layer learning. In the former stage, we learn hidden representations with full context by applying a mean square error loss function. In the latter stage, we design a power-transformation-based adaptive smoothness method to learn a stable output distribution. The method achieves a 19% relative reduction in word error rate and a faster response for the first token compared with the original streaming model on the LibriSpeech corpus.
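
As a rough illustration of the two stages named in the abstract, the sketch below computes a mean square error loss between streaming (student) and full-context (teacher) hidden vectors, then smooths a teacher output distribution with a power transformation before measuring the KL divergence to the student. The function names, the fixed exponent `gamma`, and the toy values are assumptions for illustration only; the abstract does not say how the exponent is adapted, so no adaptation rule is shown.

```python
import math


def mse_loss(student_hidden, teacher_hidden):
    """Stage 1 (sketch): mean square error between two hidden vectors."""
    diffs = [(s - t) ** 2 for s, t in zip(student_hidden, teacher_hidden)]
    return sum(diffs) / len(diffs)


def power_smooth(probs, gamma):
    """Raise each probability to the power gamma and renormalize.

    gamma < 1 flattens the distribution, gamma = 1 leaves it unchanged.
    A fixed gamma is an assumption; the paper describes an adaptive scheme.
    """
    powered = [p ** gamma for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def kl_divergence(target, predicted, eps=1e-12):
    """Stage 2 (sketch): KL(target || predicted) over one output distribution."""
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, predicted)
        if t > 0
    )


# Toy usage with made-up numbers.
hidden_loss = mse_loss([0.2, 0.4], [0.1, 0.5])
teacher_probs = [0.9, 0.1]
student_probs = [0.6, 0.4]
output_loss = kl_divergence(power_smooth(teacher_probs, 0.5), student_probs)
```

Flattening a sharply peaked teacher distribution gives the student a softer target, which is one plausible reading of "stable output distribution" in the abstract.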

Risk-Aware Reward Shaping of Reinforcement Learning Agents for Autonomous Driving

Jun 05, 2023
Lin-Chi Wu, Zengjie Zhang, Sofie Haesaert, Zhiqiang Ma, Zhiyong Sun

Reinforcement learning (RL) is an effective approach to motion planning in autonomous driving, where an optimal driving policy can be learned automatically from interaction data with the environment. Nevertheless, the reward function of an RL agent, which is critical to its performance, is challenging to determine. Conventional work mainly focuses on rewarding safe driving states but does not incorporate awareness of the vehicle's risky driving behaviors. In this paper, we investigate how to use risk-aware reward shaping to improve the training and test performance of RL agents in autonomous driving. Based on the essential requirements that prescribe the safety specifications for general autonomous driving in practice, we propose additional reshaped reward terms that encourage exploration and penalize risky driving behaviors. A simulation study in OpenAI Gym indicates the advantage of risk-aware reward shaping for various RL agents. We also point out that proximal policy optimization (PPO) is likely to be the best RL method to combine with risk-aware reward shaping.
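
The idea of adding reward terms that penalize risky behavior and encourage exploration can be sketched as follows. This is a minimal, hypothetical example: the distance-based risk penalty, the count-based exploration bonus, and all parameter names and default values are assumptions, not the reward terms defined in the paper.

```python
def shaped_reward(base_reward, distance_to_obstacle, visit_count,
                  safe_distance=5.0, risk_weight=1.0, explore_weight=0.1):
    """Return the base reward adjusted by a risk penalty and an exploration bonus.

    All terms and default values here are illustrative assumptions.
    """
    # Penalty grows linearly once the vehicle is closer than safe_distance.
    shortfall = max(0.0, safe_distance - distance_to_obstacle)
    risk_penalty = risk_weight * shortfall / safe_distance
    # Count-based bonus that decays as the same state is visited more often.
    exploration_bonus = explore_weight / (1 + visit_count) ** 0.5
    return base_reward - risk_penalty + exploration_bonus


# Toy usage: the same base reward is worth less when the agent drives
# closer to an obstacle.
far = shaped_reward(1.0, distance_to_obstacle=10.0, visit_count=3)
near = shaped_reward(1.0, distance_to_obstacle=1.0, visit_count=3)
```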

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks

May 10, 2023
Xianzhi Li, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

The most recent large language models such as ChatGPT and GPT-4 have garnered significant attention, as they are capable of generating high-quality responses to human input. Despite the extensive testing of ChatGPT and GPT-4 on generic text corpora, showcasing their impressive capabilities, a study focusing on financial corpora has not been conducted. In this study, we aim to bridge this gap by examining the potential of ChatGPT and GPT-4 as a solver for typical financial text analytic problems in the zero-shot or few-shot setting. Specifically, we assess their capabilities on four representative tasks over five distinct financial textual datasets. The preliminary study shows that ChatGPT and GPT-4 struggle on tasks such as financial named entity recognition (NER) and sentiment analysis, where domain-specific knowledge is required, while they excel in numerical reasoning tasks. We report both the strengths and limitations of the current versions of ChatGPT and GPT-4, comparing them to the state-of-the-art finetuned models as well as pretrained domain-specific generative models. Our experiments provide qualitative studies, through which we hope to help understand the capability of the existing models and facilitate further improvements.

* 9 pages, 5 figures 
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

Oct 07, 2022
Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, William Yang Wang

With the recent advances in large pre-trained language models, researchers have achieved record performance on NLP tasks that mostly focus on language pattern matching. The community is seeing the challenge shift from how to model language to how to imitate the complex reasoning abilities of human beings. In this work, we investigate the application domain of finance, which involves real-world, complex numerical reasoning. We propose a new large-scale dataset, ConvFinQA, aiming to study the chain of numerical reasoning in conversational question answering. Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations. We conduct comprehensive experiments and analyses with both neural symbolic methods and prompting-based methods to provide insights into the reasoning mechanisms of these two divisions. We believe our new dataset should serve as a valuable resource to push forward the exploration of real-world, complex reasoning tasks as the next research focus. Our dataset and code are publicly available at https://github.com/czyssrs/ConvFinQA.

* EMNLP 2022 
A Novel Generative Convolutional Neural Network for Robot Grasp Detection on Gaussian Guidance

May 09, 2022
Yuanhao Li, Yu Liu, Zhiqiang Ma, Panfeng Huang

Vision-based grasp detection is an important research direction in the field of robotics. However, because of the limitations of the rectangle metric used in grasp detection, false-positive grasps occur and cause real-world robot grasp tasks to fail. In this paper, we propose a novel generative convolutional neural network model to improve the accuracy and robustness of robot grasp detection in real-world scenes. First, a Gaussian-based guided training method is used to encode the quality of the grasp point and grasp angle in the grasp pose, highlighting the highest-quality grasp point position and grasp angle and reducing the generation of false-positive grasps. Simultaneously, deformable convolution is used to obtain the shape features of the object in order to guide the subsequent network to the position. Furthermore, a global-local feature fusion method is introduced to efficiently obtain finer features during the feature reconstruction stage, allowing the network to focus on the features of the grasped objects. On the Cornell Grasping Datasets and Jacquard Datasets, our method achieves excellent performance of 99.0% and 95.9%, respectively. Finally, the proposed method is tested in a real-world robot grasping scenario.
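
One way to picture Gaussian guidance is a grasp quality map that peaks at the annotated grasp point and decays smoothly around it, instead of marking every pixel inside a grasp rectangle as equally good. The sketch below builds such a map; the function name, the isotropic 2D Gaussian, and the `sigma` value are assumptions for illustration and are not taken from the paper.

```python
import math


def gaussian_quality_map(height, width, center_row, center_col, sigma=2.0):
    """Return a height x width grid whose values peak at 1.0 on the grasp
    center and fall off with squared distance (isotropic Gaussian, assumed)."""
    return [
        [
            math.exp(-((r - center_row) ** 2 + (c - center_col) ** 2)
                     / (2 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]


# Toy usage: a 5 x 5 training target with the grasp point in the middle.
quality = gaussian_quality_map(5, 5, center_row=2, center_col=2)
```

A target of this shape rewards predictions near the center more than predictions near the edge of the graspable region, which matches the abstract's goal of highlighting the highest-quality grasp point.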

Towards Earnings Call and Stock Price Movement

Aug 23, 2020
Zhiqiang Ma, Grace Bang, Chong Wang, Xiaomo Liu

Earnings calls are hosted by management of public companies to discuss the company's financial performance with analysts and investors. Information disclosed during an earnings call is an essential source of data for analysts and investors to make investment decisions. Thus, we leverage earnings call transcripts to predict future stock price dynamics. We propose to model the language in transcripts using a deep learning framework, where an attention mechanism is applied to encode the text data into vectors for the discriminative network classifier to predict stock price movements. Our empirical experiments show that the proposed model is superior to the traditional machine learning baselines and earnings call information can boost the stock price prediction performance.
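
The step of encoding text into a single vector with an attention mechanism can be sketched as attention pooling over sentence vectors. This is a minimal example assuming dot-product attention against a hypothetical learned `query` vector; the abstract does not specify the form of attention, and the vectors below are made up.

```python
import math


def attention_pool(sentence_vectors, query):
    """Pool sentence vectors into one vector using dot-product attention.

    Returns the pooled vector and the attention weights.
    """
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in sentence_vectors]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(len(query))
    ]
    return pooled, weights


# Toy usage: two 2-dimensional "sentence" vectors and a query that
# favors the first dimension.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

The pooled vector would then be passed to a classifier that predicts the direction of the stock price movement.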

* Accepted by KDD 2020 MLF workshop 
SPot: A tool for identifying operating segments in financial tables

May 17, 2020
Zhiqiang Ma, Steven Pomerville, Mingyang Di, Armineh Nourbakhsh

In this paper we present SPot, an automated tool for detecting operating segments and their related performance indicators from earnings reports. Due to their company-specific nature, operating segments cannot be detected using taxonomy-based approaches. Instead, we train a Bidirectional RNN classifier that can distinguish between common metrics such as "revenue" and company-specific metrics that are likely to be operating segments, such as "iPhone" or "cloud services". SPot surfaces the results in an interactive web interface that allows users to trace and adjust performance metrics for each operating segment. This facilitates credit monitoring, enables them to perform competitive benchmarking more effectively, and can be used for trend analysis at company and sector levels.

* This manuscript has been reviewed and accepted by SIGIR 2020 
Empirical Study on Detecting Controversy in Social Media

Aug 25, 2019
Azadeh Nematzadeh, Grace Bang, Xiaomo Liu, Zhiqiang Ma

Companies and financial investors are paying increasing attention to social consciousness in developing their corporate strategies and making investment decisions to support a sustainable economy for the future. Public discussion of incidents and events (controversies) involving companies can provide valuable insights into how well a company operates with regard to social consciousness and can indicate the company's overall operational capability. However, evaluating the degree of a company's social consciousness and environmental sustainability is challenging due to the lack of systematic data. We introduce a system that utilizes Twitter data to detect and monitor controversial events and show their impact on market volatility. In our study, controversial events are identified from clustered tweets that share the same 5W terms, together with the sentiment polarities of these clusters. Credible news links inside the event tweets are used to validate the truth of the event. A case study on the Starbucks Philadelphia arrests shows that this method can provide the desired functionality.

* Accepted by the 2nd KDD Workshop on Anomaly Detection in Finance, 2019. The authors contributed equally to this work and are listed in alphabetical order 
The USTC-NEL Speech Translation system at IWSLT 2018

Dec 06, 2018
Dan Liu, Junhua Liu, Wu Guo, Shifu Xiong, Zhiqiang Ma, Rui Song, Chongliang Wu, Quan Liu

This paper describes the USTC-NEL system for the speech translation task of the IWSLT Evaluation 2018. The system is a conventional pipeline that contains three modules: speech recognition, post-processing, and machine translation. We train a group of hybrid-HMM models for speech recognition, and for machine translation we train transformer-based neural machine translation models that take text in the style of speech recognition output as input. Experiments conducted on the IWSLT 2018 task indicate that, compared to the baseline system from KIT, our system achieves an improvement of 14.9 BLEU.

* 5 pages, 8 tables 