Li Zhao

Layer-Adapted Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

Oct 06, 2023
Yan Zhao, Yuan Zong, Jincen Wang, Hailun Lian, Cheng Lu, Li Zhao, Wenming Zheng

In this paper, we propose a new unsupervised domain adaptation (DA) method called layer-adapted implicit distribution alignment networks (LIDAN) to address the challenge of cross-corpus speech emotion recognition (SER). LIDAN extends our previous ICASSP work, deep implicit distribution alignment networks (DIDAN), whose key contribution is a novel regularization term called implicit distribution alignment (IDA). This term allows a DIDAN trained on source (training) speech samples to remain applicable to predicting emotion labels for target (testing) speech samples, regardless of corpus variance in cross-corpus SER. To further enhance this method, we extend IDA to layer-adapted IDA (LIDA), resulting in LIDAN. This layer-adapted extension consists of three modified IDA terms that consider emotion labels at different levels of granularity. These terms are strategically arranged within different fully connected layers in LIDAN, matching the increase in emotion-discriminative ability with layer depth. This arrangement enables LIDAN to learn emotion-discriminative and corpus-invariant features for SER across various corpora more effectively than DIDAN. It is also worth mentioning that, unlike most existing methods that rely on estimating statistical moments to describe pre-assumed explicit distributions, both IDA and LIDA take a different approach: they use the idea of target sample reconstruction to directly bridge the feature distribution gap without making assumptions about the distribution type. As a result, DIDAN and LIDAN can be viewed as implicit cross-corpus SER methods. To evaluate LIDAN, we conducted extensive cross-corpus SER experiments on the EmoDB, eNTERFACE, and CASIA corpora. The experimental results demonstrate that LIDAN surpasses recent state-of-the-art explicit unsupervised DA methods in tackling cross-corpus SER tasks.
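
The target-sample-reconstruction idea can be made concrete with a minimal numpy sketch: each target feature vector is reconstructed as a ridge-regularized linear combination of source feature vectors, and the residual serves as the alignment penalty. The function name, the ridge solver, and the hyperparameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ida_reconstruction_loss(source_feats, target_feats, ridge=1e-3):
    """Reconstruct each target feature vector as a linear combination of
    source feature vectors and penalize the residual (illustrative only)."""
    S, T = source_feats, target_feats            # shapes: (n_s, d), (n_t, d)
    gram = S @ S.T + ridge * np.eye(S.shape[0])  # regularized source Gram matrix
    coef = np.linalg.solve(gram, S @ T.T)        # reconstruction coefficients, (n_s, n_t)
    recon = coef.T @ S                           # reconstructed target features, (n_t, d)
    return float(np.mean(np.sum((T - recon) ** 2, axis=1)))
```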

Pre-Trained Large Language Models for Industrial Control

Aug 06, 2023
Lei Song, Chuheng Zhang, Li Zhao, Jiang Bian

For industrial control, developing high-performance controllers with few samples and low technical debt is appealing. Foundation models, which possess rich prior knowledge obtained from pre-training on an Internet-scale corpus, have the potential to serve as good controllers with proper prompts. In this paper, we take HVAC (Heating, Ventilation, and Air Conditioning) building control as an example to examine the ability of GPT-4 (one of the first-tier foundation models) as a controller. To control HVAC, we wrap the task as a language game: at each step we provide GPT-4 with text comprising a short task description, several selected demonstrations, and the current observation, and then execute the actions it returns. We conduct a series of experiments to answer the following questions: 1) How well can GPT-4 control HVAC? 2) How well can GPT-4 generalize to different scenarios for HVAC control? 3) How do different parts of the text context affect performance? In general, we find that GPT-4 achieves performance comparable to RL methods with few samples and low technical debt, indicating the potential of directly applying foundation models to industrial control tasks.
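
A minimal sketch of the language-game wrapping described above, assuming a placeholder `llm` callable and a gym-style `env`; the prompt layout and step budget are illustrative, not the exact prompts used in the paper.

```python
def build_prompt(task_description, demonstrations, observation):
    # Assemble the text context: task description, demonstrations, current observation.
    demo_text = "\n".join(f"Observation: {o}\nAction: {a}" for o, a in demonstrations)
    return (f"{task_description}\n\nExamples:\n{demo_text}\n\n"
            f"Observation: {observation}\nAction:")

def control_loop(env, llm, task_description, demonstrations, max_steps=24):
    # Closed loop: prompt the model at each step and execute the action it returns.
    obs = env.reset()
    for _ in range(max_steps):
        prompt = build_prompt(task_description, demonstrations, obs)
        action = llm(prompt).strip()          # e.g. a setpoint string such as "22.5"
        obs, reward, done, info = env.step(action)
        if done:
            break
```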

Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance

Jul 06, 2023
Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu

Order execution is a fundamental task in quantitative finance, aiming to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works typically optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, which results in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution that considers practical constraints. Specifically, we treat every agent as an individual operator trading one specific order while communicating with the others and collaborating to maximize the overall profit. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol in which agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method that is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets show superior performance and significantly better collaboration effectiveness achieved by our method.
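
A toy sketch of the multi-round intention exchange, assuming each `policies[i]` is a callable that maps an observation and the peers' current intentions to an action vector; the number of rounds and the interface are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def multi_round_intention_exchange(policies, observations, rounds=2):
    """Each agent first proposes an action from its own observation, then
    repeatedly refines it after seeing the other agents' current intentions."""
    n = len(policies)
    intentions = [policies[i](observations[i], None) for i in range(n)]   # round 0: no peer info
    for _ in range(rounds):
        peer_views = [np.concatenate([intentions[j] for j in range(n) if j != i])
                      for i in range(n)]
        intentions = [policies[i](observations[i], peer_views[i]) for i in range(n)]
    return intentions
```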

* Accepted in KDD 2023; The website is at https://seqml.github.io/marl4fin 

A Versatile Multi-Agent Reinforcement Learning Benchmark for Inventory Management

Jun 13, 2023
Xianliang Yang, Zhihao Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Jiang Bian

Multi-agent reinforcement learning (MARL) models multiple agents that interact and learn within a shared environment. This paradigm is applicable to various industrial scenarios such as autonomous driving, quantitative trading, and inventory management. However, applying MARL to these real-world scenarios is impeded by many challenges, such as scaling up, complex agent interactions, and non-stationary dynamics. To incentivize MARL research on these challenges, we develop MABIM (Multi-Agent Benchmark for Inventory Management), a multi-echelon, multi-commodity inventory management simulator that can generate versatile tasks with these different challenging properties. Based on MABIM, we evaluate the performance of classic operations research (OR) methods and popular MARL algorithms on these challenging tasks to highlight their weaknesses and potential.
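
For readers unfamiliar with the underlying dynamics, here is a toy single-SKU, single-echelon base-stock simulation of the kind of replenishment process such benchmarks model; it is not MABIM's actual API, and all names and parameters are hypothetical.

```python
def simulate_sku(demand, order_up_to, lead_time=2, init_stock=50):
    """Toy base-stock (order-up-to) simulation for a single SKU: each step we
    receive pipeline orders, serve demand, and replenish up to a target level."""
    on_hand, pipeline, unmet = init_stock, [0] * lead_time, 0
    for d in demand:
        on_hand += pipeline.pop(0)                       # orders placed lead_time steps ago arrive
        sold = min(on_hand, d)
        unmet += d - sold
        on_hand -= sold
        position = on_hand + sum(pipeline)               # inventory position
        pipeline.append(max(order_up_to - position, 0))  # new replenishment order
    return on_hand, unmet
```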

Asking Before Action: Gather Information in Embodied Decision Making with Language Models

May 25, 2023
Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen

With strong reasoning capabilities and a generic understanding of the world, Large Language Models (LLMs) have shown great potential for building versatile embodied decision-making agents capable of performing diverse tasks. However, when deployed in unfamiliar environments, we show that LLM agents face challenges in efficiently gathering necessary information, leading to suboptimal performance. On the other hand, in unfamiliar scenarios, humans often seek additional information from their peers before taking action, leveraging external knowledge to avoid unnecessary trial and error. Building upon this intuition, we propose Asking Before Action (ABA), a method that empowers the agent to proactively query external sources for pertinent information in natural language during its interactions with the environment. In this way, the agent enhances its efficiency and performance by mitigating wasteful steps and circumventing the difficulties associated with exploration in unfamiliar environments. We empirically evaluate our method on an embodied decision-making benchmark, ALFWorld, and demonstrate that despite modest modifications to the prompts, our method exceeds baseline LLM agents by more than 40%. Further experiments on two variants of ALFWorld illustrate that, through imitation learning, ABA effectively retains and reuses queried and known information in subsequent tasks, mitigating the need for repetitive inquiries. Both qualitative and quantitative results exhibit remarkable performance on tasks that previous methods struggle to solve.
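
A minimal sketch of the ask-or-act step, assuming placeholder `llm`, `env`, and `oracle` callables and a simple `ask:` prefix convention for questions; the actual prompt format and query mechanism in ABA may differ.

```python
def ask_before_action_step(llm, context, env, oracle):
    """One interaction step: the agent either asks a natural-language question
    or acts in the environment; answers are appended to the running context."""
    reply = llm(context).strip()
    if reply.lower().startswith("ask:"):
        question = reply[4:].strip()
        answer = oracle(question)                 # external information source
        context += f"\nQ: {question}\nA: {answer}"
        return context, None
    obs, reward, done, info = env.step(reply)     # ordinary environment action
    context += f"\n> {reply}\n{obs}"
    return context, (obs, reward, done)
```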

Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem

Apr 19, 2023
Yan Jin, Yuandong Ding, Xuanhao Pan, Kun He, Li Zhao, Tao Qin, Lei Song, Jiang Bian

The Traveling Salesman Problem (TSP), a classic routing optimization problem that originally arose in transportation and logistics, has become a critical task in broader domains such as manufacturing and biology. Recently, Deep Reinforcement Learning (DRL) has been increasingly employed to solve TSP due to its high inference efficiency. Nevertheless, most existing end-to-end DRL algorithms only perform well on small TSP instances and can hardly generalize to large scale because memory consumption and computation time soar drastically as the problem scale grows. In this paper, we propose a novel end-to-end DRL approach, referred to as Pointerformer, based on a multi-pointer Transformer. In particular, Pointerformer adopts both a reversible residual network in the encoder and a multi-pointer network in the decoder to effectively contain the memory consumption of the encoder-decoder architecture. To further improve the quality of TSP solutions, Pointerformer employs a feature augmentation method to exploit the symmetries of TSP at both training and inference stages, as well as an enhanced context embedding approach to include more comprehensive context information in the query. Extensive experiments on a randomly generated benchmark and a public benchmark show that, while achieving results comparable to SOTA DRL approaches on most small-scale TSP instances, Pointerformer also generalizes well to large-scale TSPs.
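
A rough numpy sketch of what a multi-pointer decoding step could look like: several pointer heads score the candidate nodes against the decoder query, the scores are averaged and tanh-clipped, and visited nodes are masked out. The projection shapes, clipping constant, and averaging rule are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def multi_pointer_logits(query, node_emb, heads, visited_mask, clip=10.0):
    """Score candidate nodes with several pointer heads and mask visited ones."""
    scores = []
    for W_q, W_k in heads:                         # one (W_q, W_k) projection pair per head
        q = query @ W_q                            # projected decoder query, (d,)
        k = node_emb @ W_k                         # projected node keys, (n, d)
        scores.append(k @ q / np.sqrt(q.shape[0]))
    logits = clip * np.tanh(np.mean(scores, axis=0))
    logits[visited_mask] = -np.inf                 # forbid revisiting nodes
    return logits                                  # a softmax over these picks the next city
```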

* Accepted by AAAI 2023, February 2023 

H-TSP: Hierarchically Solving the Large-Scale Travelling Salesman Problem

Apr 19, 2023
Xuanhao Pan, Yan Jin, Yuandong Ding, Mingxiao Feng, Li Zhao, Lei Song, Jiang Bian

We propose an end-to-end learning framework based on hierarchical reinforcement learning, called H-TSP, for addressing the large-scale Travelling Salesman Problem (TSP). H-TSP constructs a solution to a TSP instance from scratch by relying on two components: the upper-level policy chooses a small subset of nodes (up to 200 in our experiments) from all nodes that are to be traversed, while the lower-level policy takes the chosen nodes as input and outputs a tour connecting them to the existing partial route (initially containing only the depot). After jointly training the upper-level and lower-level policies, our approach can directly generate solutions for given TSP instances without relying on any time-consuming search procedures. To demonstrate the effectiveness of the proposed approach, we conducted extensive experiments on randomly generated TSP instances with different numbers of nodes. We show that H-TSP achieves results comparable to SOTA search-based approaches (gap 3.42% vs. 7.32%) and, more importantly, reduces time consumption by up to two orders of magnitude (3.32s vs. 395.85s). To the best of our knowledge, H-TSP is the first end-to-end deep reinforcement learning approach that can scale to TSP instances of up to 10000 nodes. Although there are still gaps to SOTA results with respect to solution quality, we believe that H-TSP will be useful for practical applications, particularly time-sensitive ones such as on-call routing and ride-hailing services.
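
The two-level construction loop can be sketched as follows, with `upper_policy` and `lower_policy` as hypothetical callables standing in for the trained policies; the interface is an assumption made for illustration.

```python
def h_tsp_solve(coords, upper_policy, lower_policy, depot=0, max_subset=200):
    """Hierarchical construction loop: the upper level picks a small node subset,
    the lower level stitches it into the current partial route (illustrative)."""
    route = [depot]
    unvisited = set(range(len(coords))) - {depot}
    while unvisited:
        subset = upper_policy(coords, route, unvisited, max_subset)  # at most max_subset nodes
        route = lower_policy(coords, route, subset)                  # extended partial tour
        unvisited -= set(subset)
    return route
```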

* Accepted by AAAI 2023, February 2023 

Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition

Feb 17, 2023
Yan Zhao, Jincen Wang, Yuan Zong, Wenming Zheng, Hailun Lian, Li Zhao

In this paper, we propose a novel deep transfer learning method called deep implicit distribution alignment networks (DIDAN) to deal with the cross-corpus speech emotion recognition (SER) problem, in which the labeled training (source) and unlabeled testing (target) speech signals come from different corpora. Specifically, DIDAN first adopts a simple deep regression network consisting of a set of convolutional and fully connected layers to directly regress source speech spectrums onto emotion labels, so that DIDAN acquires emotion-discriminative ability. This ability is then transferred to the target speech samples, regardless of corpus variance, by resorting to a well-designed regularization term called implicit distribution alignment (IDA). Unlike the widely used maximum mean discrepancy (MMD) and its variants, the proposed IDA absorbs the idea of sample reconstruction to implicitly align the distribution gap, which enables DIDAN to learn both emotion-discriminative and corpus-invariant features from speech spectrums. To evaluate DIDAN, extensive cross-corpus SER experiments on widely used speech emotion corpora are carried out. Experimental results show that DIDAN outperforms many recent state-of-the-art methods on cross-corpus SER tasks.
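
A minimal sketch of how the overall training objective could combine the supervised source loss with the IDA regularizer, assuming a hypothetical `ida_term` already computed on target features (for instance, a reconstruction residual like the one sketched for LIDAN above) and an illustrative trade-off weight `lam`.

```python
import numpy as np

def didan_objective(source_logits, source_labels, ida_term, lam=0.1):
    """Supervised emotion loss on labeled source samples plus a weighted
    implicit distribution alignment term (names and weight are assumptions)."""
    z = source_logits - source_logits.max(axis=1, keepdims=True)   # stabilized logits
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(source_labels)), source_labels])
    return ce + lam * ida_term
```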

An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context

Dec 24, 2022
Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, Tie-Yan Liu

One of the key challenges in deploying RL to real-world applications is adapting to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuating bandwidth in congestion control. Existing works on adaptation to unknown environment contexts either assume the context is the same for the whole episode or assume the context variables are Markovian. However, in many real-world applications, the environment context usually stays stable for a stochastic period and then changes in an abrupt and unpredictable manner within an episode, resulting in a segment structure that existing works fail to address. To leverage the segment structure of piecewise-stable contexts in real-world applications, in this paper we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method. Our method jointly infers the belief distribution over the latent context with the posterior over segment length, and performs more accurate belief context inference with the data observed within the current context segment. The inferred belief context can be leveraged to augment the state, leading to a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD can infer context segment length accurately and outperforms existing methods on a toy grid-world environment and MuJoCo tasks with piecewise-stable context.
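
One way to picture the joint inference over context and segment length is a changepoint-style run-length update, sketched below; this is a generic Bayesian online changepoint recursion included for intuition, not SeCBAD's actual inference network, and `obs_loglik[r]` is a hypothetical log-likelihood of the new observation under the context belief built from the last r steps.

```python
import numpy as np

def update_segment_posterior(run_length_probs, obs_loglik, hazard=0.05):
    """Posterior over 'how long has the current context segment lasted':
    segments either continue or reset abruptly with a small hazard probability."""
    lik = np.exp(obs_loglik - np.max(obs_loglik))            # per-run-length observation likelihood
    growth = run_length_probs * lik * (1.0 - hazard)         # the segment continues
    reset = np.sum(run_length_probs * lik) * hazard          # the context changes abruptly
    new_probs = np.concatenate(([reset], growth))
    return new_probs / new_probs.sum()
```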

* NeurIPS 2022 

Multi-Agent Reinforcement Learning with Shared Resources for Inventory Management

Dec 18, 2022
Yuandong Ding, Mingxiao Feng, Guozi Liu, Wei Jiang, Chuheng Zhang, Li Zhao, Lei Song, Houqiang Li, Yan Jin, Jiang Bian

In this paper, we consider the inventory management (IM) problem, where we need to make replenishment decisions for a large number of stock keeping units (SKUs) to balance their supply and demand. In our setting, the constraint on shared resources (such as inventory capacity) couples the otherwise independent control of each SKU. We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO). Through extensive experiments, we demonstrate that CD-PPO can accelerate the learning procedure compared with standard MARL algorithms.
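
A toy illustration of the shared-resource coupling: replenishment decisions are made per SKU, but a shared capacity constraint forces them to be rationed jointly. The proportional rationing rule is an assumption made for illustration, not part of CD-PPO.

```python
import numpy as np

def apply_shared_capacity(orders, on_hand, capacity):
    """Scale per-SKU replenishment orders down proportionally when they would
    exceed the shared warehouse capacity (a toy form of the coupling)."""
    free_space = max(capacity - float(np.sum(on_hand)), 0.0)
    total_order = float(np.sum(orders))
    if total_order <= free_space:
        return orders
    return orders * (free_space / total_order)   # proportional rationing
```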

* Appeared in RL4RealLife@NeurIPS 2022 