Abstract:This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW), aiming to use a single deep reinforcement learning (DRL) model to solve the entire multiobjective optimization problem. The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes produced by the WADRL, thereby mitigating the limitations of both approaches. Firstly, we design an MOVRPTW model to balance the minimization of travel cost and the maximization of customer satisfaction. Subsequently, we present a novel DRL framework that incorporates a transformer-based policy network. This network is composed of an encoder module, a weight embedding module where the weights of the objective functions are incorporated, and a decoder module. NSGA-II is then utilized to optimize the solutions generated by WADRL. Finally, extensive experimental results demonstrate that our method outperforms the existing and traditional methods. Due to the numerous constraints in VRPTW, generating initial solutions of the NSGA-II algorithm can be time-consuming. However, using solutions generated by the WADRL as initial solutions for NSGA-II significantly reduces the time required for generating initial solutions. Meanwhile, the NSGA-II algorithm can enhance the quality of solutions generated by WADRL, resulting in solutions with better scalability. Notably, the weight-aware strategy significantly reduces the training time of DRL while achieving better results, enabling a single DRL model to solve the entire multiobjective optimization problem.
Abstract:Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts (e.g., chain-of-thought reasoning) for the MHQA task. However, the complexities in the question types (bridge v.s. comparison questions) and the reasoning types (sequential v.s. parallel reasonings) require more novel and fine-grained prompting methods to enhance the performance of MHQA under the zero-shot setting. In this paper, we propose STOC-TOT, a stochastic tree-of-thought reasoning prompting method with constrained decoding for MHQA and conduct a detailed comparison with other reasoning prompts on different question types and reasoning types. Specifically, we construct a tree-like reasoning structure by prompting the model to break down the original question into smaller sub-questions to form different reasoning paths. In addition, we prompt the model to provide a probability estimation for each reasoning path at each reasoning step. At answer time, we conduct constrained decoding on the model to generate more grounded answers and reduce hallucination. Experiments comparing STOC-TOT with two MHQA datasets and five large language models showed that our framework outperforms other reasoning prompts by a significant margin.
Abstract:In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA) which combines style transfer and adversarial attack in sequential order. And then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while keeping sound naturalness due to our user study.
Abstract:Bilevel optimization is an important formulation for many machine learning problems. Current bilevel optimization algorithms assume that the gradient of the upper-level function is Lipschitz. However, recent studies reveal that certain neural networks such as recurrent neural networks (RNNs) and long-short-term memory networks (LSTMs) exhibit potential unbounded smoothness, rendering conventional bilevel optimization algorithms unsuitable. In this paper, we design a new bilevel optimization algorithm, namely BO-REP, to address this challenge. This algorithm updates the upper-level variable using normalized momentum and incorporates two novel techniques for updating the lower-level variable: \textit{initialization refinement} and \textit{periodic updates}. Specifically, once the upper-level variable is initialized, a subroutine is invoked to obtain a refined estimate of the corresponding optimal lower-level variable, and the lower-level variable is updated only after every specific period instead of each iteration. When the upper-level problem is nonconvex and unbounded smooth, and the lower-level problem is strongly convex, we prove that our algorithm requires $\widetilde{\mathcal{O}}(1/\epsilon^4)$ iterations to find an $\epsilon$-stationary point in the stochastic setting, where each iteration involves calling a stochastic gradient or Hessian-vector product oracle. Notably, this result matches the state-of-the-art complexity results under the bounded smoothness setting and without mean-squared smoothness of the stochastic gradient, up to logarithmic factors. Our proof relies on novel technical lemmas for the periodically updated lower-level variable, which are of independent interest. Our experiments on hyper-representation learning, hyperparameter optimization, and data hyper-cleaning for text classification tasks demonstrate the effectiveness of our proposed algorithm.
Abstract:Due to shortage of water resources and increasing water demands, the joint operation of multireservoir systems for balancing power generation, ecological protection, and the residential water supply has become a critical issue in hydropower management. However, the numerous constraints and nonlinearity of multiple reservoirs make solving this problem time-consuming. To address this challenge, a deep reinforcement learning approach that incorporates a transformer framework is proposed. The multihead attention mechanism of the encoder effectively extracts information from reservoirs and residential areas, and the multireservoir attention network of the decoder generates suitable operational decisions. The proposed method is applied to Lake Mead and Lake Powell in the Colorado River Basin. The experimental results demonstrate that the transformer-based deep reinforcement learning approach can produce appropriate operational outcomes. Compared to a state-of-the-art method, the operation strategies produced by the proposed approach generate 10.11% more electricity, reduce the amended annual proportional flow deviation by 39.69%, and increase water supply revenue by 4.10%. Consequently, the proposed approach offers an effective method for the multiobjective operation of multihydropower reservoir systems.
Abstract:Due to nonholonomic dynamics, the motion planning of nonholonomic robots is always a difficult problem. This letter presents a Discrete States-based Trajectory Planning(DSTP) algorithm for autonomous nonholonomic robots. The proposed algorithm represents the trajectory as x and y positions, orientation angle, longitude velocity and acceleration, angular velocity, and time intervals. More variables make the expression of optimization and constraints simpler, reduce the error caused by too many approximations, and also handle the gear shifting situation. L-BFGS-B is used to deal with the optimization of many variables and box constraints, thus speeding up the problem solving. Various simulation experiments compared with prior works have validated that our algorithm has an order-of-magnitude efficiency advantage and can generate a smoother trajectory with a high speed and low control effort. Besides, real-world experiments are also conducted to verify the feasibility of our algorithm in real scenes. We will release our codes as ros packages.
Abstract:Recently Convolution-augmented Transformer (Conformer) has shown promising results in Automatic Speech Recognition (ASR), outperforming the previous best published Transformer Transducer. In this work, we believe that the output information of each block in the encoder and decoder is not completely inclusive, in other words, their output information may be complementary. We study how to take advantage of the complementary information of each block in a parameter-efficient way, and it is expected that this may lead to more robust performance. Therefore we propose the Block-augmented Transformer for speech recognition, named Blockformer. We have implemented two block ensemble methods: the base Weighted Sum of the Blocks Output (Base-WSBO), and the Squeeze-and-Excitation module to Weighted Sum of the Blocks Output (SE-WSBO). Experiments have proved that the Blockformer significantly outperforms the state-of-the-art Conformer-based models on AISHELL-1, our model achieves a CER of 4.35\% without using a language model and 4.10\% with an external language model on the testset.
Abstract:Spherical robot is a nonlinear, nonholonomic and unstable system which increases the difficulty of the direction and trajectory tracking problem. In this study, we propose a new direction controller HTSMC, an instruction planning controller MPC, and a trajectory tracking framework MHH. The HTSMC is designed by integrating a fast terminal algorithm, a hierarchical method, the motion features of a spherical robot, and its dynamics. In addition, the new direction controller has an excellent control effect with a quick response speed and strong stability. MPC can obtain optimal commands that are then transmitted to the velocity and direction controller. Since the two torque controllers in MHH are all Lyapunov-based sliding mode controllers, the MHH framework may achieve optimal control performance while assuring stability. Finally, the two controllers eliminate the requirement for MPC's stability and dynamic constraints. Finally, hardware experiments demonstrate the efficacy of the HTSMC, MPC, and MHH.
Abstract:The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Until now, the existing models for this task suffer from the robustness issue, i.e., performances drop dramatically when testing on a different domain. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As a common issue of most tagging models for text generation, the model's outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems on domain transfer.
Abstract:Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information. However, prior work on statistical machine translation has shown that extending the basic translation unit from words to phrases has produced substantial improvements, suggesting the possibility of improving NMT performance from explicit modeling of phrases. In this work, we present multi-granularity self-attention (Mg-Sa): a neural network that combines multi-head self-attention and phrase modeling. Specifically, we train several attention heads to attend to phrases in either n-gram or syntactic formalism. Moreover, we exploit interactions among phrases to enhance the strength of structure modeling - a commonly-cited weakness of self-attention. Experimental results on WMT14 English-to-German and NIST Chinese-to-English translation tasks show the proposed approach consistently improves performance. Targeted linguistic analysis reveals that Mg-Sa indeed captures useful phrase information at various levels of granularities.