Abstract:Multi-intent spoken language understanding (SLU) involves two tasks: multiple intent detection and slot filling, which jointly handle utterances containing more than one intent. Owing to this characteristic, which closely reflects real-world applications, the task has attracted increasing research attention, and substantial progress has been achieved. However, there remains a lack of a comprehensive and systematic review of existing studies on multi-intent SLU. To this end, this paper presents a survey of recent advances in multi-intent SLU. We provide an in-depth overview of previous research from two perspectives: decoding paradigms and modeling approaches. On this basis, we further compare the performance of representative models and analyze their strengths and limitations. Finally, we discuss the current challenges and outline promising directions for future research. We hope this survey will offer valuable insights and serve as a useful reference for advancing research in multi-intent SLU.




Abstract:Stereo matching in remote sensing has recently garnered increased attention, primarily focusing on supervised learning. However, datasets with ground truth generated by expensive airbone Lidar exhibit limited quantity and diversity, constraining the effectiveness of supervised networks. In contrast, unsupervised learning methods can leverage the increasing availability of very-high-resolution (VHR) remote sensing images, offering considerable potential in the realm of stereo matching. Motivated by this intuition, we propose a novel unsupervised stereo matching network for VHR remote sensing images. A light-weight module to bridge confidence with predicted error is introduced to refine the core model. Robust unsupervised losses are formulated to enhance network convergence. The experimental results on US3D and WHU-Stereo datasets demonstrate that the proposed network achieves superior accuracy compared to other unsupervised networks and exhibits better generalization capabilities than supervised models. Our code will be available at https://github.com/Elenairene/CBEM.




Abstract:Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning due to its strong feature representation of remote sensing images. However, ground truth for stereo matching task relies on expensive airborne LiDAR data, thus making it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-domain data from different sensors and scenarios, in this paper, we dedicate to study key training factors from three perspectives. (1) For the selection of training dataset, it is important to select data with similar regional target distribution as the test set instead of utilizing data from the same sensor. (2) For model structure, cascaded structure that flexibly adapts to different sizes of features is preferred. (3) For training manner, unsupervised methods generalize better than supervised methods, and we design an unsupervised early-stop strategy to help retain the best model with pre-trained weights as the basis. Extensive experiments are conducted to support the previous findings, on the basis of which we present an unsupervised stereo matching network with good generalization performance. We release the source code and the datasets at https://github.com/Elenairene/RKF_RSSM to reproduce the results and encourage future work.