Unsupervised domain adaptation(UDA) has been applied to image semantic segmentation to solve the problem of domain offset. However, in some difficult categories with poor recognition accuracy, the segmentation effects are still not ideal. To this end, in this paper, Intra-subdomain adaptation adversarial learning segmentation method based on Dynamic Pseudo Labels(IDPL) is proposed. The whole process consists of 3 steps: Firstly, the instance-level pseudo label dynamic generation module is proposed, which fuses the class matching information in global classes and local instances, thus adaptively generating the optimal threshold for each class, obtaining high-quality pseudo labels. Secondly, the subdomain classifier module based on instance confidence is constructed, which can dynamically divide the target domain into easy and difficult subdomains according to the relative proportion of easy and difficult instances. Finally, the subdomain adversarial learning module based on self-attention is proposed. It uses multi-head self-attention to confront the easy and difficult subdomains at the class level with the help of generated high-quality pseudo labels, so as to focus on mining the features of difficult categories in the high-entropy region of target domain images, which promotes class-level conditional distribution alignment between the subdomains, improving the segmentation performance of difficult categories. For the difficult categories, the experimental results show that the performance of IDPL is significantly improved compared with other latest mainstream methods.
Reconfigurable intelligent surface (RIS) is regarded as a promising technology with great potential to boost wireless networks. Affected by the "double fading" effect, however, conventional passive RIS cannot bring considerable performance improvement when users are not close enough to RIS. Recently, active RIS is introduced to combat the double fading effect by actively amplifying incident signals with the aid of integrated reflection-type amplifiers. In order to reduce the hardware cost and energy consumption due to massive active components in the conventional fully-connected active RIS, a novel hardware-and-energy efficient sub-connected active RIS architecture has been proposed recently, in which multiple reconfigurable electromagnetic elements are driven by only one amplifier. In this paper, we first develop an improved and accurate signal model for the sub-connected active RIS architecture. Then, we investigate the joint transmit precoding and RIS reflection beamforming (i.e., the reflection phase-shift and amplification coefficients) designs in multiuser multiple-input single-output (MU-MISO) communication systems. Both sum-rate maximization and power minimization problems are solved by leveraging fractional programming (FP), block coordinate descent (BCD), second-order cone programming (SOCP), alternating direction method of multipliers (ADMM), and majorization-minimization (MM) methods. Extensive simulation results verify that compared with the conventional fully-connected structure, the proposed sub-connected active RIS can significantly reduce the hardware cost and power consumption, and achieve great performance improvement when power budget at RIS is limited.
This paper discribes the DKU-DukeECE submission to the 4th track of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our system contains a fused voice activity detection model, a clustering-based diarization model, and a target-speaker voice activity detection-based overlap detection model. Overall, the submitted system is similar to our previous year's system in VoxSRC-21. The difference is that we use a much better speaker embedding and a fused voice activity detection, which significantly improves the performance. Finally, we fuse 4 different systems using DOVER-lap and achieve 4.75 of the diarization error rate, which ranks the 1st place in track 4.
The computational prediction of wave propagation in dam-break floods is a long-standing problem in hydrodynamics and hydrology. Until now, conventional numerical models based on Saint-Venant equations are the dominant approaches. Here we show that a machine learning model that is well-trained on a minimal amount of data, can help predict the long-term dynamic behavior of a one-dimensional dam-break flood with satisfactory accuracy. For this purpose, we solve the Saint-Venant equations for a one-dimensional dam-break flood scenario using the Lax-Wendroff numerical scheme and train the reservoir computing echo state network (RC-ESN) with the dataset by the simulation results consisting of time-sequence flow depths. We demonstrate a good prediction ability of the RC-ESN model, which ahead predicts wave propagation behavior 286 time-steps in the dam-break flood with a root mean square error (RMSE) smaller than 0.01, outperforming the conventional long short-term memory (LSTM) model which reaches a comparable RMSE of only 81 time-steps ahead. To show the performance of the RC-ESN model, we also provide a sensitivity analysis of the prediction accuracy concerning the key parameters including training set size, reservoir size, and spectral radius. Results indicate that the RC-ESN are less dependent on the training set size, a medium reservoir size K=1200~2600 is sufficient. We confirm that the spectral radius \r{ho} shows a complex influence on the prediction accuracy and suggest a smaller spectral radius \r{ho} currently. By changing the initial flow depth of the dam break, we also obtained the conclusion that the prediction horizon of RC-ESN is larger than that of LSTM.
FFSVC2022 is the second challenge of far-field speaker verification. FFSVC2022 provides the fully-supervised far-field speaker verification to further explore the far-field scenario and proposes semi-supervised far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focus on the single-channel scenario. In addition, a supplementary set for the FFSVC2020 dataset is released this year. The supplementary set consists of more recording devices and has the same data distribution as the FFSVC2022 evaluation set. This paper summarizes the FFSVC 2022, including tasks description, trial designing details, a baseline system and a summary of challenge results. The challenge results indicate substantial progress made in the field but also present that there are still difficulties with the far-field scenario.
FFSVC2022 is the second challenge of far-field speaker verification. FFSVC2022 provides the fully-supervised far-field speaker verification to further explore the far-field scenario and proposes semi-supervised far-field speaker verification. In contrast to FFSVC2020, FFSVC2022 focus on the single-channel scenario. In addition, a supplementary set for the FFSVC2020 dataset is released this year. The supplementary set consists of more recording devices and has the same data distribution as the FFSVC2022 evaluation set. This paper summarizes the FFSVC 2022, including tasks description, trial designing details, a baseline system and a summary of challenge results. The challenge results indicate substantial progress made in the field but also present that there are still difficulties with the far-field scenario.
Reconfigurable intelligent surface (RIS) has been deemed as one of potential components of future wireless communication systems because it can adaptively manipulate the wireless propagation environment with low-cost passive devices. However, due to the severe double path loss, the traditional passive RIS can provide sufficient gain only when receivers are very close to the RIS. Moreover, RIS cannot provide signal coverage for the receivers at the back side of it. To address these drawbacks in practical implementation, we introduce a novel reflection and transmission dual-functional active RIS architecture in this paper, which can simultaneously realize reflection and transmission functionalities with active signal amplification to significantly extend signal coverage and enhance the quality-of-service (QoS) of all users. The problem of joint transmit beamforming and dual-functional active RIS design is investigated in RIS-enhanced multiuser multiple-input single-output (MU-MISO) systems. Both sum-rate maximization and power minimization problems are considered. To address their non-convexity, we develop efficient iterative algorithms to decompose them into separate several design problems, which are efficiently solved by exploiting fractional programming (FP) and Riemannian-manifold optimization techniques. Simulation results demonstrate the superiority of the proposed dual-functional active RIS architecture and the effectiveness of our proposed algorithms over various benchmark schemes.
Compared with the rapid development of single-frequency multi-polarization SAR image classification technology, there is less research on the land cover classification of multifrequency polarimetric SAR (MF-PolSAR) images. In addition, the current deep learning methods for MF-PolSAR classification are mainly based on convolutional neural networks (CNNs), only local spatiality is considered but the nonlocal relationship is ignored. Therefore, based on semantic interaction and nonlocal topological structure, this paper proposes the MF semantics and topology fusion network (MF-STFnet) to improve MF-PolSAR classification performance. In MF-STFnet, two kinds of classification are implemented for each band, semantic information-based (SIC) and topological property-based (TPC). They work collaboratively during MF-STFnet training, which can not only fully leverage the complementarity of bands, but also combine local and nonlocal spatial information to improve the discrimination between different categories. For SIC, the designed crossband interactive feature extraction module (CIFEM) is embedded to explicitly model the deep semantic correlation among bands, thereby leveraging the complementarity of bands to make ground objects more separable. For TPC, the graph sample and aggregate network (GraphSAGE) is employed to dynamically capture the representation of nonlocal topological relations between land cover categories. In this way, the robustness of classification can be further improved by combining nonlocal spatial information. Finally, an adaptive weighting fusion (AWF) strategy is proposed to merge inference from different bands, so as to make the MF joint classification decisions of SIC and TPC. The comparative experiments show that MF-STFnet can achieve more competitive classification performance than some state-of-the-art methods.
Robotic peg-in-hole assembly is an essential task in robotic automation research. Reinforcement learning (RL) combined with deep neural networks (DNNs) lead to extraordinary achievements in this area. However, current RL-based approaches could hardly perform well under the unique environmental and mission requirements of fusion applications. Therefore, we have proposed a new designed RL-based method. Furthermore, unlike other approaches, we focus on innovations in the structure of DNNs instead of the RL model. Data from the RGB camera and force/torque (F/T) sensor as the input are fed into a multi-input branch network, and the best action in the current state is output by the network. All training and experiments are carried out in a realistic environment, and from the experiment result, this multi-sensor fusion approach has been shown to work well in rigid peg-in-hole assembly tasks with 0.1mm precision in uncertain and unstable environments.
Albeit with varying degrees of progress in the field of Semi-Supervised Semantic Segmentation, most of its recent successes are involved in unwieldy models and the lightweight solution is still not yet explored. We find that existing knowledge distillation techniques pay more attention to pixel-level concepts from labeled data, which fails to take more informative cues within unlabeled data into account. Consequently, we offer the first attempt to provide lightweight SSSS models via a novel multi-granularity distillation (MGD) scheme, where multi-granularity is captured from three aspects: i) complementary teacher structure; ii) labeled-unlabeled data cooperative distillation; iii) hierarchical and multi-levels loss setting. Specifically, MGD is formulated as a labeled-unlabeled data cooperative distillation scheme, which helps to take full advantage of diverse data characteristics that are essential in the semi-supervised setting. Image-level semantic-sensitive loss, region-level content-aware loss, and pixel-level consistency loss are set up to enrich hierarchical distillation abstraction via structurally complementary teachers. Experimental results on PASCAL VOC2012 and Cityscapes reveal that MGD can outperform the competitive approaches by a large margin under diverse partition protocols. For example, the performance of ResNet-18 and MobileNet-v2 backbone is boosted by 11.5% and 4.6% respectively under 1/16 partition protocol on Cityscapes. Although the FLOPs of the model backbone is compressed by 3.4-5.3x (ResNet-18) and 38.7-59.6x (MobileNetv2), the model manages to achieve satisfactory segmentation results.