The crossMoDA challenge aims to automatically segment the vestibular schwannoma (VS) tumor and cochlea regions of unlabeled high-resolution T2 scans by leveraging labeled contrast-enhanced T1 scans. The 2022 edition extends the segmentation task by including multi-institutional scans. In this work, we propose an unpaired cross-modality segmentation framework using data augmentation and hybrid convolutional networks. To handle the heterogeneous intensity distributions and varying image sizes of multi-institutional scans, we apply min-max normalization to scale the intensities of all scans to the range [-1, 1], and use voxel-size resampling and center cropping to obtain fixed-size sub-volumes for training. We adopt two data augmentation strategies for effectively learning semantic information and generating realistic target-domain scans: generative and online data augmentation. For generative data augmentation, we use CUT and CycleGAN to generate two groups of realistic T2 volumes with different details and appearances for supervised segmentation training. For online data augmentation, we design a random tumor-signal-reducing method that simulates the heterogeneity of VS tumor signals. Furthermore, we utilize an advanced hybrid convolutional network with multi-dimensional convolutions to adaptively learn sparse inter-slice information and dense intra-slice information for accurate volumetric segmentation of VS tumor and cochlea regions in anisotropic scans. On the crossMoDA 2022 validation dataset, our method produces promising results, achieving mean DSC values of 72.47% and 76.48% and mean ASSD values of 3.42 mm and 0.53 mm for the VS tumor and cochlea regions, respectively.
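As a concrete illustration of the preprocessing step described above, per-scan min-max normalization to [-1, 1] can be sketched as follows (a minimal sketch; the function name and the use of per-volume, rather than per-dataset, statistics are assumptions):

```python
import numpy as np

def minmax_normalize(volume: np.ndarray) -> np.ndarray:
    """Scale a scan's intensities to [-1, 1] via min-max normalization.

    Assumes the volume is not constant-valued (vmax > vmin).
    """
    vmin, vmax = volume.min(), volume.max()
    return 2.0 * (volume - vmin) / (vmax - vmin) - 1.0
```

After this step, the minimum intensity of each scan maps to -1 and the maximum to 1, so scans from different institutions share a common intensity range.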
Recent deep metric learning (DML) methods typically leverage class labels alone to keep positive samples far from negative ones. However, such methods normally ignore crucial knowledge hidden in the data (e.g., intra-class variation), which harms the generalization of the trained model. To alleviate this problem, in this paper we propose Online Batch Diffusion-based Self-Distillation (OBD-SD) for DML. Specifically, we first propose a simple but effective Progressive Self-Distillation (PSD), which distills knowledge progressively from the model itself during training. The soft distance targets produced by PSD convey richer relational information among samples, which benefits the diversity of embedding representations. We then extend PSD with an Online Batch Diffusion Process (OBDP), which captures the local geometric structure of manifolds within each batch, revealing the intrinsic relationships among the batch samples and producing better soft distance targets. Note that OBDP remedies the insufficient manifold relationships captured by the original PSD and yields significant performance improvements. OBD-SD is a flexible framework that can be integrated into state-of-the-art (SOTA) DML methods. Extensive experiments on various benchmarks, namely CUB200, CARS196, and Stanford Online Products, demonstrate that OBD-SD consistently improves the performance of existing DML methods with negligible additional training time, achieving very competitive results. Code: \url{https://github.com/ZelongZeng/OBD-SD_Pytorch}
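To make the batch diffusion idea concrete, a generic manifold-diffusion sketch over a batch similarity matrix is shown below. This is a standard graph-diffusion iteration for illustration only, not the paper's exact OBDP operator; the function name, the symmetric normalization, and the parameter values are all assumptions:

```python
import numpy as np

def batch_diffusion(sim: np.ndarray, alpha: float = 0.5, iters: int = 10) -> np.ndarray:
    """Diffuse a (non-negative, symmetric) batch similarity matrix over its
    own affinity graph so that similarities reflect the local manifold
    structure of the batch, not just pairwise distances."""
    # Symmetrically normalized transition matrix S = D^{-1/2} W D^{-1/2}
    d = sim.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    s = sim * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Iterate F <- alpha * S @ F + (1 - alpha) * W (converges since alpha < 1)
    f = sim.copy()
    for _ in range(iters):
        f = alpha * s @ f + (1.0 - alpha) * sim
    return f
```

The iteration converges to the closed form (1 - alpha)(I - alpha S)^{-1} W, so diffused similarities blend direct affinities with multi-hop paths through other batch samples.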
Language modeling on large-scale datasets leads to impressive performance gains on various downstream language tasks. The validation pre-training loss (or perplexity in autoregressive language modeling) is often used as the evaluation metric when developing language models, since the pre-training loss tends to be well-correlated with downstream performance (which is itself difficult to evaluate comprehensively). Contrary to this conventional wisdom, this paper shows that 1) pre-training loss cannot fully explain downstream performance and 2) flatness of the model is well-correlated with downstream performance where pre-training loss is not. On simplified datasets, we identify three ways to produce models with the same (statistically optimal) pre-training loss but different downstream performance: continuing pre-training after convergence, increasing the model size, and changing the training algorithm. These experiments demonstrate an implicit bias of pre-training algorithms/optimizers: among models with the same minimal pre-training loss, they implicitly prefer more transferable ones. Toward understanding this implicit bias, we prove that SGD with standard mini-batch noise implicitly prefers flatter minima in language models, and we empirically observe a strong correlation between flatness and downstream performance among models with the same minimal pre-training loss. We also prove, in a synthetic language setting, that among the models with the minimal pre-training loss, the flattest model transfers best to downstream tasks.
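The notion of flatness invoked above can be proxied, for illustration, by the average loss increase under random weight perturbations. This is one common sharpness estimate, not the paper's exact measure; the function name and parameter choices are assumptions:

```python
import numpy as np

def sharpness(loss_fn, params: np.ndarray, sigma: float = 1e-2,
              n_samples: int = 20, seed: int = 0) -> float:
    """Estimate sharpness (inverse flatness) as the mean loss increase
    when the weights are perturbed by Gaussian noise of scale sigma."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    increases = []
    for _ in range(n_samples):
        noise = sigma * rng.standard_normal(params.shape)
        increases.append(loss_fn(params + noise) - base)
    return float(np.mean(increases))
```

A flatter minimum yields a smaller value: perturbing the weights barely changes the loss, whereas a sharp minimum shows a large average increase.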
Building user simulators (USs) for reinforcement learning (RL) of task-oriented dialog systems (DSs) has attracted increasing attention, but it still faces several fundamental challenges. First, it is unclear whether we can leverage pretrained language models to design, for example, GPT-2 based USs that can match and interact with the recently advanced GPT-2 based DSs. Second, an important ingredient of a US is that the user goal be effectively incorporated and tracked; however, how to flexibly integrate goal state tracking and develop an end-to-end trainable US for multiple domains remains a challenge. In this work, we propose a generative user simulator (GUS) with a GPT-2 based architecture and goal state tracking to address the above two challenges. Extensive experiments are conducted on MultiWOZ2.1. Different DSs are trained via RL with GUS, the classic agenda-based user simulator (ABUS), and other ablation simulators, respectively, and are compared in cross-model evaluation, corpus-based evaluation, and human evaluation. GUS achieves superior results in all three evaluation tasks.
In this paper, we describe in detail our system for DCASE 2022 Task 4. The system combines two considerably different models: an end-to-end Sound Event Detection Transformer (SEDT) and a frame-wise model, the Metric Learning and Focal Loss CNN (MLFL-CNN). The former is an event-wise model that learns event-level representations and predicts sound event categories and boundaries directly, while the latter is based on the widely adopted frame-classification scheme, under which each frame is classified into event categories and event boundaries are obtained by post-processing such as thresholding and smoothing. For SEDT, self-supervised pre-training on unlabeled data is applied, and semi-supervised learning is adopted by using an online teacher, which is updated from the student model using the Exponential Moving Average (EMA) strategy and generates reliable pseudo labels for weakly labeled and unlabeled data. For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is used. Experimental results show that the hybrid system considerably outperforms either individual model, achieving PSDS1 of 0.420 and PSDS2 of 0.783 on the validation set without external data. The code is available at https://github.com/965694547/Hybrid-system-of-frame-wise-model-and-SEDT.
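The EMA teacher update mentioned above can be sketched framework-agnostically over parameter dictionaries (a minimal sketch; the function name, the dict representation of model weights, and the decay value are assumptions):

```python
def ema_update(teacher: dict, student: dict, decay: float = 0.999) -> dict:
    """Exponential moving average: teacher <- decay * teacher + (1 - decay) * student.

    `teacher` and `student` map parameter names to numeric values or arrays,
    a stand-in for the two models' state dicts. Called once per training
    step, the teacher becomes a slowly moving average of student snapshots,
    which stabilizes the pseudo labels it produces.
    """
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}
```

With a decay close to 1, the teacher changes slowly and smooths out the noise in individual student updates.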
Recently, there has been progress in supervised fine-tuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS) together with an end-to-end user simulator (US) has not been explored. Moreover, a drawback of existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as input, which is inefficient in both memory and computation. In this paper, we first propose Simplified Generative Architectures (SGA) for the DS and the US respectively, both based on GPT-2 but using a shortened history. Then, we successfully develop a Jointly Reinforced US and DS, called SGA-JRUD. Our DS with the proposed SGA, when only supervised-trained, achieves state-of-the-art performance on MultiWOZ2.1 and is more compute-efficient in both training and generation. Extensive experiments on MultiWOZ2.1 further show the superiority of SGA-JRUD in both offline and online evaluations.
Robot actuators directly affect the performance of robots, and robot drives directly affect the performance of robot actuators. With the development of robotics, robots place higher requirements on robot drives, such as high stiffness, high accuracy, high load capacity, high efficiency, low backlash, compact size, and a hollow structure. To meet the development demands of robot actuators, this study proposes a new robot drive, a planetary cycloidal drive based on the 3K-H-V topology that uses involute and cycloidal gear shapes, approached from the perspective of drive topology and designed with the idea of decoupling. In this study, the reduction ratio and efficiency models of the 3K-H-V topology are analyzed, and a prototype planetary cycloidal actuator (PCA) is designed. Experiments initially verify the feasibility of the drive, showing that the PCA has a hollow structure, a compact size, and a high torque density (69 Nm/kg).
Recently, there has emerged a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games. However, Wizard-of-Oz data are in fact simulated data and thus fundamentally different from real-life conversations, which are noisier and more casual. Recently, the SereTOD challenge was organized and released the MobileCS dataset, which consists of real-world dialog transcripts between real users and customer-service staff from China Mobile. Based on the MobileCS dataset, the SereTOD challenge defines two tasks, not only evaluating the construction of the dialogue system itself but also examining information extraction from dialog transcripts, which is crucial for building the knowledge base for TOD. This paper mainly presents a baseline study of the two tasks on the MobileCS dataset. We introduce how the two baselines are constructed, the problems encountered, and the results. We anticipate that the baselines can facilitate exciting future research toward building human-robot dialogue systems for real-life tasks.
Modular design is the foundation of on-orbit construction technology for large space facilities in the future. A standard interface is a key technology for the modular design of future space robotic systems and space facilities. This paper presents the design and testing of PetLock, a standard and genderless interface that can transfer mechanical loads, power, and data between future modular space robotic manipulators and spacecraft. PetLock adopts a completely genderless design, including the connection face, the locking mechanism, and the data and power interfaces. The connection surface provides a large translational and rotational misalignment tolerance, owing to its 120-degree symmetric 3D shape design. The locking mechanism features a three-locking-pin retraction structure, which is simple and reliable. POGO-pin connectors in the center of the interface provide the power and data transfer capabilities. Owing to its advantages of high locking force, large tolerance, high reliability, and low cost, PetLock has great application potential in future on-orbit construction missions.
Forward and backward reaching inverse kinematics (FABRIK) is a heuristic inverse kinematics solver that is increasingly applied to manipulators, owing to its fast convergence and its ability to generate realistic configurations. However, under tight error constraints, FABRIK exhibits unstable convergence behavior, which is unsatisfactory for the real-time motion planning of manipulators. In this paper, a novel inverse kinematics algorithm that combines FABRIK with the sequential quadratic programming (SQP) algorithm is presented, in which the joint angles deduced by FABRIK are taken as the initial seed of the SQP algorithm to avoid getting stuck in local minima. The combined algorithm is evaluated experimentally, achieving higher success rates and faster solution times than FABRIK under tight error constraints. Furthermore, the combined algorithm can generate continuous trajectories for the UR5 and KUKA LBR IIWA 14 R820 manipulators in path tracking, with no pose error and within the permitted position error of the end-effector.
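For readers unfamiliar with FABRIK, a toy planar version of its forward/backward passes is sketched below. This covers only the FABRIK half of the combined method: in the paper, joint angles recovered from such joint positions would then seed an SQP solver (e.g., an SLSQP routine). The function name and the 2D, origin-rooted setup are assumptions:

```python
import math

def fabrik_2d(lengths, target, iters=50, tol=1e-4):
    """Forward-and-backward reaching IK for a planar chain rooted at the
    origin. Returns the list of joint positions; assumes the target is
    reachable (within the chain's total length) and nonzero."""
    n = len(lengths)
    # Joint positions: base at the origin, chain initially stretched along +x.
    pts = [(sum(lengths[:i]), 0.0) for i in range(n + 1)]
    base = pts[0]
    for _ in range(iters):
        # Backward pass: pin the end-effector to the target, walk to the base.
        pts[n] = target
        for i in range(n - 1, -1, -1):
            r = lengths[i] / math.dist(pts[i], pts[i + 1])
            pts[i] = (pts[i + 1][0] + r * (pts[i][0] - pts[i + 1][0]),
                      pts[i + 1][1] + r * (pts[i][1] - pts[i + 1][1]))
        # Forward pass: pin the base back to its fixed position, walk outward.
        pts[0] = base
        for i in range(n):
            r = lengths[i] / math.dist(pts[i], pts[i + 1])
            pts[i + 1] = (pts[i][0] + r * (pts[i + 1][0] - pts[i][0]),
                          pts[i][1] + r * (pts[i + 1][1] - pts[i][1]))
        if math.dist(pts[n], target) < tol:
            break
    return pts
```

Each pass rescales every link back to its true length while dragging one end of the chain toward its constraint, which is why the method converges quickly for loose tolerances but can oscillate when the tolerance is very tight.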