In multi-modal action recognition, it is important to consider not only the complementary nature of different modalities but also global action content. In this paper, we propose a novel network, named Modality Mixer (M-Mixer) network, to leverage complementary information across modalities and temporal context of an action for multi-modal action recognition. We also introduce a simple yet effective recurrent unit, called Multi-modal Contextualization Unit (MCU), which is a core component of M-Mixer. Our MCU temporally encodes a sequence of one modality (e.g., RGB) with action content features of other modalities (e.g., depth, IR). This process encourages M-Mixer to exploit global action content and also to supplement complementary information of other modalities. As a result, our proposed method outperforms state-of-the-art methods on NTU RGB+D 60, NTU RGB+D 120, and NW-UCLA datasets. Moreover, we demonstrate the effectiveness of M-Mixer by conducting comprehensive ablation studies.
We present TwHIN-BERT, a multilingual language model trained on in-domain data from the popular social network Twitter. TwHIN-BERT differs from prior pre-trained language models as it is trained with not only text-based self-supervision, but also with a social objective based on the rich social engagements within a Twitter heterogeneous information network (TwHIN). Our model is trained on 7 billion tweets covering over 100 distinct languages providing a valuable representation to model short, noisy, user-generated text. We evaluate our model on a variety of multilingual social recommendation and semantic understanding tasks and demonstrate significant metric improvement over established pre-trained language models. We will freely open-source TwHIN-BERT and our curated hashtag prediction and social engagement benchmark datasets to the research community.
HELM Learning (Helping Everyone Learn More) is the first online peer-to-peer learning platform which allows students (typically middle-to-high school students) to teach classes and students (typically elementary-to-middle school students) to learn from classes for free. This method of class structure (peer-to-peer learning) has been proven effective for learning, as it promotes teamwork and collaboration, and enables active learning. HELM is a unique platform as it provides an easy process for students to create, teach and learn topics in a structured, peer-to-peer environment. Since HELM was created in April 2020, it has gotten over 4000 student sign ups and 80 teachers, in 4 continents around the world. HELM has grown from a simple website-and-Google-Form platform to having a backend system coded with Python, SQL, JavaScript and HTML, hosted on an AWS service. This not only makes it easier for students to sign up (as the students' information is saved in an SQL database, meaning they can sign up for classes without having to put in their information again, as well as getting automated emails about their classes), but also makes it easier for teachers to teach (as supplemental processes such as creating Zoom links, class recording folders, sending emails to students, etc. are done automatically). In addition, HELM has a recommendation machine learning algorithm which suggests classes and subjects students would enjoy taking, based on the previous classes a student has taken. This has created an easier experience for students to sign up for classes they are interested in.
The capability to adapt compliance by varying muscle stiffness is crucial for dexterous manipulation skills in humans. Incorporating compliance in robot motor control is crucial to performing real-world force interaction tasks with human-level dexterity. This work presents a Deep Model Predictive Variable Impedance Controller for compliant robotic manipulation which combines Variable Impedance Control with Model Predictive Control (MPC). A generalized Cartesian impedance model of a robot manipulator is learned using an exploration strategy maximizing the information gain. This model is used within an MPC framework to adapt the impedance parameters of a low-level variable impedance controller to achieve the desired compliance behavior for different manipulation tasks without any retraining or finetuning. The deep Model Predictive Variable Impedance Control approach is evaluated using a Franka Emika Panda robotic manipulator operating on different manipulation tasks in simulations and real experiments. The proposed approach was compared with model-free and model-based reinforcement approaches in variable impedance control for transferability between tasks and performance.
The Cooperative Rate-Splitting (CRS) scheme, proposed evolves from conventional Rate Splitting (RS) and relies on forwarding a portion of the RS message by the relaying users. In terms of secrecy enhancement, it has been shown that CRS outperforms its non-cooperative counterpart for a two-user Multiple Input Single Output (MISO) Broadcast Channel (BC). Given the massive connectivity requirement of 6G, we have generalized the existing secure two-user CRS framework to the multi-user framework, where the highest-security users must be selected as the relay nodes. This paper addresses the problem of maximizing the Worst-Case Secrecy Rate (WCSR) in a UAV-aided downlink network where a multi-antenna UAV Base-Station (UAV-BS) serves a group of users in the presence of an external eavesdropper (Eve). We consider a practical scenario in which only imperfect channel state information of Eve is available at the UAV-BS. Accordingly, we conceive a robust and secure resource allocation algorithm, which maximizes the WCSR by jointly optimizing both the Secure Relaying User Selection (SRUS) and the network parameter allocation problem, including the RS transmit precoders, message splitting variables, time slot sharing and power allocation. To circumvent the resultant non-convexity owing to the discrete variables imposed by SRUS, we propose a two-stage algorithm where the SRUS and network parameter allocation are accomplished in two consecutive stages. With regard to the SRUS, we study both centralized and distributed protocols. On the other hand, for jointly optimizing the network parameter allocation we resort to the Sequential Parametric Convex Approximation (SPCA) algorithm. Our numerical results show that the proposed solution significantly outperforms the existing benchmarks for a wide range of network loads in terms of the WCSR.
Self-supervised contrastive representation learning offers the advantage of learning meaningful visual representations from unlabeled medical datasets for transfer learning. However, applying current contrastive learning approaches to medical data without considering its domain-specific anatomical characteristics may lead to visual representations that are inconsistent in appearance and semantics. In this paper, we propose to improve visual representations of medical images via anatomy-aware contrastive learning (AWCL), which incorporates anatomy information to augment the positive/negative pair sampling in a contrastive learning manner. The proposed approach is demonstrated for automated fetal ultrasound imaging tasks, enabling the positive pairs from the same or different ultrasound scans that are anatomically similar to be pulled together and thus improving the representation learning. We empirically investigate the effect of inclusion of anatomy information with coarse- and fine-grained granularity, for contrastive learning and find that learning with fine-grained anatomy information which preserves intra-class difference is more effective than its counterpart. We also analyze the impact of anatomy ratio on our AWCL framework and find that using more distinct but anatomically similar samples to compose positive pairs results in better quality representations. Experiments on a large-scale fetal ultrasound dataset demonstrate that our approach is effective for learning representations that transfer well to three clinical downstream tasks, and achieves superior performance compared to ImageNet supervised and the current state-of-the-art contrastive learning methods. In particular, AWCL outperforms ImageNet supervised method by 13.8% and state-of-the-art contrastive-based method by 7.1% on a cross-domain segmentation task.
We propose an efficient algorithm for graph matching based on similarity scores constructed from counting a certain family of weighted trees rooted at each vertex. For two Erd\H{o}s-R\'enyi graphs $\mathcal{G}(n,q)$ whose edges are correlated through a latent vertex correspondence, we show that this algorithm correctly matches all but a vanishing fraction of the vertices with high probability, provided that $nq\to\infty$ and the edge correlation coefficient $\rho$ satisfies $\rho^2>\alpha \approx 0.338$, where $\alpha$ is Otter's tree-counting constant. Moreover, this almost exact matching can be made exact under an extra condition that is information-theoretically necessary. This is the first polynomial-time graph matching algorithm that succeeds at an explicit constant correlation and applies to both sparse and dense graphs. In comparison, previous methods either require $\rho=1-o(1)$ or are restricted to sparse graphs. The crux of the algorithm is a carefully curated family of rooted trees called chandeliers, which allows effective extraction of the graph correlation from the counts of the same tree while suppressing the undesirable correlation between those of different trees.
Small lesions in magnetic resonance imaging (MRI) images are crucial for clinical diagnosis of many kinds of diseases. However, the MRI quality can be easily degraded by various noise, which can greatly affect the accuracy of diagnosis of small lesion. Although some methods for denoising MR images have been proposed, task-specific denoising methods for improving the diagnosis confidence of small lesions are lacking. In this work, we propose a voxel-wise hybrid residual MLP-CNN model to denoise three-dimensional (3D) MR images with small lesions. We combine basic deep learning architecture, MLP and CNN, to obtain an appropriate inherent bias for the image denoising and integrate each output layers in MLP and CNN by adding residual connections to leverage long-range information. We evaluate the proposed method on 720 T2-FLAIR brain images with small lesions at different noise levels. The results show the superiority of our method in both quantitative and visual evaluations on testing dataset compared to state-of-the-art methods. Moreover, two experienced radiologists agreed that at moderate and high noise levels, our method outperforms other methods in terms of recovery of small lesions and overall image denoising quality. The implementation of our method is available at https://github.com/laowangbobo/Residual_MLP_CNN_Mixer.
The non-terrestrial networks (NTNs) are recognized as a key component to provide cost-effective and high-capacity ubiquitous connectivity in the future wireless communications. In this paper, we investigate the secure transmission in a terahertz (THz)-empowered reconfigurable intelligent surface (RIS)-assisted NTN (T-RANTN), which is composed of a low-Earth orbit satellite transmitter, an RIS-installed high-altitude platform (HAP) and two unmanned aerial vehicle (UAV) receivers, only one of which is trustworthy. An approximate ergodic secrecy rate (ESR) expression is derived when the atmosphere turbulence and pointing error due to the characteristics of THz as well as the phase errors resulting from finite precision of RIS and imperfect channel estimation are taken into account simultaneously. Furthermore, according to the statistical and perfect channel state information of the untrustworthy receiver, we optimize the phase shifts of RIS to maximize the lower bound of secrecy rate (SR) and instantaneous SR, respectively, by using semidefinite relaxation method. Simulation results show that both the approximate expression for the ESR and the optimization algorithms are serviceable, and even when the jitter standard variance of the trustworthy receiver is greater than that of the untrustworthy one, a positive SR can still be guaranteed.
The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated with brain changes remains a challenging task. Recent studies have demonstrated that combination of multi-modality imaging techniques can better reflect pathological characteristics and contribute to more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of AD and MCI from normal controls. Specifically, we leverage the tensor structure to exploit high-level correlation information inherent in the multi-modality data, and investigate tensor-level sparsity in the multilinear regression model. We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities (VBM- MRI, FDG-PET and AV45-PET) with clinical parameters of disease severity and cognitive scores. The experimental results demonstrate the superior performance of our proposed method against the state-of-the-art for the disease diagnosis and the identification of disease-specific regions and modality-related differences. The code for this work is publicly available at https://github.com/junfish/BIOS22.