The use of Deep Learning techniques for classification in Hyperspectral Imaging (HSI) is rapidly growing and achieving improved performances. Due to the nature of the data captured by sensors that produce HSI images, a common issue is the dimensionality of the bands that may or may not contribute to the label class distinction. Due to the widespread nature of class labels, Principal Component Analysis is a common method used for reducing the dimensionality. However,there may exist methods that incorporate all bands of the Hyperspectral image with the help of the Attention mechanism. Furthermore, to yield better spectral spatial feature extraction, recent methods have also explored the usage of Graph Convolution Networks and their unique ability to use node features in prediction, which is akin to the pixel spectral makeup. In this survey we present a comprehensive summary of Graph based and Attention based methods to perform Hyperspectral Image Classification for remote sensing and aerial HSI images. We also summarize relevant datasets on which these techniques have been evaluated and benchmark the processing techniques.
Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, we present ACLM Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach based on conditional generation to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from and often generates incoherent augmentations by placing complex named entities in the wrong context. ACLM builds on BART and is optimized on a novel text reconstruction or denoising task - we use selective masking (aided by attention maps) to retain the named entities and certain keywords in the input sentence that provide contextually relevant additional knowledge or hints about the named entities. Compared with other data augmentation strategies, ACLM can generate more diverse and coherent augmentations preserving the true word sense of complex entities in the sentence. We demonstrate the effectiveness of ACLM both qualitatively and quantitatively on monolingual, cross-lingual, and multilingual complex NER across various low-resource settings. ACLM outperforms all our neural baselines by a significant margin (1%-36%). In addition, we demonstrate the application of ACLM to other domains that suffer from data scarcity (e.g., biomedical). In practice, ACLM generates more effective and factual augmentations for these domains than prior methods. Code: https://github.com/Sreyan88/ACLM
Understanding the representations of different languages in multilingual language models is essential for comprehending their cross-lingual properties, predicting their performance on downstream tasks, and identifying any biases across languages. In our study, we analyze the geometry of three multilingual language models in Euclidean space and find that all languages are represented by unique geometries. Using a geometric separability index we find that although languages tend to be closer according to their linguistic family, they are almost separable with languages from other families. We also introduce a Cross-Lingual Similarity Index to measure the distance of languages with each other in the semantic space. Our findings indicate that the low-resource languages are not represented as good as high resource languages in any of the models
The tremendous growth of social media users interacting in online conversations has also led to significant growth in hate speech. Most of the prior works focus on detecting explicit hate speech, which is overt and leverages hateful phrases, with very little work focusing on detecting hate speech that is implicit or denotes hatred through indirect or coded language. In this paper, we present CoSyn, a user- and conversational-context synergized network for detecting implicit hate speech in online conversation trees. CoSyn first models the user's personal historical and social context using a novel hyperbolic Fourier attention mechanism and hyperbolic graph convolution network. Next, we jointly model the user's personal context and the conversational context using a novel context interaction mechanism in the hyperbolic space that clearly captures the interplay between the two and makes independent assessments on the amounts of information to be retrieved from both contexts. CoSyn performs all operations in the hyperbolic space to account for the scale-free dynamics of social media. We demonstrate the effectiveness of CoSyn both qualitatively and quantitatively on an open-source hate speech dataset with Twitter conversations and show that CoSyn outperforms all our baselines in detecting implicit hate speech with absolute improvements in the range of 8.15% - 19.50%.
Intimacy is an essential element of human relationships and language is a crucial means of conveying it. Textual intimacy analysis can reveal social norms in different contexts and serve as a benchmark for testing computational models' ability to understand social information. In this paper, we propose a novel weak-labeling strategy for data augmentation in text regression tasks called WADER. WADER uses data augmentation to address the problems of data imbalance and data scarcity and provides a method for data augmentation in cross-lingual, zero-shot tasks. We benchmark the performance of State-of-the-Art pre-trained multilingual language models using WADER and analyze the use of sampling techniques to mitigate bias in data and optimally select augmentation candidates. Our results show that WADER outperforms the baseline model and provides a direction for mitigating data imbalance and scarcity in text regression tasks.
Disfluency, though originating from human spoken utterances, is primarily studied as a uni-modal text-based Natural Language Processing (NLP) task. Based on early-fusion and self-attention-based multimodal interaction between text and acoustic modalities, in this paper, we propose a novel multimodal architecture for disfluency detection from individual utterances. Our architecture leverages a multimodal dynamic fusion network that adds minimal parameters over an existing text encoder commonly used in prior art to leverage the prosodic and acoustic cues hidden in speech. Through experiments, we show that our proposed model achieves state-of-the-art results on the widely used English Switchboard for disfluency detection and outperforms prior unimodal and multimodal systems in literature by a significant margin. In addition, we make a thorough qualitative analysis and show that, unlike text-only systems, which suffer from spurious correlations in the data, our system overcomes this problem through additional cues from speech signals. We make all our codes publicly available on GitHub.
This study presents a methodology for anticounterfeiting of Non-Volatile Memory (NVM) chips. In particular, we experimentally demonstrate a generalized methodology for detecting (i) Integrated Circuit (IC) origin, (ii) recycled or used NVM chips, and (iii) identification of used locations (addresses) in the chip. Our proposed methodology inspects latency and variability signatures of Commercial-Off-The-Shelf (COTS) NVM chips. The proposed technique requires low-cycle (~100) pre-conditioning and utilizes Machine Learning (ML) algorithms. We observe different trends in evolution of latency (sector erase or page write) with cycling on different NVM technologies from different vendors. ML assisted approach is utilized for detecting IC manufacturers with 95.1 % accuracy obtained on prepared test dataset consisting of 3 different NVM technologies including 6 different manufacturers (9 types of chips).
Deep learning research has generated widespread interest leading to emergence of a large variety of technological innovations and applications. As significant proportion of deep learning research focuses on vision based applications, there exists a potential for using some of these techniques to enable low-power portable health-care diagnostic support solutions. In this paper, we propose an embedded-hardware-based implementation of microscopy diagnostic support system for PoC case study on: (a) Malaria in thick blood smears, (b) Tuberculosis in sputum samples, and (c) Intestinal parasite infection in stool samples. We use a Squeeze-Net based model to reduce the network size and computation time. We also utilize the Trained Quantization technique to further reduce memory footprint of the learned models. This enables microscopy-based detection of pathogens that classifies with laboratory expert level accuracy as a standalone embedded hardware platform. The proposed implementation is 6x more power-efficient compared to conventional CPU-based implementation and has an inference time of $\sim$ 3 ms/sample.
Low-Power Edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of Metaverse. In this work, we investigate two representative XR workloads: (i) Hand detection and (ii) Eye segmentation, for hardware design space exploration. For both applications, we train deep neural networks and analyze the impact of quantization and hardware specific bottlenecks. Through simulations, we evaluate a CPU and two systolic inference accelerator implementations. Next, we compare these hardware solutions with advanced technology nodes. The impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline is evaluated. We found that significant energy benefits (>=80%) can be achieved for hand detection (IPS=40) and eye segmentation (IPS=6) by introducing non-volatile memory in the memory hierarchy for designs at 7nm node while meeting minimum IPS (inference per second). Moreover, we can realize substantial reduction in area (>=30%) owing to the small form factor of MRAM compared to traditional SRAM.
EMG (Electromyograph) signal based gesture recognition can prove vital for applications such as smart wearables and bio-medical neuro-prosthetic control. Spiking Neural Networks (SNNs) are promising for low-power, real-time EMG gesture recognition, owing to their inherent spike/event driven spatio-temporal dynamics. In literature, there are limited demonstrations of neuromorphic hardware implementation (at full chip/board/system scale) for EMG gesture classification. Moreover, most literature attempts exploit primitive SNNs based on LIF (Leaky Integrate and Fire) neurons. In this work, we address the aforementioned gaps with following key contributions: (1) Low-power, high accuracy demonstration of EMG-signal based gesture recognition using neuromorphic Recurrent Spiking Neural Networks (RSNN). In particular, we propose a multi-time scale recurrent neuromorphic system based on special double-exponential adaptive threshold (DEXAT) neurons. Our network achieves state-of-the-art classification accuracy (90%) while using ~53% lesser neurons than best reported prior art on Roshambo EMG dataset. (2) A new multi-channel spike encoder scheme for efficient processing of real-valued EMG data on neuromorphic systems. (3) Unique multi-compartment methodology to implement complex adaptive neurons on Intel's dedicated neuromorphic Loihi chip is shown. (4) RSNN implementation on Loihi (Nahuku 32) achieves significant energy/latency benefits of ~983X/19X compared to GPU for batch size as 50.