Social platforms have emerged as crucial platforms for disseminating information and discussing real-life social events, which offers an excellent opportunity for researchers to design and implement novel event detection frameworks. However, most existing approaches merely exploit keyword burstiness or network structures to detect unspecified events. Thus, they often fail to identify unspecified events regarding the challenging nature of events and social data. Social data, e.g., tweets, is characterized by misspellings, incompleteness, word sense ambiguation, and irregular language, as well as variation in aspects of opinions. Moreover, extracting discriminative features and patterns for evolving events by exploiting the limited structural knowledge is almost infeasible. To address these challenges, in this thesis, we propose a novel framework, namely EnrichEvent, that leverages the lexical and contextual representations of streaming social data. In particular, we leverage contextual knowledge, as well as lexical knowledge, to detect semantically related tweets and enhance the effectiveness of the event detection approaches. Eventually, our proposed framework produces cluster chains for each event to show the evolving variation of the event through time. We conducted extensive experiments to evaluate our framework, validating its high performance and effectiveness in detecting and distinguishing unspecified social events.
This paper addresses the problem of ranking pre-trained models for object detection and image classification. Selecting the best pre-trained model by fine-tuning is an expensive and time-consuming task. Previous works have proposed transferability estimation based on features extracted by the pre-trained models. We argue that quantifying whether the target dataset is in-distribution (IND) or out-of-distribution (OOD) for the pre-trained model is an important factor in the transferability estimation. To this end, we propose ETran, an energy-based transferability assessment metric, which includes three scores: 1) energy score, 2) classification score, and 3) regression score. We use energy-based models to determine whether the target dataset is OOD or IND for the pre-trained model. In contrast to the prior works, ETran is applicable to a wide range of tasks including classification, regression, and object detection (classification+regression). This is the first work that proposes transferability estimation for object detection task. Our extensive experiments on four benchmarks and two tasks show that ETran outperforms previous works on object detection and classification benchmarks by an average of 21% and 12%, respectively, and achieves SOTA in transferability assessment.
Natural Language Understanding (NLU) is important in today's technology as it enables machines to comprehend and process human language, leading to improved human-computer interactions and advancements in fields such as virtual assistants, chatbots, and language-based AI systems. This paper highlights the significance of advancing the field of NLU for low-resource languages. With intent detection and slot filling being crucial tasks in NLU, the widely used datasets ATIS and SNIPS have been utilized in the past. However, these datasets only cater to the English language and do not support other languages. In this work, we aim to address this gap by creating a Persian benchmark for joint intent detection and slot filling based on the ATIS dataset. To evaluate the effectiveness of our benchmark, we employ state-of-the-art methods for intent detection and slot filling.
Cell outage compensation enables a network to react to a catastrophic cell failure quickly and serve users in the outage zone uninterruptedly. Utilizing the promising benefits of non-orthogonal multiple access (NOMA) for improving the throughput of cell edge users, we propose a newly NOMA-based cell outage compensation scheme. In this scheme, the compensation is formulated as a mixed integer non-linear program (MINLP) where outage zone users are associated to neighboring cells and their power are allocated with the objective of maximizing spectral efficiency, subject to maintaining the quality of service for the rest of the users. Owing to the importance of immediate management of cell outage and handling the computational complexity, we develop a low-complexity suboptimal solution for this problem in which the user association scheme is determined by a newly heuristic algorithm, and power allocation is set by applying an innovative deep neural network (DNN). The complexity of our proposed method is in the order of polynomial basis, which is much less than the exponential complexity of finding an optimal solution. Simulation results demonstrate that the proposed method approaches the optimal solution. Moreover, the developed scheme greatly improves fairness and increases the number of served users.
Building huge and highly capable language models has been a trend in the past years. Despite their great performance, they incur high computational cost. A common solution is to apply model compression or choose light-weight architectures, which often need a separate fixed-size model for each desirable computational budget, and may lose performance in case of heavy compression. This paper proposes an effective dynamic inference approach, called E-LANG, which distributes the inference between large accurate Super-models and light-weight Swift models. To this end, a decision making module routes the inputs to Super or Swift models based on the energy characteristics of the representations in the latent space. This method is easily adoptable and architecture agnostic. As such, it can be applied to black-box pre-trained models without a need for architectural manipulations, reassembling of modules, or re-training. Unlike existing methods that are only applicable to encoder-only backbones and classification tasks, our method also works for encoder-decoder structures and sequence-to-sequence tasks such as translation. The E-LANG performance is verified through a set of experiments with T5 and BERT backbones on GLUE, SuperGLUE, and WMT. In particular, we outperform T5-11B with an average computations speed-up of 3.3$\times$ on GLUE and 2.9$\times$ on SuperGLUE. We also achieve BERT-based SOTA on GLUE with 3.2$\times$ less computations. Code and demo are available in the supplementary materials.
Facial biometrics has been recently received tremendous attention as a convenient replacement for traditional authentication systems. Consequently, detecting malicious attempts has found great significance, leading to extensive studies in face anti-spoofing~(FAS),i.e., face presentation attack detection. Deep feature learning and techniques, as opposed to hand-crafted features, have promised a dramatic increase in the FAS systems' accuracy, tackling the key challenges of materializing the real-world application of such systems. Hence, a new research area dealing with the development of more generalized as well as accurate models is increasingly attracting the attention of the research community and industry. In this paper, we present a comprehensive survey on the literature related to deep-feature-based FAS methods since 2017. To shed light on this topic, a semantic taxonomy based on various features and learning methodologies is represented. Further, we cover predominant public datasets for FAS in chronological order, their evolutional progress, and the evaluation criteria (both intra-dataset and inter-dataset). Finally, we discuss the open research challenges and future directions.
State-of-the-art deep learning models have achieved significant performance levels on various benchmarks. However, the excellent performance comes at a cost of inefficient computational cost. Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency. This paper presents a new method of jointly using the large accurate models together with the small fast ones. To this end, we propose an Energy-Based Joint Reasoning (EBJR) framework that adaptively distributes the samples between shallow and deep models to achieve an accuracy close to the deep model, but latency close to the shallow one. Our method is applicable to out-of-the-box pre-trained models as it does not require an architecture change nor re-training. Moreover, it is easy to use and deploy, especially for cloud services. Through a comprehensive set of experiments on different down-stream tasks, we show that our method outperforms strong state-of-the-art approaches with a considerable margin. In addition, we propose specialized EBJR, an extension of our method where we create a smaller specialized side model that performs the target task only partially, but yields an even higher accuracy and faster inference. We verify the strengths of our methods with both theoretical and experimental evaluations.
Driven by deep learning techniques and large-scale datasets, recent years have witnessed a paradigm shift in automatic lip reading. While the main thrust of Visual Speech Recognition (VSR) was improving accuracy of Audio Speech Recognition systems, other potential applications, such as biometric identification, and the promised gains of VSR systems, have motivated extensive efforts on developing the lip reading technology. This paper provides a comprehensive survey of the state-of-the-art deep learning based VSR research with a focus on data challenges, task-specific complications, and the corresponding solutions. Advancements in these directions will expedite the transformation of silent speech interface from theory to practice. We also discuss the main modules of a VSR pipeline and the influential datasets. Finally, we introduce some typical VSR application concerns and impediments to real-world scenarios as well as future research directions.
Credit scoring models, which are among the most potent risk management tools that banks and financial institutes rely on, have been a popular subject for research in the past few decades. Accordingly, many approaches have been developed to address the challenges in classifying loan applicants and improve and facilitate decision-making. The imbalanced nature of credit scoring datasets, as well as the heterogeneous nature of features in credit scoring datasets, pose difficulties in developing and implementing effective credit scoring models, targeting the generalization power of classification models on unseen data. In this paper, we propose the Bagging Supervised Autoencoder Classifier (BSAC) that mainly leverages the superior performance of the Supervised Autoencoder, which learns low-dimensional embeddings of the input data exclusively with regards to the ultimate classification task of credit scoring, based on the principles of multi-task learning. BSAC also addresses the data imbalance problem by employing a variant of the Bagging process based on the undersampling of the majority class. The obtained results from our experiments on the benchmark and real-life credit scoring datasets illustrate the robustness and effectiveness of the Bagging Supervised Autoencoder Classifier in the classification of loan applicants that can be regarded as a positive development in credit scoring models.