Semi-supervised clustering is an very important topic in machine learning and computer vision. The key challenge of this problem is how to learn a metric, such that the instances sharing the same label are more likely close to each other on the embedded space. However, little attention has been paid to learn better representations when the data lie on non-linear manifold. Fortunately, deep learning has led to great success on feature learning recently. Inspired by the advances of deep learning, we propose a deep transductive semi-supervised maximum margin clustering approach. More specifically, given pairwise constraints, we exploit both labeled and unlabeled data to learn a non-linear mapping under maximum margin framework for clustering analysis. Thus, our model unifies transductive learning, feature learning and maximum margin techniques in the semi-supervised clustering framework. We pretrain the deep network structure with restricted Boltzmann machines (RBMs) layer by layer greedily, and optimize our objective function with gradient descent. By checking the most violated constraints, our approach updates the model parameters through error backpropagation, in which deep features are learned automatically. The experimental results shows that our model is significantly better than the state of the art on semi-supervised clustering.
Electoral prediction from Twitter data is an appealing research topic. It seems relatively straightforward and the prevailing view is overly optimistic. This is problematic because while simple approaches are assumed to be good enough, core problems are not addressed. Thus, this paper aims to (1) provide a balanced and critical review of the state of the art; (2) cast light on the presume predictive power of Twitter data; and (3) depict a roadmap to push forward the field. Hence, a scheme to characterize Twitter prediction methods is proposed. It covers every aspect from data collection to performance evaluation, through data processing and vote inference. Using that scheme, prior research is analyzed and organized to explain the main approaches taken up to date but also their weaknesses. This is the first meta-analysis of the whole body of research regarding electoral prediction from Twitter data. It reveals that its presumed predictive power regarding electoral prediction has been rather exaggerated: although social media may provide a glimpse on electoral outcomes current research does not provide strong evidence to support it can replace traditional polls. Finally, future lines of research along with a set of requirements they must fulfill are provided.
SVMs were initially developed to perform binary classification; though, applications of binary classification are very limited. Most of the practical applications involve multiclass classification, especially in remote sensing land cover classification. A number of methods have been proposed to implement SVMs to produce multiclass classification. A number of methods to generate multiclass SVMs from binary SVMs have been proposed by researchers and is still a continuing research topic. This paper compares the performance of six multi-class approaches to solve classification problem with remote sensing data in term of classification accuracy and computational cost. One vs. one, one vs. rest, Directed Acyclic Graph (DAG), and Error Corrected Output Coding (ECOC) based multiclass approaches creates many binary classifiers and combines their results to determine the class label of a test pixel. Another catogery of multi class approach modify the binary class objective function and allows simultaneous computation of multiclass classification by solving a single optimisation problem. Results from this study conclude the usefulness of One vs. One multi class approach in term of accuracy and computational cost over other multi class approaches.
The emotional state of a speaker can be influenced by many different factors in dialogues, such as dialogue scene, dialogue topic, and interlocutor stimulus. The currently available data resources to support such multimodal affective analysis in dialogues are however limited in scale and diversity. In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances. M3 ED is annotated with 7 emotion categories (happy, surprise, sad, disgust, anger, fear, and neutral) at utterance level, and encompasses acoustic, visual, and textual modalities. To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese. It is valuable for cross-culture emotion analysis and recognition. We apply several state-of-the-art methods on the M3ED dataset to verify the validity and quality of the dataset. We also propose a general Multimodal Dialogue-aware Interaction framework, MDI, to model the dialogue context for emotion recognition, which achieves comparable performance to the state-of-the-art methods on the M3ED. The full dataset and codes are available.
Where am I? This is one of the most critical questions that any intelligent system should answer to decide whether it navigates to a previously visited area. This problem has long been acknowledged for its challenging nature in simultaneous localization and mapping (SLAM), wherein the robot needs to correctly associate the incoming sensory data to the database allowing consistent map generation. The significant advances in computer vision achieved over the last 20 years, the increased computational power, and the growing demand for long-term exploration contributed to efficiently performing such a complex task with inexpensive perception sensors. In this article, visual loop closure detection, which formulates a solution based solely on appearance input data, is surveyed. We start by briefly introducing place recognition and SLAM concepts in robotics. Then, we describe a loop closure detection system's structure, covering an extensive collection of topics, including the feature extraction, the environment representation, the decision-making step, and the evaluation process. We conclude by discussing open and new research challenges, particularly concerning the robustness in dynamic environments, the computational complexity, and scalability in long-term operations. The article aims to serve as a tutorial and a position paper for newcomers to visual loop closure detection.
The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect to hate domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and a low-resource language, Turkish, consisting of human-labeled 100k tweets per each. Our datasets are designed to have equal number of tweets distributed over five domains. The experimental results supported by statistical tests show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance is also scalable to different training sizes, such that 98% of performance in English, and 97% in Turkish, are recovered when 20% of training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains. We show that 96% of the performance of a target domain in average is recovered by other domains for English, and 92% for Turkish. Gender and religion are more successful to generalize to other domains, while sports fail most.
This article is a gentle discussion about the field of reinforcement learning for real life, about opportunities and challenges, with perspectives and without technical details, touching a broad range of topics. The article is based on both historical and recent research papers, surveys, tutorials, talks, blogs, and books. Various groups of readers, like researchers, engineers, students, managers, investors, officers, and people wanting to know more about the field, may find the article interesting. In this article, we first give a brief introduction to reinforcement learning (RL), and its relationship with deep learning, machine learning and AI. Then we discuss opportunities of RL, in particular, applications in products and services, games, recommender systems, robotics, transportation, economics and finance, healthcare, education, combinatorial optimization, computer systems, and science and engineering. The we discuss challenges, in particular, 1) foundation, 2) representation, 3) reward, 4) model, simulation, planning, and benchmarks, 5) learning to learn a.k.a. meta-learning, 6) off-policy/offline learning, 7) software development and deployment, 8) business perspectives, and 9) more challenges. We conclude with a discussion, attempting to answer: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?".
With its powerful capability to deal with graph data widely found in practical applications, graph neural networks (GNNs) have received significant research attention. However, as societies become increasingly concerned with data privacy, GNNs face the need to adapt to this new normal. This has led to the rapid development of federated graph neural networks (FedGNNs) research in recent years. Although promising, this interdisciplinary field is highly challenging for interested researchers to enter into. The lack of an insightful survey on this topic only exacerbates this problem. In this paper, we bridge this gap by offering a comprehensive survey of this emerging field. We propose a unique 3-tiered taxonomy of the FedGNNs literature to provide a clear view into how GNNs work in the context of Federated Learning (FL). It puts existing works into perspective by analyzing how graph data manifest themselves in FL settings, how GNN training is performed under different FL system architectures and degrees of graph data overlap across data silo, and how GNN aggregation is performed under various FL settings. Through discussions of the advantages and limitations of existing works, we envision future research directions that can help build more robust, dynamic, efficient, and interpretable FedGNNs.
Structure information extraction refers to the task of extracting structured text fields from web pages, such as extracting a product offer from a shopping page including product title, description, brand and price. It is an important research topic which has been widely studied in document understanding and web search. Recent natural language models with sequence modeling have demonstrated state-of-the-art performance on web information extraction. However, effectively serializing tokens from unstructured web pages is challenging in practice due to a variety of web layout patterns. Limited work has focused on modeling the web layout for extracting the text fields. In this paper, we introduce WebFormer, a Web-page transFormer model for structure information extraction from web documents. First, we design HTML tokens for each DOM node in the HTML by embedding representations from their neighboring tokens through graph attention. Second, we construct rich attention patterns between HTML tokens and text tokens, which leverages the web layout for effective attention weight computation. We conduct an extensive set of experiments on SWDE and Common Crawl benchmarks. Experimental results demonstrate the superior performance of the proposed approach over several state-of-the-art methods.
The statefulness property of network protocol implementations poses a unique challenge for testing and verification techniques, including Fuzzing. Stateful fuzzers tackle this challenge by leveraging state models to partition the state space and assist the test generation process. Since not all states are equally important and fuzzing campaigns have time limits, fuzzers need effective state selection algorithms to prioritize progressive states over others. Several state selection algorithms have been proposed but they were implemented and evaluated separately on different platforms, making it hard to achieve conclusive findings. In this work, we evaluate an extensive set of state selection algorithms on the same fuzzing platform that is AFLNet, a state-of-the-art fuzzer for network servers. The algorithm set includes existing ones supported by AFLNet and our novel and principled algorithm called AFLNetLegion. The experimental results on the ProFuzzBench benchmark show that (i) the existing state selection algorithms of AFLNet achieve very similar code coverage, (ii) AFLNetLegion clearly outperforms these algorithms in selected case studies, but (iii) the overall improvement appears insignificant. These are unexpected yet interesting findings. We identify problems and share insights that could open opportunities for future research on this topic.