Safety is an important topic in autonomous driving since any collision may cause serious damage to people and the environment. Hamilton-Jacobi (HJ) Reachability is a formal method that verifies safety in multi-agent interaction and provides a safety controller for collision avoidance. However, due to the worst-case assumption on the car's future actions, reachability can be so conservative that the normal operation of the vehicle is largely hindered. In this paper, we leverage the power of trajectory prediction and propose a prediction-based reachability framework for the safety controller. Instead of always assuming the worst case, we first cluster the car's behaviors into multiple driving modes, e.g. left turn or right turn. Under each mode, a reachability-based safety controller is designed based on a less conservative action set. For online use, we first utilize the trajectory prediction and our proposed mode classifier to predict the possible modes, and then deploy the corresponding safety controller. Through simulations in a T-intersection and an 8-way roundabout, we demonstrate that our prediction-based reachability method largely avoids collisions between two interacting cars and reduces the conservatism that the safety controller brings to the car's original operations.
Face recognition has long been an active and vital topic in the computer vision community. Previous research mainly focuses on loss functions for the facial feature extraction network, among which improvements to softmax-based loss functions have greatly advanced face recognition performance. However, the conflict between the drastically increasing number of face identities and the shortage of GPU memory is gradually becoming irreconcilable. In this paper, we thoroughly analyze the optimization goal of softmax-based loss functions and the difficulty of training with massive numbers of identities. We find that negative classes in the softmax function are less important for face representation learning than previously thought. Experiments demonstrate no loss of accuracy when training with only 10\% randomly sampled classes for the softmax-based loss functions, compared with training with full classes using state-of-the-art models on mainstream benchmarks. We also implement a very efficient distributed sampling algorithm, taking into account both model accuracy and training efficiency, which uses only eight NVIDIA RTX2080Ti GPUs to complete classification tasks with tens of millions of identities. The code of this paper is available at https://github.com/deepinsight/insightface/tree/master/recognition/partial_fc.
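To make the class-sampling idea concrete, the following is a minimal single-machine sketch, not the distributed implementation from the repository above. It assumes that every class with a sample in the current batch is always kept and that the remaining slots are filled with uniformly sampled negative classes; the function names `sample_classes` and `sampled_softmax_loss` are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_classes(num_classes, positives, rate, rng):
    """Keep every positive class of the batch, then fill up with random negatives."""
    positives = set(positives)
    # Assumption of this sketch: positives are never dropped.
    num_sampled = max(len(positives), int(num_classes * rate))
    negatives = [c for c in range(num_classes) if c not in positives]
    chosen = rng.sample(negatives, num_sampled - len(positives))
    return sorted(positives) + chosen


def sampled_softmax_loss(logits, label, kept):
    """Cross-entropy restricted to the kept classes; `label` must be in `kept`."""
    scores = [logits[c] for c in kept]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - logits[label]
```

With `rate=0.1` the normalizer is computed over one tenth of the classes, so the restricted loss is never larger than the full softmax loss for the same logits; the abstract's claim is that training on it does not reduce accuracy.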
Bayesian nonparametrics based on completely random measures (CRMs) offers a flexible modeling approach when the number of clusters or latent components in a dataset is unknown. However, managing the infinite dimensionality of CRMs often leads to slow computation. Practical inference typically relies on either integrating out the infinite-dimensional parameter or using a finite approximation: a truncated finite approximation (TFA) or an independent finite approximation (IFA). The atom weights of TFAs are constructed sequentially, while the atoms of IFAs are independent, which (1) makes them well-suited for parallel and distributed computation and (2) facilitates more convenient inference schemes. While IFAs have been developed in certain special cases in the past, there has not yet been a general template for construction or a systematic comparison to TFAs. We show how to construct IFAs for approximating distributions in a large family of CRMs, encompassing all those typically used in practice. We quantify the approximation error between IFAs and the target nonparametric prior, and prove that, in the worst case, TFAs provide more component-efficient approximations than IFAs. However, in experiments on image denoising and topic modeling tasks with real data, we find that the error of Bayesian approximation methods overwhelms any finite approximation error, and IFAs perform very similarly to TFAs.
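The contrast between sequential and independent atom weights can be illustrated with a small sketch. As an assumption made only for this example, it uses the familiar Dirichlet process case (a normalized gamma CRM) with concentration `alpha`: the TFA is truncated stick-breaking, and the IFA draws `K` i.i.d. Gamma(`alpha`/`K`, 1) weights, which after normalization give a symmetric Dirichlet. The function names are hypothetical, and this is not the general construction developed in the paper.

```python
import random


def tfa_stick_breaking(alpha, K, rng):
    """Truncated finite approximation: each weight depends on all earlier sticks."""
    weights, remaining = [], 1.0
    for _ in range(K):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights  # the mass left in `remaining` is what truncation discards


def ifa_gamma(alpha, K, rng):
    """Independent finite approximation: K i.i.d. Gamma(alpha / K, 1) weights."""
    return [rng.gammavariate(alpha / K, 1.0) for _ in range(K)]


def normalize(weights):
    """Rescale nonnegative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]
```

Because each entry of `ifa_gamma` is drawn without reference to the others, the draws could be split across workers, whereas the loop in `tfa_stick_breaking` carries `remaining` from one atom to the next.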
Campaigns are a frequently employed instrument for lifting the GMV (Gross Merchandise Volume) of retailers in traditional marketing. As their counterpart in the online context, digital marketing campaigns (DMCs) have been trending in recent years with the rapid development of e-commerce. However, how to give the massive number of sellers on an online retailing platform the capacity to apply combined multiple digital marketing campaigns to boost their shops' revenue is still a novel topic. In this work, a comprehensive solution for generating optimized combined multiple DMCs is presented. Firstly, a potential personalized DMC pool is generated for every retailer by a newly proposed neural network model, i.e. the DMCNet (Digital-Marketing-Campaign Net). Secondly, based on sub-modular optimization theory and the DMC pool produced by DMCNet, the generated combined multiple DMCs are ranked with respect to their revenue generation strength, and the top three ranked campaigns are returned to the sellers' back-end management system, so that retailers can set combined multiple DMCs for their online shops in one shot. A real online A/B test shows that with the integrated solution, sellers on the online retailing platform increase their shops' GMVs by approximately 6$\%$.
As realistic facial manipulation technologies have achieved remarkable progress, social concerns about potential malicious abuse of these technologies have given rise to the emerging research topic of face forgery detection. However, the task is extremely challenging since recent advances are able to forge faces beyond the perception ability of human eyes, especially in compressed images and videos. We find that mining forgery patterns with the awareness of frequency could be a cure, as frequency provides a complementary viewpoint where either subtle forgery artifacts or compression errors could be well described. To introduce frequency into face forgery detection, we propose a novel Frequency in Face Forgery Network (F3-Net), taking advantage of two different but complementary frequency-aware clues, 1) frequency-aware decomposed image components, and 2) local frequency statistics, to deeply mine the forgery patterns via our two-stream collaborative learning framework. We use the DCT as the frequency-domain transformation. Through comprehensive studies, we show that the proposed F3-Net significantly outperforms competing state-of-the-art methods on all compression qualities in the challenging FaceForensics++ dataset, with an especially large lead on low-quality media.
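The first clue, frequency-aware decomposed image components, can be sketched as follows. This is only an illustration under stated assumptions: it uses fixed band masks on an orthonormal 2D DCT-II of a small square block, assigns coefficient (u, v) to a band by the value of u + v, and is written in plain Python for clarity rather than speed. The band edges and all function names are hypothetical and are not taken from F3-Net.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def band_components(block, edges):
    """Split a square block into image-domain components, one per frequency band."""
    n = len(block)
    coeffs = dct2(block)
    bounds = [0] + list(edges) + [2 * n - 1]  # u + v ranges over 0 .. 2n - 2
    parts = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        masked = [[coeffs[u][v] if lo <= u + v < hi else 0.0 for v in range(n)]
                  for u in range(n)]
        parts.append(idct2(masked))
    return parts
```

Since the bands partition the coefficients and the transform is linear, the components returned by `band_components` sum back to the input block, so each one can be viewed as an image that isolates one frequency range.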
The hacktivist group Anonymous is unusual in its public-facing nature. Unlike other cybercriminal groups, which rely on secrecy and privacy for protection, Anonymous is prevalent on the social media site Twitter. In this paper we re-examine some key findings reported in previous small-scale qualitative studies of the group using a large-scale computational analysis of Anonymous' presence on Twitter. We specifically refer to reports which reject the group's claims of leaderlessness, and indicate a fracturing of the group after the arrests of prominent members in 2011-2013. In our research, we present the first attempt to use machine learning to identify and analyse a network of over 20,000 Anonymous accounts on the Twitter platform, spanning 2008 to 2019. This research then utilises social network analysis (SNA) and centrality measures to examine the distribution of influence within this large network, identifying the presence of a small number of highly influential accounts. Moreover, we present the first study of tweets from some of the identified key influencer accounts and, through the use of topic modelling, demonstrate a similarity in overarching subjects of discussion between these prominent accounts. These findings provide robust, quantitative evidence to support the claims of smaller-scale, qualitative studies of the Anonymous collective.
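As a rough illustration of how a centrality measure can expose a few highly influential accounts, the sketch below computes in-degree centrality on a directed interaction graph. The abstract does not state which centrality measures or which edge definition were used, so both the measure and the reading of an edge as "source account retweets or mentions target account" are assumptions made for this example, and `in_degree_centrality` is a hypothetical helper.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Fraction of the other accounts that point at each account."""
    nodes = {n for edge in edges for n in edge}
    sources = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-loops
            sources[dst].add(src)
    denom = max(len(nodes) - 1, 1)
    return {n: len(sources[n]) / denom for n in nodes}
```

On a graph where most accounts point at one hub, the hub's score approaches 1 while the rest stay near 0, which is the kind of skewed distribution of influence the abstract describes.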
Framing is an indispensable narrative device for news media because even the same facts may lead to conflicting understandings if deliberate framing is employed. Therefore, identifying media framing is a crucial step to understanding how news media influence the public. Framing is, however, difficult to operationalize and detect, and thus traditional media framing studies had to rely on manual annotation, which is challenging to scale up to massive news datasets. Here, by developing a media frame classifier that achieves state-of-the-art performance, we systematically analyze the media frames of 1.5 million New York Times articles published from 2000 to 2017. By examining the ebb and flow of media frames over almost two decades, we show that short-term frame abundance fluctuation closely corresponds to major events, while there also exist several long-term trends, such as the gradually increasing prevalence of the ``Cultural identity'' frame. By examining specific topics and sentiments, we identify characteristics and dynamics of each frame. Finally, as a case study, we delve into the framing of mass shootings, revealing three major framing patterns. Our scalable, computational approach to massive news datasets opens up new pathways for systematic media framing studies.
Human parsing, or human body part semantic segmentation, has been an active research topic due to its wide potential applications. In this paper, we propose a novel GRAph PYramid Mutual Learning (Grapy-ML) method to address the cross-dataset human parsing problem, where the annotations are at different granularities. Starting from the prior knowledge of the human body hierarchical structure, we devise a graph pyramid module (GPM) by stacking three levels of graph structures successively from coarse granularity to fine granularity. At each level, GPM utilizes the self-attention mechanism to model the correlations between context nodes. Then, it adopts a top-down mechanism to progressively refine the hierarchical features through all the levels. GPM also enables efficient mutual learning. Specifically, the network weights of the first two levels are shared to exchange the learned coarse-granularity information across different datasets. By making use of the multi-granularity labels, Grapy-ML learns a more discriminative feature representation and achieves state-of-the-art performance, which is demonstrated by extensive experiments on three popular benchmarks, e.g. the CIHP dataset. The source code is publicly available at https://github.com/Charleshhy/Grapy-ML.
Utterance-level emotion recognition (ULER) is a significant research topic for understanding human behaviors and developing empathetic chatting machines in the artificial intelligence area. Unlike traditional text classification problems, this task is supported by a limited number of datasets, most of which contain inadequate conversations or speeches. This data scarcity limits the possibility of training larger and more powerful models for the task. Witnessing the success of transfer learning in natural language processing (NLP), we propose to pre-train a context-dependent encoder (CoDE) for ULER by learning from unlabeled conversation data. Essentially, CoDE is a hierarchical architecture that contains an utterance encoder and a conversation encoder, making it different from works that aim to pre-train a universal sentence encoder. We also propose a new pre-training task named "conversation completion" (CoCo), which attempts to select the correct answer from candidate answers to fill a masked utterance in a question conversation. The CoCo task is carried out on pure movie subtitles so that our CoDE can be pre-trained in an unsupervised fashion. Finally, the pre-trained CoDE (PT-CoDE) is fine-tuned for ULER and boosts the model performance significantly on five datasets.
Automatic License Plate detection and Recognition (ALPR) is a popular and active research topic in the fields of computer vision, image processing and intelligent transport systems. ALPR is used to make detection and recognition processes more robust and efficient in highly complicated environments and backgrounds. Further research is still necessary because of constraints such as the completeness of countries' numbering systems, different colors, various languages, multiple sizes and varied fonts. To this end, we present in this paper an automatic framework for License Plate (LP) detection and recognition from complex scenes. Our framework is based on mask region-based convolutional neural networks (Mask R-CNN) used for LP detection, segmentation and recognition. Although some studies have focused on LP detection, LP recognition, LP segmentation or just two of them, our study uses Mask R-CNN in all three stages. The framework is evaluated on four datasets from different countries, and consequently in various languages. These datasets include images captured from multiple scenes under numerous conditions such as varied orientation, poor image quality, blur and complex environmental backgrounds. Extensive experiments show the robustness and efficiency of our suggested framework on all datasets.