We present Burst2Vec, our multi-task learning approach to predict emotion, age, and origin (i.e., native country/language) from vocal bursts. Burst2Vec utilises pre-trained speech representations to capture acoustic information from raw waveforms and incorporates the concept of model debiasing via adversarial training. Our models achieve a relative 30 % performance gain over baselines using pre-extracted features and score the highest amongst all participants in the ICML ExVo 2022 Multi-Task Challenge.
In Cultural Heritage, hyperspectral images are commonly used since they provide extended information regarding the optical properties of materials. Thus, the processing of such high-dimensional data becomes challenging from the perspective of machine learning techniques to be applied. In this paper, we propose a Rank-$R$ tensor-based learning model to identify and classify material defects on Cultural Heritage monuments. In contrast to conventional deep learning approaches, the proposed high order tensor-based learning demonstrates greater accuracy and robustness against overfitting. Experimental results on real-world data from UNESCO protected areas indicate the superiority of the proposed scheme compared to conventional deep learning models.
COVID-19 vaccine hesitancy is widespread, despite governments' information campaigns and WHO efforts. One of the reasons behind this is vaccine disinformation which widely spreads in social media. In particular, recent surveys have established that vaccine disinformation is impacting negatively citizen trust in COVID-19 vaccination. At the same time, fact-checkers are struggling with detecting and tracking of vaccine disinformation, due to the large scale of social media. To assist fact-checkers in monitoring vaccine narratives online, this paper studies a new vaccine narrative classification task, which categorises COVID-19 vaccine claims into one of seven categories. Following a data augmentation approach, we first construct a novel dataset for this new classification task, focusing on the minority classes. We also make use of fact-checker annotated data. The paper also presents a neural vaccine narrative classifier that achieves an accuracy of 84% under cross-validation. The classifier is publicly available for researchers and journalists.
Accurate and cost effective mapping of water bodies has an enormous significance for environmental understanding and navigation. However, the quantity and quality of information we acquire from such environmental features is limited by various factors, including cost, time, security, and the capabilities of existing data collection techniques. Measurement of water depth is an important part of such mapping, particularly in shallow locations that could provide navigational risk or have important ecological functions. Erosion and deposition at these locations, for example, due to storms and erosion, can cause rapid changes that require repeated measurements. In this paper, we describe a low-cost, resilient, unmanned autonomous surface vehicle for bathymetry data collection using side-scan sonar. We discuss the adaptation of equipment and sensors for the collection of navigation, control, and bathymetry data and also give an overview of the vehicle setup. This autonomous surface vehicle has been used to collect bathymetry from the Powai Lake in Mumbai, India.
The covid19 pandemic is a global emergency that badly impacted the economies of various countries. Covid19 hit India when the growth rate of the country was at the lowest in the last 10 years. To semantically analyze the impact of this pandemic on the economy, it is curial to have an ontology. CIDO ontology is a well standardized ontology that is specially designed to assess the impact of coronavirus disease and utilize its results for future decision forecasting for the government, industry experts, and professionals in the field of various domains like research, medical advancement, technical innovative adoptions, and so on. However, this ontology does not analyze the impact of the Covid19 pandemic on the Indian banking sector. On the other side, Covid19IBO ontology has been developed to analyze the impact of the Covid19 pandemic on the Indian banking sector but this ontology does not reflect complete information of Covid19 data. Resultantly, users cannot get all the relevant information about Covid19 and its impact on the Indian economy. This article aims to extend the CIDO ontology to show the impact of Covid19 on the Indian economy sector by reusing the concepts from other data sources. We also provide a simplified schema matching approach that detects the overlapping information among the ontologies. The experimental analysis proves that the proposed approach has reasonable results.
Face clustering is a promising way to scale up face recognition systems using large-scale unlabeled face images. It remains challenging to identify small or sparse face image clusters that we call hard clusters, which is caused by the heterogeneity, \ie, high variations in size and sparsity, of the clusters. Consequently, the conventional way of using a uniform threshold (to identify clusters) often leads to a terrible misclassification for the samples that should belong to hard clusters. We tackle this problem by leveraging the neighborhood information of samples and inferring the cluster memberships (of samples) in a probabilistic way. We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDe) and Transition-Probability-based Distance (TPDi), based on which we can simply apply the standard Density Peak Clustering algorithm with a uniform threshold. Our experiments on multiple benchmarks show that each module contributes to the final performance of our method, and by incorporating them into other advanced face clustering methods, these two modules can boost the performance of these methods to a new state-of-the-art. Code is available at: https://github.com/echoanran/On-Mitigating-Hard-Clusters.
Modern neural networks use building blocks such as convolutions that are equivariant to arbitrary 2D translations. However, these vanilla blocks are not equivariant to arbitrary 3D translations in the projective manifold. Even then, all monocular 3D detectors use vanilla blocks to obtain the 3D coordinates, a task for which the vanilla blocks are not designed for. This paper takes the first step towards convolutions equivariant to arbitrary 3D translations in the projective manifold. Since the depth is the hardest to estimate for monocular detection, this paper proposes Depth EquiVarIAnt NeTwork (DEVIANT) built with existing scale equivariant steerable blocks. As a result, DEVIANT is equivariant to the depth translations in the projective manifold whereas vanilla networks are not. The additional depth equivariance forces the DEVIANT to learn consistent depth estimates, and therefore, DEVIANT achieves state-of-the-art monocular 3D detection results on KITTI and Waymo datasets in the image-only category and performs competitively to methods using extra information. Moreover, DEVIANT works better than vanilla networks in cross-dataset evaluation. Code and models at https://github.com/abhi1kumar/DEVIANT
Video text spotting(VTS) is the task that requires simultaneously detecting, tracking and recognizing text in the video. Existing video text spotting methods typically develop sophisticated pipelines and multiple models, which is not friend for real-time applications. Here we propose a real-time end-to-end video text spotter with Contrastive Representation learning (CoText). Our contributions are three-fold: 1) CoText simultaneously address the three tasks (e.g., text detection, tracking, recognition) in a real-time end-to-end trainable framework. 2) With contrastive learning, CoText models long-range dependencies and learning temporal information across multiple frames. 3) A simple, lightweight architecture is designed for effective and accurate performance, including GPU-parallel detection post-processing, CTC-based recognition head with Masked RoI. Extensive experiments show the superiority of our method. Especially, CoText achieves an video text spotting IDF1 of 72.0% at 41.0 FPS on ICDAR2015video, with 10.5% and 32.0 FPS improvement the previous best method. The code can be found at github.com/weijiawu/CoText.
The use of attributed quotes is the most direct and least filtered pathway of information propagation in news. Consequently, quotes play a central role in the conception, reception, and analysis of news stories. Since quotes provide a more direct window into a speaker's mind than regular reporting, they are a valuable resource for journalists and researchers alike. While substantial research efforts have been devoted to methods for the automated extraction of quotes from news and their attribution to speakers, few comprehensive corpora of attributed quotes from contemporary sources are available to the public. Here, we present an adaptive web interface for searching Quotebank, a massive collection of quotes from the news, which we make available at https://quotebank.dlab.tools.
The nanoparticle size and distribution information in the SEM images of silicon crystals are generally counted by manual methods. The realization of automatic machine recognition is significant in materials science. This paper proposed a superposition partitioning image recognition method to realize automatic recognition and information statistics of silicon crystal nanoparticle SEM images. Especially for the complex and highly aggregated characteristics of silicon crystal particle size, an accurate recognition step and contour statistics method based on morphological processing are given. This method has technical reference value for the recognition of Monocrystalline silicon surface nanoparticle images under different SEM shooting conditions. Besides, it outperforms other methods in terms of recognition accuracy and algorithm efficiency.