Humans can count very quickly by subitizing, but slow down substantially as the number of objects increases. Previous studies have shown that a trained deep neural network (DNN) detector can count the number of objects in an amount of time that increases only slowly with the number of objects. This phenomenon suggests that DNNs have a subitizing ability and that, unlike in humans, it works equally well for large numbers. Many existing studies have successfully applied DNNs to object counting, but few have examined the subitizing ability of DNNs or its interpretation. In this paper, we find that DNNs do not have the ability to generally count connected components. We provide experiments to support this conclusion, along with explanations that account for the results and phenomena observed in these experiments. We propose three ML-learnable characteristics to verify whether problems are learnable by ML models such as DNNs, and use them to explain why DNNs work for specific counting problems but cannot generally count connected components.
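As a point of reference for what "generally counting connected components" entails, here is a minimal sketch (our illustration, not part of the paper) of the classical flood-fill algorithm that a DNN would have to reproduce on binary grids:

```python
from collections import deque

def count_connected_components(grid):
    """Count 4-connected components of 1-cells in a binary grid via BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                count += 1  # found a new, unvisited component
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Two components: the pair of 1s at top-left and the lone 1 at bottom-right.
print(count_connected_components([[1, 1, 0], [0, 0, 0], [0, 0, 1]]))  # -> 2
```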
Personalized news recommendation techniques are widely adopted by many online news feed platforms to target user interests. Learning accurate user interest models is important for news recommendation. Most existing methods for news recommendation rely on implicit feedback, such as click behaviors, for inferring user interests and training models. However, click behaviors are implicit feedback and usually contain heavy noise. In addition, they cannot help infer complicated user interests such as dislike. Moreover, feed recommendation models trained solely on click behaviors cannot optimize other objectives such as user engagement. In this paper, we present a news feed recommendation method that can exploit various kinds of user feedback to enhance both user interest modeling and recommendation model training. In our method, we propose a unified user modeling framework that incorporates various explicit and implicit user feedback to infer both positive and negative user interests. In addition, we propose a strong-to-weak attention network that uses the representations of stronger feedback to distill positive and negative user interests from weak implicit feedback for accurate user interest modeling. Furthermore, we propose a multi-feedback model training framework that jointly trains the model on click, finish, and dwell time prediction tasks to learn an engagement-aware feed recommendation model. Extensive experiments on a real-world dataset show that our approach can effectively improve model performance in terms of both news clicks and user engagement.
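To make the strong-to-weak attention idea concrete, the following is a minimal, hypothetical sketch (our own simplification; the layer names and shapes are assumptions, not the paper's exact architecture) in which a strong-feedback summary vector serves as the query over a sequence of weak-feedback behavior representations:

```python
import torch
import torch.nn as nn

class StrongToWeakAttention(nn.Module):
    """Sketch: a strong-feedback summary queries weak-feedback (e.g., click)
    behavior representations to pool an interest vector from them."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, strong_repr, weak_reprs):
        # strong_repr: (batch, dim); weak_reprs: (batch, n_behaviors, dim)
        q = self.query(strong_repr).unsqueeze(1)            # (batch, 1, dim)
        k = self.key(weak_reprs)                            # (batch, n, dim)
        scores = torch.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)
        return (scores.unsqueeze(-1) * weak_reprs).sum(1)   # (batch, dim)

attn = StrongToWeakAttention(dim=64)
pooled = attn(torch.randn(8, 64), torch.randn(8, 20, 64))   # -> (8, 64)
```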
Most deep learning based speech enhancement (SE) methods rely on estimating the magnitude spectrum of the clean speech signal from the observed noisy speech signal, either by magnitude spectral masking or regression, and reuse the noisy phase while synthesizing the time-domain waveform from the estimated magnitude spectrum. However, recent works have highlighted the importance of phase in SE. One attempt estimated the complex ratio mask, taking phase into account, using a complex-valued feed-forward neural network (FFNN); however, FFNNs cannot capture the sequential information essential for phase estimation. In this work, we propose a realisation of a complex-valued long short-term memory (RCLSTM) network to estimate the complex ratio mask (CRM) using sequential information along time. The proposed RCLSTM is designed to process complex-valued sequences using complex arithmetic, and hence it preserves the dependencies between the real and imaginary parts of the CRM, and thereby the phase. The proposed method is evaluated on noisy speech mixtures formed from the Voice-Bank corpus and the DEMAND database. The proposed RCLSTM outperforms real-valued masking methods on several objective measures, including perceptual evaluation of speech quality (PESQ), on which it improves by over 4.3%.
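The core of the complex arithmetic is that every complex product is realised with four real-valued operations, which ties the real and imaginary parts together. A minimal sketch (our illustration; the function names are ours) of a complex linear transform and of applying an estimated CRM to the noisy spectrum:

```python
import torch

def complex_linear(x_re, x_im, w_re, w_im):
    # (x_re + i*x_im) @ (w_re + i*w_im)
    #   = (x_re @ w_re - x_im @ w_im) + i*(x_re @ w_im + x_im @ w_re)
    return x_re @ w_re - x_im @ w_im, x_re @ w_im + x_im @ w_re

def apply_crm(m_re, m_im, y_re, y_im):
    # Element-wise complex product of the estimated mask M and the noisy
    # spectrum Y, per time-frequency bin: S_hat = M * Y.
    return m_re * y_re - m_im * y_im, m_re * y_im + m_im * y_re
```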
This paper devises, implements, and benchmarks a novel framework, named CLOI, that can accurately generate individual labelled point clusters of the most important shapes of existing industrial facilities, with minimal manual effort, in a generic point-level format. CLOI employs a combination of deep learning and geometric methods to segment the points into classes and individual instances. Generating geometric digital twins from point cloud data in current commercial software is a tedious, manual process. Experiments with our CLOI framework reveal that the method can reliably segment complex and incomplete point clouds of industrial facilities, yielding 82% class segmentation accuracy. Compared to the current state of practice, the proposed framework can realize estimated time savings of 30% on average. CLOI is the first framework of its kind to achieve geometric digital twinning for the most important objects of industrial factories. It provides the foundation for further research on the generation of semantically enriched digital twins of the built environment.
Resource allocation and task prioritisation are key problem domains in the fields of autonomous vehicles, networking, and cloud computing. The challenge in developing efficient and robust algorithms comes from the dynamic nature of these systems, with many components communicating and interacting in complex ways. The multi-group resource allocation optimisation (MG-RAO) algorithm we present uses multiple function approximations of resource demand over time, alongside reinforcement learning techniques, to develop a novel method of optimising resource allocation in these multi-agent systems. This method is applicable where there are competing demands for shared resources, or in task prioritisation problems. Evaluation is carried out in a simulated environment containing multiple competing agents. We compare the new algorithm to an approach in which child agents distribute their resources uniformly across all the tasks they can be allocated. We also contrast the performance of the algorithm when resource allocation is modelled separately for groups of agents with that when it is modelled jointly over all agents. The MG-RAO algorithm shows a 23-28% improvement over fixed resource allocation in the simulated environments. Results also show that, in a volatile system, the MG-RAO algorithm configured so that child agents model resource allocation for all agents as a whole achieves 46.5% of the performance it attains when set to model multiple groups of agents. These results demonstrate the ability of the algorithm to solve resource allocation problems in multi-agent systems and to perform well in dynamic environments.
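As a toy illustration of the contrast with the uniform baseline (our own sketch, not the MG-RAO algorithm itself), allocation can instead be made proportional to a learned per-task estimate of demand:

```python
import numpy as np

def uniform_allocation(budget, n_tasks):
    # Baseline: child agents spread resources evenly over their tasks.
    return np.full(n_tasks, budget / n_tasks)

def demand_weighted_allocation(budget, predicted_demand):
    # Sketch: allocate in proportion to an estimate of each task's resource
    # demand (here a plain array standing in for the function approximations
    # of demand over time that MG-RAO maintains per group of agents).
    demand = np.maximum(predicted_demand, 0.0)
    return budget * demand / demand.sum()

print(demand_weighted_allocation(10.0, np.array([1.0, 3.0, 1.0])))  # [2. 6. 2.]
```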
The linear mixture model (LMM) for hyperspectral datasets involves separating a mixed pixel as a linear combination of its constituent endmembers and corresponding fractional abundances. Both optimization and neural methods have attempted to tackle this problem, with the current state-of-the-art results achieved by neural models on benchmark datasets. However, our review of these neural models shows that the networks are severely over-parameterized, and consequently the invariant endmember spectra extracted as decoder weights have a high variance over multiple runs. All of these approaches also require substantial post-processing to satisfy LMM constraints. Furthermore, they require an exact specification of the number of endmembers and specialized initialization of weights from other algorithms such as VCA. Our work shows for the first time that a two-layer autoencoder (SCA-Net), with $2FK$ parameters ($F$ features, $K$ endmembers), achieves error metrics that are orders of magnitude apart ($10^{-5}$) from previously reported values ($10^{-2}$). SCA-Net converges to this low-error solution starting from a random initialization of weights. We also show that SCA-Net, based upon a bi-orthogonal representation, performs a self-correction when the number of endmembers is over-specified. We show that our network formulation extracts a low-rank representation that is bounded below by a tail-energy and can be computationally verified. Our numerical experiments on the Samson, Jasper, and Urban datasets demonstrate that SCA-Net outperforms previously reported error metrics in all cases while being robust to noise and outliers.
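Under the LMM, a pixel $x \in \mathbb{R}^F$ is modelled as $x \approx Ea$, where $E \in \mathbb{R}^{F \times K}$ holds the endmember spectra and the abundances satisfy $a \ge 0$ and $\mathbf{1}^\top a = 1$. The following is a minimal sketch of a two-layer, $2FK$-parameter autoencoder in this spirit (our illustration, not the exact SCA-Net formulation; the softmax is one simple way to satisfy the abundance constraints):

```python
import torch
import torch.nn as nn

class TwoLayerUnmixingAE(nn.Module):
    """Sketch: encoder F->K and decoder K->F, i.e. 2*F*K weights in total.
    The decoder weight matrix plays the role of the endmember spectra E."""
    def __init__(self, n_features, n_endmembers):
        super().__init__()
        self.encoder = nn.Linear(n_features, n_endmembers, bias=False)
        self.decoder = nn.Linear(n_endmembers, n_features, bias=False)

    def forward(self, x):
        # Softmax enforces non-negative abundances that sum to one.
        abundances = torch.softmax(self.encoder(x), dim=-1)
        return self.decoder(abundances), abundances
```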
The last few years have witnessed an increased interest in incorporating physics-informed inductive biases in deep learning frameworks. In particular, a growing volume of literature has been exploring ways to enforce energy conservation while using neural networks for learning dynamics from observed time-series data. In this work, we present a comparative analysis of energy-conserving neural networks, such as the deep Lagrangian network and the Hamiltonian neural network, wherein the underlying physics is encoded in the computation graph. We focus on ten neural network models, explain the similarities and differences between them, and compare their performance on four different physical systems. Our results highlight that using a high-dimensional coordinate system and then imposing restrictions via explicit constraints can lead to higher accuracy in the learned dynamics. We also point out the possibility of leveraging some of these energy-conserving models to design energy-based controllers.
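As an example of encoding the physics in the computation graph, here is a minimal sketch of a Hamiltonian neural network (our simplification of the general idea): a scalar energy $H(q, p)$ is learned, and the dynamics are read off via Hamilton's equations $\dot{q} = \partial H / \partial p$, $\dot{p} = -\partial H / \partial q$, so energy is conserved by construction:

```python
import torch
import torch.nn as nn

class HamiltonianNN(nn.Module):
    """Sketch: learn a scalar H(q, p); return (dq/dt, dp/dt) from its
    symplectic gradient via Hamilton's equations."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))

    def forward(self, q, p):
        q = q.detach().requires_grad_(True)
        p = p.detach().requires_grad_(True)
        H = self.H(torch.cat([q, p], dim=-1)).sum()
        # create_graph=True keeps the graph so a loss on the predicted
        # derivatives can still backpropagate into the network weights.
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq  # dq/dt, dp/dt
```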
One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide whether rewrite rules should be applied and in which order. Prior research has shown that better-optimized tensor computation graphs can be discovered by searching for a better sequence of substitutions instead of relying on heuristics. However, we observe that existing approaches for tensor graph superoptimization, in both production and research frameworks, apply substitutions sequentially. Such sequential search methods are sensitive to the order in which the substitutions are applied and often explore only a small fragment of the exponential space of equivalent graphs. This paper presents a novel technique for tensor graph superoptimization that employs equality saturation to apply all possible substitutions at once. We show that our approach can find optimized graphs with up to 16% speedup over the state of the art, while spending on average 48x less time optimizing.
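To illustrate the contrast with sequential rewriting, here is a toy sketch (ours, not the paper's system; real equality saturation uses an e-graph rather than an explicit set of terms): every rule is applied everywhere until the set of equivalent terms stops growing, and a cost function then extracts the cheapest term, so the result does not depend on rule order:

```python
def cost(e):
    """Toy cost model over terms represented as nested tuples."""
    if not isinstance(e, tuple):
        return 0
    return {'mul': 4, 'shl': 1, 'add': 1}[e[0]] + sum(cost(c) for c in e[1:])

def apply_rules(e):
    """Single-step rewrites of term e (recursion into subterms omitted)."""
    out = set()
    if isinstance(e, tuple):
        if e[0] == 'mul' and e[2] == 2:
            out.add(('shl', e[1], 1))  # x * 2  ->  x << 1
        if e[0] == 'mul' and e[2] == 1:
            out.add(e[1])              # x * 1  ->  x
    return out

def saturate(expr, max_iters=10):
    seen = {expr}
    for _ in range(max_iters):
        new = set().union(*(apply_rules(e) for e in seen)) - seen
        if not new:
            break  # saturated: no rule produces a new equivalent term
        seen |= new
    return min(seen, key=cost)

print(saturate(('mul', 'x', 2)))  # -> ('shl', 'x', 1)
```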
Colorectal cancer is the third most common cause of cancer worldwide. According to Global Cancer Statistics 2018, the incidence of colorectal cancer is increasing in both developing and developed countries. Early detection of colon anomalies such as polyps is important for cancer prevention, and automatic polyp segmentation can play a crucial role in this. Despite recent advancements in early detection and treatment options, the estimated polyp miss rate is still around 20%. Support via an automated computer-aided diagnosis system could be one of the potential solutions for overlooked polyps. Such detection systems can enable low-cost design solutions and save doctors time, which they could use, for example, to perform more patient examinations. In this paper, we introduce the 2020 Medico challenge, provide some information on related work and the dataset, describe the task and evaluation metrics, and discuss the necessity of organizing the Medico challenge.
Recently, semi-supervised learning (SSL) methods, in the framework of deep learning (DL), have been shown to provide state-of-the-art results on image datasets by exploiting unlabeled data. Usually tested on object recognition tasks in images, these algorithms are rarely compared on audio tasks. In this article, we adapt four recent SSL methods to the task of audio tagging. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. The other two algorithms, MixMatch (MM) and FixMatch (FM), are single-model methods that rely primarily on data augmentation strategies. Using the Wide ResNet 28-2 architecture in all our experiments, with 10% of the data labeled and the remaining 90% unlabeled, we first compare the four methods' accuracy on three standard benchmark audio event datasets: Environmental Sound Classification (ESC-10), UrbanSound8K (UBS8K), and Google Speech Commands (GSC). MM and FM significantly outperformed MT and DCT, with MM being the best method in most experiments. On UBS8K and GSC in particular, MM achieved 18.02% and 3.25% error rates (ER), outperforming models trained with 100% of the available labeled data, which reached 23.29% and 4.94% ER, respectively. Second, we explored the benefits of using the mixup augmentation in the four algorithms. In almost all cases, mixup brought significant gains. For instance, on GSC, FM reached 4.44% and 3.31% ER without and with mixup, respectively.
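For reference, mixup forms convex combinations of pairs of examples and their one-hot labels, $\tilde{x} = \lambda x_i + (1-\lambda) x_j$ and $\tilde{y} = \lambda y_i + (1-\lambda) y_j$ with $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$. A minimal sketch (the default $\alpha$ below is illustrative, not the value used in our experiments):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Mixup augmentation: convex combination of two examples and their
    one-hot labels, with the weight drawn from Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```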