"Topic": models, code, and papers

MGPSN: Motion-Guided Pseudo Siamese Network for Indoor Video Head Detection

Oct 07, 2021
Kailai Sun, Xiaoteng Ma, Qianchuan Zhao, Peng Liu

Head detection in real-world videos is an important research topic in computer vision. However, existing studies face some challenges in complex scenes. The performance of head detectors deteriorates when indoor videos contain objects whose appearance resembles a head. Moreover, heads have small scales and diverse poses, which increases the difficulty of detection. To handle these issues, we propose the Motion-Guided Pseudo Siamese Network for Indoor Video Head Detection (MGPSN), an end-to-end model that learns robust head motion features. MGPSN integrates spatial-temporal information at the pixel level, guiding the model to extract effective head features. Experiments show that MGPSN is able to suppress static objects and enhance motion instances. Compared with previous methods, it achieves state-of-the-art performance on the crowd Brainwash dataset. Different backbone networks and detectors are evaluated to verify the flexibility and generality of MGPSN.


Co-design Optimization for Underwater Vehicle Docking Systems

Aug 06, 2021
Jonathan Wallen, Maddyson Jeske, Zhuoyuan Song

The design of autonomous underwater vehicles (AUVs) and their docking stations has been a popular research topic for several decades. Although many AUV and dock designs have been proposed, materialized, and commercialized, most of these existing designs prioritize the functionality of the AUV over the dock, or vice versa; there has been limited formal research in analytical optimization for AUV docking systems. In this paper, a multidisciplinary optimization framework is presented with the aim of filling this theoretical gap. We propose a co-design optimization method that optimizes multiple design parameters governing the archetype of an AUV and its docking system. Capturing the user design intents in the optimization process, the proposed method produces a set of optimal design parameters that satisfies a set of predefined bounds, constraints, and initial conditions. Three cases of design optimization are reported for different design intents. Each optimal design found in the three cases is compared to an existing system to show the validity of this design optimization framework.

* 7 pages, 6 figures 

Evaluating Sensitivity to the Stick-Breaking Prior in Bayesian Nonparametrics

Jul 12, 2021
Ryan Giordano, Runjing Liu, Michael I. Jordan, Tamara Broderick

Bayesian models based on the Dirichlet process and other stick-breaking priors have been proposed as core ingredients for clustering, topic modeling, and other unsupervised learning tasks. Prior specification is, however, relatively difficult for such models, given that their flexibility implies that the consequences of prior choices are often relatively opaque. Moreover, these choices can have a substantial effect on posterior inferences. Thus, considerations of robustness need to go hand in hand with nonparametric modeling. In the current paper, we tackle this challenge by exploiting the fact that variational Bayesian methods, in addition to having computational advantages in fitting complex nonparametric models, also yield sensitivities with respect to parametric and nonparametric aspects of Bayesian models. In particular, we demonstrate how to assess the sensitivity of conclusions to the choice of concentration parameter and stick-breaking distribution for inferences under Dirichlet process mixtures and related mixture models. We provide both theoretical and empirical support for our variational approach to Bayesian sensitivity analysis.

* 65 pages, 20 figures 

Exploring Context Modeling Techniques on the Spatiotemporal Crowd Flow Prediction

Jun 30, 2021
Liyue Chen, Leye Wang

In the big data and AI era, context is widely exploited as extra information which makes it easier to learn a more complex pattern in machine learning systems. However, most of the existing related studies seldom take context into account. The difficulty lies in the unknown generalization ability of both context and its modeling techniques across different scenarios. To fill the above gaps, we conduct a large-scale analytical and empirical study on the spatiotemporal crowd flow prediction (STCFP) problem, a widely studied research topic. We mainly make three efforts: (i) we develop a new taxonomy of both context features and context modeling techniques based on extensive investigations in prevailing STCFP research; (ii) we conduct extensive experiments on seven datasets with hundreds of millions of records to quantitatively evaluate the generalization ability of both distinct context features and context modeling techniques; (iii) we summarize some guidelines for researchers to conveniently utilize context in diverse applications.


Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Jun 08, 2021
Aditya Gupta, Jiacheng Xu, Shyam Upadhyay, Diyi Yang, Manaal Faruqui

Disfluency is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than what was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on Disfl-QA in a zero-shot setting. We show data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning. We argue that we need large-scale disfluency datasets in order for NLP models to be robust to them. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa.

* Findings of ACL 2021 

Adaptive Illumination based Depth Sensing using Deep Learning

Mar 23, 2021
Qiqin Dai, Fengqiang Li, Oliver Cossairt, Aggelos K Katsaggelos

Dense depth map capture is challenging in existing active sparse illumination based depth acquisition techniques, such as LiDAR. Various techniques have been proposed to estimate a dense depth map based on fusion of the sparse depth map measurement with the RGB image. Recent advances in hardware enable adaptive depth measurements resulting in further improvement of the dense depth map estimation. In this paper, we study the topic of estimating dense depth from depth sampling. The adaptive sparse depth sampling network is jointly trained with a fusion network of an RGB image and sparse depth, to generate optimal adaptive sampling masks. We show that such adaptive sampling masks can generalize well to many RGB and sparse depth fusion algorithms under a variety of sampling rates (as low as $0.0625\%$). The proposed adaptive sampling method is fully differentiable and flexible to be trained end-to-end with upstream perception algorithms.


Multichannel-based learning for audio object extraction

Feb 11, 2021
Daniel Arteaga, Jordi Pons

The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata. While rendering an object-based production into a multichannel mix is straightforward, the reverse process involves sound source separation and estimating the spatial trajectories of the extracted sources. Moreover, cinematic object-based productions are often composed of dozens of simultaneous audio objects, which poses a scalability challenge for audio object extraction. Here, we propose a novel deep learning approach to object extraction that learns from the multichannel renders of object-based productions, instead of directly learning from the audio objects themselves. This approach allows tackling the object scalability challenge and also offers the possibility to formulate the problem in a supervised or an unsupervised fashion. Since, to our knowledge, no other works have previously addressed this topic, we first define the task and propose an evaluation methodology, and then discuss under what circumstances our methods outperform the proposed baselines.

* In proceedings of ICASSP2021 

Magnification Generalization for Histopathology Image Embedding

Jan 18, 2021
Milad Sikaroudi, Benyamin Ghojogh, Fakhri Karray, Mark Crowley, H. R. Tizhoosh

Histopathology image embedding is an active research area in computer vision. Most of the embedding models exclusively concentrate on a specific magnification level. However, a useful task in histopathology embedding is to train an embedding space regardless of the magnification level. Two main approaches for tackling this goal are domain adaptation and domain generalization, where the target magnification levels may or may not be introduced to the model in training, respectively. Although magnification adaptation is a well-studied topic in the literature, this paper, to the best of our knowledge, is the first work on magnification generalization for histopathology image embedding. We use an episodic trainable domain generalization technique for magnification generalization, namely Model Agnostic Learning of Semantic Features (MASF), which works based on the Model Agnostic Meta-Learning (MAML) concept. Our experimental results on a breast cancer histopathology dataset with four different magnification levels show the proposed method's effectiveness for magnification generalization.

* Accepted for presentation at International Symposium on Biomedical Imaging (ISBI'2021) 

Learning to Emphasize: Dataset and Shared Task Models for Selecting Emphasis in Presentation Slides

Jan 02, 2021
Amirreza Shirani, Giai Tran, Hieu Trinh, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio

Presentation slides have become a common addition to teaching material. Emphasizing strong leading words in presentation slides can allow the audience to direct the eye to certain focal points instead of reading the entire slide, keeping their attention on the speaker during the presentation. Despite a large volume of studies on automatic slide generation, few studies have addressed the automation of design assistance during the creation process. Motivated by this demand, we study the problem of Emphasis Selection (ES) in presentation slides, i.e., choosing candidates for emphasis, by introducing a new dataset containing presentation slides with a wide variety of topics, each annotated with emphasis words in a crowdsourced setting. We evaluate a range of state-of-the-art models on this novel dataset by organizing a shared task and inviting multiple researchers to model emphasis in this new domain. We present the main findings and compare the results of these models, and by examining the challenges of the dataset, we provide different analysis components.

* In Proceedings of Content Authoring and Design (CAD21) workshop at the Thirty-fifth AAAI Conference on Artificial Intelligence (AAAI-21) 
