"Topic": models, code, and papers

Approximate Inference for Stochastic Planning in Factored Spaces

Mar 23, 2022
Zhennan Wu, Roni Khardon

The paper explores the use of approximate inference techniques as solution methods for stochastic planning problems with discrete factored spaces. While much prior work exists on this topic, subtle variations among the approaches obscure their differences and potential advantages. Here we abstract a simple framework that captures and connects prior work along two dimensions: the direction of information flow, i.e., forward vs. backward inference, and the type of approximation used, e.g., Belief Propagation (BP) vs. mean field variational inference (MFVI). Through this analysis we also propose a novel algorithm, CSVI, which provides a tighter variational approximation compared to prior work. An extensive experimental evaluation compares algorithms from different branches of the framework, showing that methods based on BP are generally better than methods based on MFVI, that CSVI is competitive with BP algorithms, and that while inference direction does not show a significant effect for VI methods, forward inference provides stronger performance with BP.


USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

Feb 12, 2022
Bolaji Yusuf, Ankur Gandhe, Alex Sokolov

Improving end-to-end (E2E) speech recognition by incorporating external text data has been a longstanding research topic. There has been a recent focus on training E2E ASR models that get the performance benefits of external text data without incurring the extra cost of evaluating an external language model at inference time. In this work, we propose training an ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. When we jointly train ASR and a masked language model on the 960-hour Librispeech and Opensubtitles data, respectively, we observe WER reductions of 16% and 20% on test-other and test-clean, respectively, over an ASR-only baseline without any extra cost at inference time, and reductions of 6% and 8% compared to a stronger MUTE-L baseline which trains the decoder with the same text data as our model. We achieve further improvements when we train the masked language model on Librispeech data or when we use machine translation as the auxiliary task, without significantly sacrificing performance on the task itself.

* 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022) 


What's Cracking? A Review and Analysis of Deep Learning Methods for Structural Crack Segmentation, Detection and Quantification

Feb 08, 2022
Jacob König, Mark Jenkins, Mike Mannion, Peter Barrie, Gordon Morison

Surface cracks are a very common indicator of potential structural faults. Their early detection and tracking are an important part of structural health monitoring. Left untreated, they can grow in size over time and require expensive repairs or maintenance. With recent advances in computer vision and deep learning algorithms, the automatic detection and segmentation of cracks for this monitoring process have become a major topic of interest. This review aims to give researchers an overview of the published work within the field of crack analysis algorithms that make use of deep learning. It outlines the various tasks that are solved through applying computer vision algorithms to surface cracks in a structural health monitoring setting and also provides in-depth reviews of recent fully supervised, semi-supervised and unsupervised approaches that perform crack classification, detection, segmentation and quantification. Additionally, this review highlights popular crack datasets and the metrics that are used to evaluate the performance of those algorithms. Finally, potential research gaps are outlined and further research directions are provided.


Demanding and Designing Aligned Cognitive Architectures

Dec 19, 2021
Koen Holtman

With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many issues; here we examine three of them. The first issue is to clarify what demands stakeholders might usefully make on the designers of AI systems, useful because the technology exists to implement them. We make this technical topic more accessible by using the framing of cognitive architectures. The second issue is to move beyond an analytical framing that treats useful intelligence as being reward maximization only. To support this move, we define several AI cognitive architectures that combine reward maximization with other technical elements designed to improve alignment. The third issue is how stakeholders should calibrate their interactions with modern machine learning researchers. We consider how current fashions in machine learning create a narrative pull that participants in technical and policy discussions should be aware of, so that they can compensate for it. We identify several technically tractable but currently unfashionable options for improving AI alignment.

* PERLS Workshop at 35th Conference on Neural Information Processing Systems (NeurIPS 2021). This arXiv version extends the workshop camera-ready version by adding four figures 


A Grounded Well-being Conversational Agent with Multiple Interaction Modes: Preliminary Results

Nov 28, 2021
Xinxin Yan, Ndapa Nakashole

Technologies for enhancing well-being, healthcare vigilance and monitoring are on the rise. However, despite patient interest, such technologies suffer from low adoption. One hypothesis for this limited adoption is the loss of the human interaction that is central to doctor-patient encounters. In this paper we seek to address this limitation via a conversational agent that adopts one aspect of in-person doctor-patient interactions: a human avatar to facilitate medically grounded question answering. This is akin to the in-person scenario where the doctor may point to the human body or the patient may point to their own body to express their conditions. Additionally, our agent has multiple interaction modes that may give the patient more options for using the agent, not just for medical question answering, but also to engage in conversations about general topics and current events. Both the avatar and the multiple interaction modes could help improve adherence. We present a high-level overview of the design of our agent, Marie Bot Wellbeing. We also report implementation details of our early prototype and present preliminary results.

* 9 pages 


3D Visual Tracking Framework with Deep Learning for Asteroid Exploration

Nov 21, 2021
Dong Zhou, Gunaghui Sun, Xiaopeng Hong

3D visual tracking is important to deep space exploration programs because it allows a spacecraft to approach its target flexibly. In this paper, we study an accurate, real-time method for 3D tracking. Since there is almost no public dataset for this topic, we present a new large-scale 3D asteroid tracking dataset that includes binocular video sequences, depth maps, and point clouds of diverse asteroids with various shapes and textures. Thanks to the power and convenience of the simulation platform, all 2D and 3D annotations are generated automatically. We also propose a deep-learning-based 3D tracking framework, Track3D, which combines a 2D monocular tracker with a novel lightweight amodal axis-aligned bounding-box network, A3BoxNet. The evaluation results demonstrate that Track3D achieves state-of-the-art 3D tracking performance in both accuracy and precision compared to a baseline algorithm. Moreover, our framework generalizes well with respect to 2D monocular tracking performance.


Controlled Neural Sentence-Level Reframing of News Articles

Sep 10, 2021
Wei-Fan Chen, Khalid Al-Khatib, Benno Stein, Henning Wachsmuth

Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and sentiment, which can be tackled with neural text generation techniques. However, it is more challenging since changing a frame requires rewriting entire sentences rather than single phrases. In this paper, we study how to computationally reframe sentences in news articles while maintaining their coherence to the context. We treat reframing as a sentence-level fill-in-the-blank task for which we train neural models on an existing media frame corpus. To guide the training, we propose three strategies: framed-language pretraining, named-entity preservation, and adversarial learning. We evaluate respective models automatically and manually for topic consistency, coherence, and successful reframing. Our results indicate that generating properly-framed text works well but with tradeoffs.

* EMNLP 2021 Findings 


Bilateral Trade: A Regret Minimization Perspective

Sep 08, 2021
Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi

Bilateral trade, a fundamental topic in economics, models the problem of intermediating between two strategic agents, a seller and a buyer, willing to trade a good for which they hold private valuations. In this paper, we cast the bilateral trade problem in a regret minimization framework over $T$ rounds of seller/buyer interactions, with no prior knowledge on their private valuations. Our main contribution is a complete characterization of the regret regimes for fixed-price mechanisms with different feedback models and private valuations, using as a benchmark the best fixed-price in hindsight. More precisely, we prove the following tight bounds on the regret:

- $\Theta(\sqrt{T})$ for full-feedback (i.e., direct revelation mechanisms).
- $\Theta(T^{2/3})$ for realistic feedback (i.e., posted-price mechanisms) and independent seller/buyer valuations with bounded densities.
- $\Theta(T)$ for realistic feedback and seller/buyer valuations with bounded densities.
- $\Theta(T)$ for realistic feedback and independent seller/buyer valuations.
- $\Theta(T)$ for the adversarial setting.

* arXiv admin note: substantial text overlap with arXiv:2102.08754 
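
To make the benchmark in the abstract above concrete, here is a minimal sketch of regret against the best fixed price in hindsight. It assumes the usual gain-from-trade objective, where a trade at price p happens only when seller valuation <= p <= buyer valuation and is then worth the buyer's valuation minus the seller's. That objective and the names `gain_from_trade` and `regret` are illustrative assumptions, not taken from the paper.

```python
def gain_from_trade(price, seller, buyer):
    """Assumed objective: the trade happens only if seller <= price <= buyer."""
    return buyer - seller if seller <= price <= buyer else 0.0


def regret(posted_prices, valuations):
    """Regret of a sequence of posted prices against the best fixed price in hindsight.

    `valuations` is a list of (seller, buyer) pairs, one per round.
    """
    earned = sum(
        gain_from_trade(p, s, b) for p, (s, b) in zip(posted_prices, valuations)
    )
    # The total gain of a fixed price is piecewise constant and is maximized
    # at one of the observed valuations, so those candidates are enough.
    candidates = {v for pair in valuations for v in pair}
    best = max(
        sum(gain_from_trade(c, s, b) for s, b in valuations) for c in candidates
    )
    return best - earned


# Hypothetical data: three rounds of (seller, buyer) valuations.
rounds = [(0.2, 0.8), (0.5, 0.6), (0.7, 0.9)]
print(regret([0.5, 0.5, 0.5], rounds))
```

The sketch computes regret with full knowledge of the valuations. A learner in the posted-price ("realistic") feedback model would only observe whether each agent accepted the price, which is what makes the problem harder.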


Node Feature Kernels Increase Graph Convolutional Network Robustness

Sep 04, 2021
Mohamed El Amine Seddik, Changmin Wu, Johannes F. Lutzeyer, Michalis Vazirgiannis

The robustness of the much-used Graph Convolutional Networks (GCNs) to perturbations of their input is becoming a topic of increasing importance. In this paper, the random GCN is introduced for which a random matrix theory analysis is possible. This analysis suggests that if the graph is sufficiently perturbed, or in the extreme case random, then the GCN fails to benefit from the node features. It is furthermore observed that enhancing the message passing step in GCNs by adding the node feature kernel to the adjacency matrix of the graph structure solves this problem. An empirical study of a GCN utilised for node classification on six real datasets further confirms the theoretical findings and demonstrates that perturbations of the graph structure can result in GCNs performing significantly worse than Multi-Layer Perceptrons run on the node features alone. In practice, adding a node feature kernel to the message passing of perturbed graphs results in a significant improvement of the GCN's performance, thereby rendering it more robust to graph perturbations. Our code is publicly available at:

* 16 pages, 5 figures 
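
The idea of adding a node feature kernel to the adjacency matrix before message passing can be sketched as follows. This is only an illustration under stated assumptions: it uses a linear kernel K = X Xᵀ / d, a single untrained propagation step, and no degree normalization or weights. None of these choices, nor the helper names, are confirmed by the abstract.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def feature_kernel(x):
    """Assumed linear node feature kernel K = X X^T / d (d = feature dimension)."""
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    return [[v / d for v in row] for row in matmul(x, x_t)]


def propagate(adj, x, use_kernel=True):
    """One untrained message-passing step: (A + K) X if use_kernel, else A X."""
    op = adj
    if use_kernel:
        k = feature_kernel(x)
        op = [[a + kv for a, kv in zip(a_row, k_row)] for a_row, k_row in zip(adj, k)]
    return matmul(op, x)


# Hypothetical extreme case: the graph carries no information at all.
adj = [[0.0, 0.0], [0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 2.0]]
print(propagate(adj, features, use_kernel=False))  # feature signal is lost
print(propagate(adj, features, use_kernel=True))   # feature signal survives
```

With an uninformative adjacency matrix, plain propagation discards the node features, while the kernel term lets them pass through. This is the failure mode and remedy the abstract describes, shown here in a toy setting.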


MUSIQ: Multi-scale Image Quality Transformer

Aug 12, 2021
Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, Feng Yang

Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.

* ICCV 2021 
