"Topic": models, code, and papers

Improving Deep Learning through Automatic Programming

Jul 08, 2018
The-Hien Dang-Ha

Deep learning and deep architectures are emerging as the best machine learning methods so far in many practical applications such as reducing the dimensionality of data, image classification, speech recognition or object segmentation. In fact, many leading technology companies such as Google, Microsoft or IBM are researching and using deep architectures in their systems to replace other traditional models. Therefore, improving the performance of these models could make a strong impact in the area of machine learning. However, deep learning is a very fast-growing research domain with many core methodologies and paradigms discovered only in the last few years. This thesis first serves as a short summary of deep learning, which tries to include all of the most important ideas in this research area. Based on this knowledge, we proposed and conducted some experiments to investigate the possibility of improving deep learning through automatic programming (ADATE). Although our experiments did produce good results, there are still many more possibilities that we could not try due to limited time as well as some limitations of the current ADATE version. I hope that this thesis can promote future work on this topic, especially when the next version of ADATE comes out. This thesis also includes a short analysis of the power of the ADATE system, which could be useful for other researchers who want to know what it is capable of.

* Master's thesis (2014) 

  Access Paper or Ask Questions

Translation of "Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientierung" by Erwin Kruppa (1913)

Dec 25, 2017
Guillermo Gallego, Elias Mueggler, Peter Sturm

Erwin Kruppa's 1913 paper, "Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientierung", Sitzungsberichte der Mathematisch-Naturwissenschaftlichen Kaiserlichen Akademie der Wissenschaften, Vol. 122 (1913), pp. 1939-1948, which may be translated as "To determine a 3D object from two perspective views with known inner orientation", is a landmark paper in Computer Vision because it provides the first five-point algorithm for relative pose estimation. Kruppa showed that (a finite number of solutions for) the relative pose between two calibrated images of a rigid object can be computed from five point matches between the images. Kruppa's work also gained attention in the topic of camera self-calibration, as presented in (Maybank and Faugeras, 1992). Since the paper is still relevant today (more than a hundred citations within the last ten years) and is not available online, we ordered a copy from the German National Library in Frankfurt and provide an English translation along with the German original. We also adapt the terminology to modern jargon and provide some clarifications (highlighted in sans-serif font). For a historical review of geometric computer vision, the reader is referred to the recent survey paper (Sturm, 2011).

* 16 pages, 1 figure. Granted reproduction permission from the publishing house of the Austrian Academy of Sciences (
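
The relative pose problem described in the abstract above rests on the epipolar constraint: for calibrated image points x1 and x2 of the same 3D point, x2^T E x1 = 0, where E = [t]x R is the essential matrix. The sketch below only checks this constraint on five synthetic matches; it is not Kruppa's algorithm and does not solve for the pose. The pose, the point coordinates, and all function names are illustrative assumptions, and the convention X2 = R X1 + t is assumed.

```python
import math

def cross_matrix(t):
    """Skew-symmetric matrix [t]x such that [t]x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotation_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def essential_matrix(rotation, translation):
    """E = [t]x R for the convention X2 = R X1 + t."""
    return matmul(cross_matrix(translation), rotation)

def epipolar_residual(e, x1, x2):
    """x2^T E x1, which vanishes for a correct match."""
    ex1 = matvec(e, x1)
    return sum(x2[i] * ex1[i] for i in range(3))

# Hypothetical relative pose and five 3D points in the first camera frame.
R = rotation_z(0.1)
t = [1.0, 0.2, 0.0]
E = essential_matrix(R, t)
points = [[0.5, -0.2, 4.0], [-1.0, 0.3, 5.0], [0.2, 0.8, 3.0],
          [1.5, -0.7, 6.0], [-0.4, -0.9, 4.5]]

residuals = []
for X1 in points:
    X2 = [a + b for a, b in zip(matvec(R, X1), t)]
    x1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]  # normalized image point, view 1
    x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]  # normalized image point, view 2
    residuals.append(epipolar_residual(E, x1, x2))
```

Each match contributes one such equation, and E has five degrees of freedom (three for rotation, two for the translation direction), which is why five matches are the minimal case.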

Bipedal locomotion using variable stiffness actuation

Jun 01, 2017
Ludo C. Visser, Stefano Stramigioli, Raffaella Carloni

Robust and energy-efficient bipedal locomotion in robotics is still a challenging topic. In order to address issues in this field, we can take inspiration from nature, by studying human locomotion. The Spring-Loaded Inverted Pendulum (SLIP) model has been shown to be a good model for this purpose. However, the human musculoskeletal system enables us to actively modulate leg stiffness, for example when walking on rough terrain with irregular and unexpected height variations of the walking surface. This ability to vary leg stiffness is not considered in conventional SLIP-based models, and therefore this paper explores the potential role of active leg stiffness variation in bipedal locomotion. It is shown that the conceptual SLIP model can be iteratively extended to more closely resemble a realistic (i.e., non-ideal) walker, and that feedback control strategies can be designed that reproduce the SLIP behavior in these extended models. We show that these extended models realize a cost of transport comparable to human walking, which indicates that active leg stiffness variation plays an important role in human locomotion that was previously not captured by the SLIP model. The results of this study show that active leg stiffness adaptation is a promising approach for realizing more energy-efficient and robust bipedal walking robots.
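
As a rough illustration of the quantities this abstract refers to, the sketch below evaluates the stance-phase acceleration of a textbook SLIP point mass with a massless spring leg, with the stiffness passed in as a parameter so that it can be varied between calls. It also computes the dimensionless cost of transport E / (m g d). This is not the authors' extended model or controller; the function names, the pinned-foot-at-origin convention, and the numeric values are assumptions.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def slip_stance_acceleration(x, y, stiffness, rest_length, mass):
    """Acceleration (ax, ay) of the point mass at (x, y), foot pinned at the origin.

    The leg is a massless linear spring; the force acts along the leg and
    pushes the mass away from the foot when the leg is compressed.
    """
    length = math.hypot(x, y)
    force = stiffness * (rest_length - length)  # positive when compressed
    ax = force * x / (length * mass)
    ay = force * y / (length * mass) - G
    return ax, ay

def cost_of_transport(energy_joules, mass, distance):
    """Dimensionless cost of transport: energy per unit weight per unit distance."""
    return energy_joules / (mass * G * distance)

# Hypothetical walker: 80 kg, 1 m rest length, leg compressed to 0.9 m.
soft = slip_stance_acceleration(0.0, 0.9, 10000.0, 1.0, 80.0)
stiff = slip_stance_acceleration(0.0, 0.9, 20000.0, 1.0, 80.0)
```

Doubling the stiffness at the same compression doubles the spring force, which is the lever an active stiffness controller would act on.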

Interactive Spoken Content Retrieval by Deep Reinforcement Learning

Sep 16, 2016
Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-Shan Lee

User-machine interaction is important for spoken content retrieval. For text content retrieval, the user can easily scan through and select from a list of retrieved items. This is impossible for spoken content retrieval, because the retrieved items are difficult to show on screen. Besides, due to the high degree of uncertainty in speech recognition, the retrieval results can be very noisy. One way to counter such difficulties is through user-machine interaction. The machine can take different actions to interact with the user to obtain better retrieval results before showing them to the user. The suitable actions depend on the retrieval status, for example requesting extra information from the user, returning a list of topics for the user to select from, etc. In our previous work, some hand-crafted states estimated from the present retrieval results were used to determine the proper actions. In this paper, we propose to use Deep-Q-Learning techniques instead to determine the machine actions for interactive spoken content retrieval. Deep-Q-Learning bypasses the need to estimate the hand-crafted states, and directly determines the best action based on the present retrieval status even without any human knowledge. It is shown to achieve significantly better performance compared with the previous hand-crafted states.

* Accepted conference paper: "The Annual Conference of the International Speech Communication Association (Interspeech), 2016" 
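
To make the learning rule behind this abstract concrete, here is a minimal sketch of the Q-learning update that Deep-Q-Learning builds on. It uses a lookup table rather than a neural network, so it is a simplified stand-in and not the paper's method. The action names, state labels, rewards, and hyperparameters are all hypothetical.

```python
from collections import defaultdict

# Hypothetical machine actions, loosely following the examples in the abstract.
ACTIONS = ["request_more_info", "return_topic_list", "show_results"]

def make_q_table():
    """Q-values default to 0.0 for every action in an unseen state."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def q_update(q, state, action, reward, next_state, done, lr=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]

def greedy_action(q, state):
    """Action with the highest current Q-value in `state`."""
    return max(ACTIONS, key=lambda a: q[state][a])

q = make_q_table()
# Showing results in state "s1" ends the dialogue with reward 1.0.
q_update(q, "s1", "show_results", 1.0, "end", done=True)
# Asking for more information in "s0" leads to "s1" with no immediate reward.
q_update(q, "s0", "request_more_info", 0.0, "s1", done=False)
```

In a deep variant the table is replaced by a network that maps features of the retrieval status to one Q-value per action, while the target has the same form.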

Compositional Memory for Visual Question Answering

Nov 18, 2015
Aiwen Jiang, Fang Wang, Fatih Porikli, Yi Li

Visual Question Answering (VQA) has recently emerged as one of the most fascinating topics in computer vision. Many state-of-the-art methods naively feed holistic visual features together with language features into a Long Short-Term Memory (LSTM) module, neglecting the sophisticated interaction between them. This coarse modeling also blocks the possibility of exploring finer-grained local features that contribute to the question answering dynamically over time. This paper addresses this fundamental problem by directly modeling the temporal dynamics between language and all possible local image patches. When traversing the question words sequentially, our end-to-end approach explicitly fuses the features associated with the words and the ones available at multiple local patches in an attention mechanism, and further combines the fused information to generate dynamic messages, which we call episodes. We then feed the episodes to a standard question answering module together with the contextual visual information and linguistic information. Motivated by recent practices in deep learning, we use auxiliary loss functions during training to improve the performance. Our experiments on the two latest public datasets suggest that our method has superior performance. Notably, on the DAQUAR dataset we advanced the state of the art by 6%, and we also evaluated our approach on the most recent MSCOCO-VQA dataset.

Forming A Random Field via Stochastic Cliques: From Random Graphs to Fully Connected Random Fields

Jun 30, 2015
Mohammad Javad Shafiee, Alexander Wong, Paul Fieguth

Random fields have remained a topic of great interest over the past decades for the purpose of structured inference, especially for problems such as image segmentation. The local nodal interactions commonly used in such models often suffer from the short-boundary bias problem, which is tackled primarily through the incorporation of long-range nodal interactions. However, computational tractability becomes a significant issue when incorporating such long-range nodal interactions, particularly when a large number of long-range nodal interactions (e.g., fully-connected random fields) are modeled. In this work, we introduce a generalized random field framework based around the concept of stochastic cliques, which addresses the issue of computational tractability when using fully-connected random fields by stochastically forming a sparse representation of the random field. The proposed framework allows for efficient structured inference using fully-connected random fields without any restrictions on the potential functions that can be utilized. Several realizations of the proposed framework using graph cuts are presented and evaluated, and experimental results demonstrate that the proposed framework can provide competitive performance for the purpose of image segmentation when compared to existing fully-connected and principled deep random field frameworks.

* 8 pages 

Architectures and Synchronization Techniques for Distributed Satellite Systems: A Survey

Mar 16, 2022
Liz Martinez Marrero, Juan C. Merlano Duncan, Jorge Querol, Sumit Kumar, Jevgenij Krivochiza, Shree Krishna Sharma, Symeon Chatzinotas, Adriano Camps, Björn Ottersten

Cohesive Distributed Satellite Systems (CDSS) are a key enabling technology for the future of remote sensing and communication missions. However, they have to meet strict synchronization requirements before their use is generalized. When clock or local oscillator signals are generated locally at each of the distributed nodes, achieving exact synchronization in absolute phase, frequency, and time is a complex problem. In addition, satellite systems have significant resource constraints, especially for small satellites, which are envisioned to be part of the future CDSS. Thus, the development of precise, robust, and resource-efficient synchronization techniques is essential for the advancement of future CDSS. In this context, this survey aims to summarize and categorize the most relevant results on synchronization techniques for DSS. First, some important architecture and system concepts are defined. Then, the synchronization methods reported in the literature are reviewed and categorized. This article also provides an extensive list of applications and examples of synchronization techniques for DSS in addition to the most significant advances in other operations closely related to synchronization, such as inter-satellite ranging and relative position. The survey also provides a discussion on emerging data-driven synchronization techniques based on ML. Finally, a compilation of current research activities and potential research topics is proposed, identifying problems and open challenges that can be useful for researchers in the field.

* submitted to IEEE Access 

Real time spectrogram inversion on mobile phone

Mar 10, 2022
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

With the growth of computing power on mobile phones and privacy concerns over user data, on-device real time speech processing has become an important research topic. In this paper, we focus on methods for real time spectrogram inversion, where an algorithm receives a portion of the input signal (e.g., one frame) and processes it incrementally, i.e., operating in streaming mode. We present a real time Griffin-Lim (GL) algorithm using a sliding window approach in the STFT domain. The proposed algorithm is 2.4x faster than real time on the ARM CPU of a Pixel4. In addition, we explore a neural vocoder operating in streaming mode and demonstrate the impact of looking ahead on perceptual quality. As little as one hop size (12.5ms) of lookahead is able to significantly improve perceptual quality in comparison to a causal model. We compare GL with the neural vocoder and show different trade-offs in terms of perceptual quality, on-device latency, algorithmic delay, memory footprint and noise sensitivity. For fair quality assessment of the GL approach, we use the input log magnitude spectrogram without mel transformation. We evaluate the presented real time spectrogram inversion approaches on clean, noisy and atypical speech.

* Submitted to Interspeech 2022 

Robust Estimation of Discrete Distributions under Local Differential Privacy

Feb 14, 2022
Julien Chhor, Flore Sentenac

Although robust learning and local differential privacy are both widely studied fields of research, combining the two settings is an almost unexplored topic. We consider the problem of estimating a discrete distribution in total variation from $n$ contaminated data batches under a local differential privacy constraint. A fraction $1-\epsilon$ of the batches contain $k$ i.i.d. samples drawn from a discrete distribution $p$ over $d$ elements. To protect the users' privacy, each of the samples is privatized using an $\alpha$-locally differentially private mechanism. The remaining $\epsilon n$ batches are an adversarial contamination. The minimax rate of estimation under contamination alone, with no privacy, is known to be $\epsilon/\sqrt{k}+\sqrt{d/kn}$, up to a $\sqrt{\log(1/\epsilon)}$ factor. Under the privacy constraint alone, the minimax rate of estimation is $\sqrt{d^2/\alpha^2 kn}$. We show that combining the two constraints leads to a minimax estimation rate of $\epsilon\sqrt{d/\alpha^2 k}+\sqrt{d^2/\alpha^2 kn}$ up to a $\sqrt{\log(1/\epsilon)}$ factor, larger than the sum of the two separate rates. We provide a polynomial-time algorithm achieving this bound, as well as a matching information theoretic lower bound.
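
The three rates quoted in this abstract can be compared numerically. The sketch below simply evaluates the stated expressions, dropping constants and the $\sqrt{\log(1/\epsilon)}$ factor, and reading $d/kn$ as $d/(kn)$ and $d^2/\alpha^2 kn$ as $d^2/(\alpha^2 kn)$. The function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def contamination_only_rate(eps, d, k, n):
    """eps / sqrt(k) + sqrt(d / (k n)): contamination alone, no privacy."""
    return eps / math.sqrt(k) + math.sqrt(d / (k * n))

def privacy_only_rate(alpha, d, k, n):
    """sqrt(d^2 / (alpha^2 k n)): privacy constraint alone."""
    return math.sqrt(d ** 2 / (alpha ** 2 * k * n))

def combined_rate(eps, alpha, d, k, n):
    """eps * sqrt(d / (alpha^2 k)) + sqrt(d^2 / (alpha^2 k n)): both constraints."""
    return (eps * math.sqrt(d / (alpha ** 2 * k))
            + math.sqrt(d ** 2 / (alpha ** 2 * k * n)))

# Hypothetical setting: 10% contaminated batches, alpha = 1, d = 100 symbols,
# k = 10 samples per batch, n = 10000 batches.
eps, alpha, d, k, n = 0.1, 1.0, 100, 10, 10000
separate_sum = contamination_only_rate(eps, d, k, n) + privacy_only_rate(alpha, d, k, n)
combined = combined_rate(eps, alpha, d, k, n)
```

For these values the combined expression exceeds the sum of the two separate ones, because the contamination term picks up an extra factor of $\sqrt{d}/\alpha$ once privacy is imposed.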

Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content?

Feb 14, 2022
Patrick Schramowski, Christopher Tauchmann, Kristian Kersting

Large datasets underlying much of current machine learning raise serious issues concerning inappropriate content, such as content that is offensive, insulting, threatening, or might otherwise cause anxiety. This calls for increased dataset documentation, e.g., using datasheets. They, among other topics, encourage reflection on the composition of the datasets. So far, however, this documentation is done manually and can therefore be tedious and error-prone, especially for large image datasets. Here we ask the arguably "circular" question of whether a machine can help us reflect on inappropriate content, answering Question 16 in Datasheets. To this end, we propose to use the information stored in pre-trained transformer models to assist us in the documentation process. Specifically, prompt-tuning based on a dataset of socio-moral values steers CLIP to identify potentially inappropriate content, thereby reducing human labor. We then document the inappropriate images found using word clouds, based on captions generated using a vision-language model. The documentations of two popular, large-scale computer vision datasets -- ImageNet and OpenImages -- produced this way suggest that machines can indeed help dataset creators to answer Question 16 on inappropriate image content.

* arXiv admin note: text overlap with arXiv:2110.04222 
