Person re-identification is the task of recognizing an individual who has already been observed over a network of cameras. It is a novel and challenging research topic in computer vision, for which no reference framework exists yet. Nevertheless, previous works share similar representations of the human body, based on part decomposition and the implicit concept of multiple instances. Building on these similarities, we propose a Multiple Component Matching (MCM) framework for the person re-identification problem, inspired by Multiple Component Learning, a framework recently proposed for object detection. We show that previous techniques for person re-identification can be considered particular implementations of our MCM framework. We then present a novel person re-identification technique as a direct, simple implementation of our framework, focused in particular on robustness to varying lighting conditions, and show that it can attain state-of-the-art performance.
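To illustrate the part-decomposition and multiple-instance ideas, here is a minimal sketch. It assumes each body part (e.g. torso, legs) is described by a set of component descriptors, such as color histograms of small patches. The part-to-part distance is the minimum pairwise distance between components, and the person-to-person distance is the sum over parts. The function names, the Euclidean metric, the min/sum aggregation, and the toy descriptors are hypothetical choices. This is not the specific technique proposed in the paper.

```python
import numpy as np

def part_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Multiple-instance distance between two parts.

    a, b: arrays of shape (n_components, dim); each row is one component
    descriptor (hypothetical, e.g. a patch color histogram).
    Returns the smallest Euclidean distance over all component pairs.
    """
    diffs = a[:, None, :] - b[None, :, :]              # (n_a, n_b, dim)
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())

def person_distance(probe: dict, candidate: dict) -> float:
    """Sum of part distances over the body parts both descriptions share."""
    return sum(part_distance(probe[p], candidate[p]) for p in probe if p in candidate)

def rank_gallery(probe: dict, gallery: list) -> list:
    """Return gallery indices sorted from most to least similar to the probe."""
    dists = [person_distance(probe, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: dists[i])

# Toy 2-D descriptors (hypothetical values).
probe = {"torso": np.array([[0.0, 1.0], [1.0, 0.0]]), "legs": np.array([[0.5, 0.5]])}
same = {"torso": np.array([[0.0, 0.9]]), "legs": np.array([[0.5, 0.6]])}
other = {"torso": np.array([[5.0, 5.0]]), "legs": np.array([[3.0, 3.0]])}
print(rank_gallery(probe, [other, same]))  # [1, 0]
```

Taking the minimum over component pairs means a single well-matched component is enough for two parts to be considered similar. This tolerates occlusions and local appearance changes.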
Image segmentation has been a very active research topic in image analysis. Currently, most image segmentation algorithms are designed on the assumption that an image can be partitioned into a set of regions that are internally homogeneous and differ from one another. However, human visual intuition does not always follow this pattern. We introduce a new image segmentation method, named Visual-Hint Boundary to Segment (VHBS), which is more consistent with human perception. VHBS abides by two visual hint rules based on human perception: (i) global-scale boundaries tend to be the real boundaries of objects; (ii) two adjacent regions with clearly different colors or textures tend to be separated by a real boundary. Experiments show that, compared with traditional image segmentation methods, VHBS achieves better performance while also maintaining higher computational efficiency.
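Rule (ii) can be illustrated with a minimal region-merging sketch. It assumes an initial over-segmentation is already available as per-region mean colors plus a list of adjacent region pairs. Adjacent regions whose mean colors differ by less than a threshold are merged, so boundaries survive only between clearly different regions. The threshold, the Euclidean color distance, the union-find merging, and the function name are hypothetical illustration choices. This is not the VHBS algorithm itself, which also applies the global-scale rule (i).

```python
import numpy as np

def merge_similar_regions(mean_colors, adjacency, threshold):
    """Merge adjacent regions whose mean colors are closer than `threshold`.

    mean_colors: (n_regions, 3) array of mean RGB values per region.
    adjacency: iterable of (i, j) pairs of adjacent region indices.
    Returns one label per region; regions with equal labels were merged.
    """
    parent = list(range(len(mean_colors)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in adjacency:
        if np.linalg.norm(mean_colors[i] - mean_colors[j]) < threshold:
            parent[find(i)] = find(j)   # similar colors: treat as no real boundary
    return [find(i) for i in range(len(mean_colors))]

colors = np.array([[200, 10, 10], [205, 12, 8], [10, 10, 200]], dtype=float)
labels = merge_similar_regions(colors, [(0, 1), (1, 2)], threshold=20.0)
print(labels[0] == labels[1], labels[1] == labels[2])  # True False
```

In this sketch the comparison uses the original region means rather than updating them after each merge. This keeps it short but makes the result depend on merge order in larger graphs.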
Discrete-time modeling of acoustic, mechanical, and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied machine-learning techniques to construct such models automatically from data for systems whose lumped states are described by scalar values, such as electrical circuits. In this work, we examine how similar techniques can construct models of systems with spatially distributed rather than lumped states. We describe several novel recurrent neural network structures and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained on these data to reproduce the behavior of these systems.
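The connection between recurrent networks and modal techniques is visible in a classic modal model. Each mode is a damped oscillator. After discretization with sample period T, the state of mode k evolves as s_k[n+1] = exp((-σ_k + iω_k)T) s_k[n] + b_k u[n], with output y[n] = Re(Σ_k c_k s_k[n]). This is a recurrent network with a diagonal, complex-valued, linear state transition. The sketch below uses hand-picked decay rates σ_k, frequencies ω_k, and gains, which a learned model would treat as trained parameters. It shows only this linear modal baseline, not the proposed network structures.

```python
import numpy as np

def modal_filter(u, sigma, omega, b, c, fs):
    """Run a bank of discretized damped modes as a diagonal linear recurrence.

    u: input signal (1-D array); sigma, omega, b, c: per-mode decay rate (1/s),
    angular frequency (rad/s), input gain, and output gain; fs: sample rate (Hz).
    """
    T = 1.0 / fs
    poles = np.exp((-sigma + 1j * omega) * T)   # one complex pole per mode
    s = np.zeros(len(sigma), dtype=complex)     # modal state vector
    y = np.empty(len(u))
    for n, un in enumerate(u):
        y[n] = np.real(np.dot(c, s))            # y[n] = Re(sum_k c_k s_k[n])
        s = poles * s + b * un                  # s[n+1] = pole * s[n] + b * u[n]
    return y

fs = 8000.0
impulse = np.zeros(800)
impulse[0] = 1.0
y = modal_filter(impulse, sigma=np.array([20.0]), omega=np.array([2 * np.pi * 440.0]),
                 b=np.array([1.0]), c=np.array([1.0]), fs=fs)
```

The impulse response is a decaying 440 Hz sinusoid. Adding modes only widens the diagonal, which is one reason modal structures map naturally onto recurrent layers.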
This paper reviews the recent literature on solving the Boolean satisfiability problem (SAT), an archetypal NP-complete problem, with the help of machine learning (ML) techniques. Despite the great success of modern SAT solvers in solving large industrial instances, designing handcrafted heuristics is time-consuming and largely empirical. Under these circumstances, flexible and expressive machine learning methods offer a promising alternative for tackling this long-standing problem. We examine the evolution of ML-SAT solvers, from naive classifiers with handcrafted features to emerging end-to-end SAT solvers such as NeuroSAT, as well as recent progress in combining existing CDCL and local search solvers with machine learning methods. Overall, solving SAT with machine learning is a promising yet challenging research topic. We conclude by summarizing the limitations of current works and suggesting possible future directions.
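Here is a minimal WalkSAT-style local search sketch for a formula in conjunctive normal form. Clauses are written as lists of signed integers in the DIMACS convention (literal 3 means x3, -3 means NOT x3). The `pick_variable` step uses a fixed noise probability and a greedy break-count rule. It is the kind of handcrafted heuristic that the surveyed ML methods try to learn or tune. Function names, the noise value, and the flip budget are illustrative assumptions.

```python
import random

def satisfied(clause, assign):
    """A clause holds if any literal agrees with the current assignment."""
    return any((lit > 0) == assign[abs(lit)] for lit in clause)

def break_count(var, clauses, assign):
    """Number of currently satisfied clauses that flipping `var` would break."""
    assign[var] = not assign[var]
    broken = sum(1 for cl in clauses
                 if any(abs(lit) == var for lit in cl) and not satisfied(cl, assign))
    assign[var] = not assign[var]           # undo the trial flip
    return broken

def pick_variable(clause, clauses, assign, p, rng):
    """Handcrafted heuristic: random walk with probability p, else greedy."""
    candidates = [abs(lit) for lit in clause]
    if rng.random() < p:
        return rng.choice(candidates)       # noise step escapes local minima
    return min(candidates, key=lambda v: break_count(v, clauses, assign))

def walksat(clauses, n_vars, p=0.5, max_flips=10_000, seed=0):
    """Return a satisfying assignment {var: bool}, or None if none is found."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    for _ in range(max_flips):
        unsat = [cl for cl in clauses if not satisfied(cl, assign)]
        if not unsat:
            return assign
        var = pick_variable(rng.choice(unsat), clauses, assign, p, rng)
        assign[var] = not assign[var]
    return None

# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3) AND (x1 OR x3)
formula = [[1, -2], [2, 3], [-1, -3], [1, 3]]
result = walksat(formula, n_vars=3)
print(result)
```

Unlike CDCL, local search cannot prove unsatisfiability. Returning None only means no assignment was found within the flip budget.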
Fingerprint authentication systems are highly vulnerable to artificial reproductions of fingerprints, called fingerprint presentation attacks. Detecting presentation attacks is not trivial because attackers refine their replication techniques from year to year. The International Fingerprint Liveness Detection Competition (LivDet), an open and well-acknowledged meeting point for academic institutions and private companies that deal with presentation attack detection, aims to assess the performance of fingerprint presentation attack detection (FPAD) algorithms using standard experimental protocols and data sets. Each LivDet edition, held biennially since 2009, poses a different set of challenges that competitors must deal with. The steady increase in the number of competitors and the noticeable decrease in error rates across competitions demonstrate a growing interest in the topic. This paper reviews the LivDet editions from 2009 to 2021 and outlines their evolution over the years.
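Presentation attack detection error rates are commonly reported as two quantities, defined in ISO/IEC 30107-3. APCER is the fraction of presentation attacks wrongly accepted as live, and BPCER is the fraction of live (bona fide) samples wrongly rejected. The sketch below computes a simplified, single-attack-type version of both from liveness scores. It assumes scores in [0, 100], where higher means more likely live, and a decision threshold of 50. The function name and threshold are assumptions for illustration. The exact metrics and thresholds of each LivDet edition should be taken from that edition's protocol.

```python
import numpy as np

def apcer_bpcer(scores, is_live, threshold=50.0):
    """Return (APCER, BPCER) for liveness scores.

    scores: liveness scores, higher = more likely a live finger (assumed 0-100).
    is_live: booleans, True for bona fide (live) samples, False for attacks.
    A sample is classified as live when its score is >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    is_live = np.asarray(is_live, dtype=bool)
    accepted = scores >= threshold
    apcer = float(np.mean(accepted[~is_live]))   # attacks accepted as live
    bpcer = float(np.mean(~accepted[is_live]))   # live samples rejected
    return apcer, bpcer

scores = [90, 70, 40, 20, 60, 10]
is_live = [True, True, True, False, False, False]
print(apcer_bpcer(scores, is_live))  # APCER = BPCER = 1/3 here
```

Raising the threshold trades a lower APCER for a higher BPCER. This is why competitions fix the threshold in advance instead of letting each entrant choose it.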
Human-Machine Interaction (HMI) is an important research field in machine learning that has been investigated in depth thanks to the rise of computing power in recent years. It has thus become possible to classify images and videos with machine learning rather than traditional computer vision algorithms. The aim of this paper is to build a symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN) to recognize cultural/anthropological Italian sign language gestures from videos. The CNN extracts important features that are later used by the RNN. The RNN stores temporal information inside the model, providing contextual information from previous frames to improve prediction accuracy. Our novel approach uses several data augmentation techniques and regularization methods, working only on RGB frames, to avoid overfitting and achieve a small generalization error.
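The CNN-then-RNN data flow can be sketched in NumPy. The sketch uses a stand-in per-frame feature extractor (a fixed random projection of each flattened frame, in place of a trained CNN). A single-layer Elman RNN then runs over the frames, and its final hidden state is mapped to class logits. All names, shapes, and random weights are hypothetical. The sketch shows only how per-frame features carry context through time, not the trained model described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_features(frames, W_feat):
    """Stand-in for a CNN: map each (H, W, 3) frame to a feature vector."""
    flat = frames.reshape(frames.shape[0], -1)     # (T, H*W*3)
    return np.maximum(flat @ W_feat, 0.0)          # ReLU features, (T, d_feat)

def rnn_classify(features, W_xh, W_hh, W_hy):
    """Elman RNN over time; the last hidden state summarizes the clip."""
    h = np.zeros(W_hh.shape[0])
    for x_t in features:                           # one step per frame
        h = np.tanh(x_t @ W_xh + h @ W_hh)         # mixes in previous frames
    return h @ W_hy                                # class logits

T, H, W, d_feat, d_hid, n_classes = 16, 8, 8, 32, 24, 5
video = rng.random((T, H, W, 3))
logits = rnn_classify(
    frame_features(video, rng.normal(scale=0.05, size=(H * W * 3, d_feat))),
    rng.normal(scale=0.1, size=(d_feat, d_hid)),
    rng.normal(scale=0.1, size=(d_hid, d_hid)),
    rng.normal(scale=0.1, size=(d_hid, n_classes)),
)
print(logits.shape)  # (5,)
```

Keeping feature extraction per frame and temporal modeling in the recurrent layer lets the two parts be trained or swapped independently, for example replacing the Elman cell with an LSTM.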
Incorporating explicit domain knowledge into neural-based task-oriented dialogue systems is an effective way to reduce the need for large sets of annotated dialogues. In this paper, we investigate how the explicit domain knowledge of conversational designers affects the performance of neural-based dialogue systems. To support this investigation, we propose the Conversational-Logic-Injection-in-Neural-Network system (CLINN), in which explicit knowledge is encoded in semi-logical rules. Using CLINN, we evaluated semi-logical rules produced by a team of conversational designers with different levels of skill. We experimented with the Restaurant domain of the MultiWOZ dataset. Results show that external knowledge is extremely important for reducing the need for annotated examples in conversational systems. In fact, CLINN with rules from conversational designers significantly outperforms a state-of-the-art neural-based dialogue system.
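One simple way to let hand-written rules influence a neural dialogue policy is to turn them into a mask over system actions and combine that mask with the network's scores. The sketch below uses a hypothetical rule format, where each rule pairs a condition on the dialogue state with the actions it allows, and hypothetical Restaurant action names. It illustrates the general idea of injecting designer knowledge. It is not CLINN's semi-logical rule language or architecture.

```python
import numpy as np

# Hypothetical system actions for a restaurant-booking dialogue.
ACTIONS = ["request_area", "request_food", "inform_restaurant", "book_table"]

# Hypothetical designer rules: (condition on dialogue state, allowed actions).
RULES = [
    (lambda s: s.get("area") is None, {"request_area"}),
    (lambda s: s.get("area") is not None and s.get("food") is None, {"request_food"}),
    (lambda s: s.get("area") is not None and s.get("food") is not None,
     {"inform_restaurant", "book_table"}),
]

def choose_action(state, logits):
    """Pick the highest-scoring action among those allowed by firing rules."""
    allowed = set()
    for condition, actions in RULES:
        if condition(state):
            allowed |= actions
    if not allowed:                          # no rule fires: trust the network alone
        allowed = set(ACTIONS)
    mask = np.array([a in allowed for a in ACTIONS])
    masked = np.where(mask, logits, -np.inf)
    return ACTIONS[int(np.argmax(masked))]

net_logits = np.array([0.1, 0.3, 2.0, 0.5])  # network prefers inform_restaurant
print(choose_action({"area": "centre", "food": None}, net_logits))  # request_food
```

In this sketch the rules override the network whenever they fire, and the network ranks actions within the allowed set. A softer combination, such as adding a rule-derived bonus to the logits, is another option.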
Adverse Drug Event (ADE) extraction models can rapidly examine large collections of social media texts, detecting mentions of drug-related adverse reactions and triggering medical investigations. However, despite recent advances in NLP, it is currently unknown whether such models are robust to negation, which is pervasive across language varieties. In this paper, we evaluate three state-of-the-art systems, showing their fragility against negation, and then introduce two possible strategies to increase the robustness of these models: (i) a pipeline approach that relies on a dedicated negation-detection component; and (ii) augmenting an ADE extraction dataset with artificially created negated samples, on which the models are further trained. We show that both strategies bring significant increases in performance, lowering the number of spurious entities predicted by the models. Our dataset and code will be publicly released to encourage research on the topic.
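The pipeline strategy can be sketched as a post-processing step: entities predicted by an ADE extractor are discarded when they fall inside the scope of a negation cue. The sketch uses a tiny hand-picked cue list and a fixed forward window of tokens as the negation scope, in the spirit of rule-based detectors such as NegEx. The cue list, window size, and function name are hypothetical. The negation component used in the paper may work differently.

```python
# Hypothetical negation cues; real systems use much larger, curated lists.
NEGATION_CUES = {"no", "not", "never", "without", "didn't", "don't"}

def filter_negated(tokens, entities, window=5):
    """Drop predicted ADE spans that start within `window` tokens after a cue.

    tokens: list of lowercase tokens; entities: list of (start, end) token spans
    (end exclusive) predicted by an ADE extraction model.
    """
    cue_positions = [i for i, tok in enumerate(tokens) if tok in NEGATION_CUES]
    kept = []
    for start, end in entities:
        negated = any(0 < start - c <= window for c in cue_positions)
        if not negated:
            kept.append((start, end))
    return kept

tokens = "this drug gave me headaches but no nausea".split()
print(filter_negated(tokens, [(4, 5), (7, 8)]))  # [(4, 5)]
```

A fixed window is crude: it ignores clause boundaries such as "but" and can over- or under-extend the scope. A learned negation-scope detector would address this.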
While active research in audio compression techniques has yielded substantial breakthroughs, spectral reconstruction of low-quality audio waves remains comparatively underexplored. In this paper, we propose a novel approach for reconstructing higher frequencies from considerably longer sequences of low-quality MP3 audio waves. Our technique inpaints audio spectrograms with residually stacked autoencoder blocks, manipulating individual amplitude and phase values in relation to perceptual differences. Our architecture contains several bottlenecks while preserving the spectral structure of the audio wave via skip connections. We also compare several task metrics and present a visual guide to loss selection. Moreover, we show how to leverage differential quantization techniques to reduce the initial model size by more than half while also reducing inference time, which is crucial in real-world applications.
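Working on amplitude and phase separately relies on writing each complex spectral bin X as |X| e^{i∠X}. The sketch below splits a signal into frames and takes each frame's real FFT. It then separates amplitude and phase, zeroes the amplitude above a cutoff bin to mimic the high-frequency loss of a low-bitrate MP3, and recombines. Non-overlapping frames without windowing, the 4 kHz cutoff, and the function names are simplifying assumptions. The model described in the paper learns to restore the removed content rather than applying a fixed rule.

```python
import numpy as np

def to_amp_phase(signal, frame_len):
    """Split a signal into non-overlapping frames; return amplitude and phase."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)             # (n_frames, frame_len // 2 + 1)
    return np.abs(spec), np.angle(spec)

def from_amp_phase(amp, phase, frame_len):
    """Recombine amplitude and phase and invert back to a 1-D signal."""
    spec = amp * np.exp(1j * phase)
    return np.fft.irfft(spec, n=frame_len, axis=1).ravel()

fs, frame_len = 16000, 512
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 7000 * t)

amp, phase = to_amp_phase(x, frame_len)
# Without modification, the split is lossless.
assert np.allclose(from_amp_phase(amp, phase, frame_len), x[: amp.shape[0] * frame_len])

cutoff_bin = int(4000 / fs * frame_len)            # bin index for 4 kHz
amp_lowpassed = amp.copy()
amp_lowpassed[:, cutoff_bin:] = 0.0                # simulate lost high frequencies
degraded = from_amp_phase(amp_lowpassed, phase, frame_len)
```

A reconstruction model would take `amp_lowpassed` (and possibly `phase`) as input and predict the missing high-frequency bins.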
Soft robotics has been a trending topic within the robotics community for almost two decades. However, the tools available to the community for modeling and analyzing soft robotic artifacts are still limited. This paper presents the development of SoRoSim, a user-friendly MATLAB toolbox that integrates the Geometric Variable Strain model to facilitate the modeling, analysis, and simulation of hybrid rigid-soft open-chain robotic systems. The toolbox implements a recursive, two-level nested quadrature scheme to solve the model. We present several examples and applications to validate the toolbox and explore its capability to efficiently model a vast range of robotic systems, considering different actuators and external loads, including fluid-structure interactions. We expect the SoRoSim toolbox to benefit the soft-robotics research community in a wide variety of applications.
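The idea of a two-level nested quadrature can be illustrated on a scalar example. An outer Gauss-Legendre rule samples points s along a rod of length L. At each outer point, an inner rule integrates a quantity from 0 to s, as when quantities are accumulated along a backbone. The integrand, point counts, and function names are assumptions for illustration. The sketch is written in Python rather than MATLAB and is not the SoRoSim scheme, which applies quadrature to the Geometric Variable Strain model.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def gauss_points(a, b, n):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [a, b]."""
    x, w = leggauss(n)
    return 0.5 * (b - a) * x + 0.5 * (b + a), 0.5 * (b - a) * w

def nested_integral(f, L, n_outer=4, n_inner=4):
    """Approximate the double integral of f(t) over 0 <= t <= s <= L.

    The outer rule samples s in [0, L]; for each outer node, an inner rule
    integrates f over [0, s].
    """
    total = 0.0
    s_nodes, s_weights = gauss_points(0.0, L, n_outer)
    for s, ws in zip(s_nodes, s_weights):
        t_nodes, t_weights = gauss_points(0.0, s, n_inner)
        inner = np.sum(t_weights * f(t_nodes))     # integral of f from 0 to s
        total += ws * inner
    return total

# For f(t) = t on [0, 1]: integral_0^1 (s^2 / 2) ds = 1/6.
print(nested_integral(lambda t: t, 1.0))  # approximately 1/6
```

An n-point Gauss-Legendre rule is exact for polynomials up to degree 2n - 1. Four points per level therefore already reproduce the polynomial example exactly, up to rounding.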