Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models for NLP tasks are becoming deeper and more complex, often increasing the amount of training data required even for fine-tuning. Weak supervision methods, including data programming, address this problem and reduce the cost of label collection by using noisy label sources for supervision. However, until recently, data programming was only accessible to users who knew how to program. To bridge this gap, the Data Programming by Demonstration (DPBD) framework was proposed to facilitate the automatic creation of labeling functions based on a few examples labeled by a domain expert. This framework has proven successful for generating high-accuracy labeling models for document classification. In this work, we extend the DPBD framework to span-level annotation tasks, arguably one of the most time-consuming NLP labeling tasks. We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming and encourages them to explore trade-offs between different labeling models and active learning strategies. We empirically demonstrated that an annotator could achieve a higher F1 score using the proposed tool compared to manual labeling for different span-level annotation tasks.
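To make the notion of a span-level labeling function concrete, here is a minimal Python sketch. It is not TagRuler's implementation: the names `keyword_lf` and `majority_vote` are hypothetical, the keyword rule is only one possible kind of labeling function, and plain majority voting is used here as a simple stand-in for a learned labeling model.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Sequence

# None means the labeling function abstains on that token.
Label = Optional[str]
LabelingFunction = Callable[[Sequence[str]], List[Label]]


def keyword_lf(keywords: Iterable[str], label: str) -> LabelingFunction:
    """Build a (hypothetical) token-level rule: tag tokens found in a keyword list."""
    vocab = {k.lower() for k in keywords}

    def lf(tokens: Sequence[str]) -> List[Label]:
        return [label if t.lower() in vocab else None for t in tokens]

    return lf


def majority_vote(tokens: Sequence[str], lfs: Sequence[LabelingFunction]) -> List[Label]:
    """Combine noisy per-token votes; a token stays None if every function abstains."""
    votes_per_lf = [lf(tokens) for lf in lfs]
    combined: List[Label] = []
    for i in range(len(tokens)):
        votes = Counter(v[i] for v in votes_per_lf if v[i] is not None)
        # On ties, Counter.most_common keeps the label that was seen first.
        combined.append(votes.most_common(1)[0][0] if votes else None)
    return combined
```

For example, `majority_vote(["Take", "aspirin", "daily"], [keyword_lf(["aspirin"], "DRUG"), keyword_lf(["aspirin", "ibuprofen"], "DRUG")])` returns `[None, "DRUG", None]`.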
Deploying deep learning models in time-critical applications with limited computational resources, for instance in edge computing systems and IoT networks, is a challenging task that often relies on dynamic inference methods such as early exiting. In this paper, we introduce a novel architecture for early exiting based on the vision transformer architecture, as well as a fine-tuning strategy, which together significantly increase the accuracy of early exit branches compared to conventional approaches while introducing less overhead. Through extensive experiments on image and audio classification as well as audiovisual crowd counting, we show that our method works for both classification and regression problems, and in both single- and multi-modal settings. Additionally, we introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis that can lead to more fine-grained dynamic inference.
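The general idea of early exiting can be sketched in a few lines of Python. This is a generic confidence-threshold scheme, not the architecture proposed in the abstract above: the function name `early_exit_predict`, the threshold value, and the modelling of each exit branch as an independent callable are all assumptions made for illustration (in a real network the branches share backbone computation).

```python
from typing import Callable, List, Sequence, Tuple

# Each branch maps an input to a list of class probabilities (hypothetical interface).
Branch = Callable[[Sequence[float]], List[float]]


def early_exit_predict(
    x: Sequence[float], branches: Sequence[Branch], threshold: float = 0.9
) -> Tuple[int, int]:
    """Return (predicted class, index of the branch that produced the prediction)."""
    if not branches:
        raise ValueError("at least one branch is required")
    last = len(branches) - 1
    for i, branch in enumerate(branches):
        probs = branch(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Stop as soon as a branch is confident enough; the final branch always answers.
        if probs[best] >= threshold or i == last:
            return best, i
    raise AssertionError("unreachable: the final branch always returns")
```

With a lower threshold more inputs leave at shallow branches, trading accuracy for latency; with a threshold above 1 every input runs through to the final branch.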
Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions, while facing a variety of semantic concepts, including unknown ones. To this end, recent works have tried to empower visual object recognition methods with the capability to i) detect unseen concepts and ii) extend their knowledge over time, as images of new semantic classes arrive. This setting, called Open World Recognition (OWR), aims to produce systems capable of breaking the semantic limits present in the initial training set. However, this training set imposes on the system not only its own semantic limits, but also environmental ones, due to its bias toward certain acquisition conditions that do not necessarily reflect the high variability of the real world. This discrepancy between training and test distributions is called domain-shift. This work investigates whether OWR algorithms are effective under domain-shift, presenting the first benchmark setup for fairly assessing the performance of OWR algorithms, with and without domain-shift. We then use this benchmark to conduct analyses in various scenarios, showing that existing OWR algorithms indeed suffer a severe performance degradation when train and test distributions differ. Our analysis shows that this degradation is only slightly mitigated by coupling OWR with domain generalization techniques, indicating that the mere plug-and-play of existing algorithms is not enough to recognize new and unknown categories in unseen domains. Our results clearly point toward open issues and future research directions that need to be investigated to build robot visual systems able to function reliably under these challenging yet very real conditions. Code available at https://github.com/DarioFontanel/OWR-VisualDomains
There is much confusion in the literature over the Hurst exponent (H). The purpose of this paper is to illustrate the difference between fractional Brownian motion (fBm) on the one hand and Gaussian Markov processes where H is different from 1/2 on the other. The difference lies in the increments, which are stationary and correlated in one case and nonstationary and uncorrelated in the other. The two- and one-point densities of fBm are constructed explicitly. The two-point density does not scale. The one-point density for a semi-infinite time interval is identical to that for a scaling Gaussian Markov process with H different from 1/2 over a finite time interval. We conclude that both Hurst exponents and one-point densities are inadequate for deducing the underlying dynamics from empirical data. Finally, we apply these conclusions to make a focused statement about nonlinear diffusion.
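For reference, the distinction drawn above can be written out in standard textbook form. The notation here is an assumption for illustration and may differ from the paper's conventions: $x(t)$ is a zero-mean Gaussian process with $x(0)=0$, $\sigma^2$ is a variance scale, and $T>0$ is a time lag. For fBm the covariance is

$$\langle x(t)\,x(s)\rangle = \frac{\sigma^2}{2}\left(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right),$$

so the increments are stationary, $\langle (x(t+T)-x(t))^2\rangle = \sigma^2 T^{2H}$, but adjacent increments are correlated:

$$\langle (x(t)-x(t-T))\,(x(t+T)-x(t))\rangle = \sigma^2 T^{2H}\left(2^{2H-1}-1\right),$$

which vanishes only for $H=1/2$. By contrast, for a process with uncorrelated increments and the same one-point variance $\langle x^2(t)\rangle = \sigma^2 t^{2H}$, one finds for $0 \le s \le t$

$$\langle x(t)\,x(s)\rangle = \sigma^2 s^{2H}, \qquad \langle (x(t+T)-x(t))^2\rangle = \sigma^2\left[(t+T)^{2H} - t^{2H}\right],$$

so the increment variance depends on the starting time $t$ unless $H=1/2$, i.e. the increments are nonstationary. Both processes share the one-point variance $\sigma^2 t^{2H}$, which is consistent with the statement that the one-point density alone cannot distinguish them.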
Motivated by the consideration of fairly sharing the cost of exploration between multiple groups in learning problems, we develop the Nash bargaining solution in the context of multi-armed bandits. Specifically, the 'grouped' bandit associated with any multi-armed bandit problem associates, with each time step, a single group from some finite set of groups. The utility gained by a given group under some learning policy is naturally viewed as the reduction in that group's regret relative to the regret that group would have incurred 'on its own'. We derive policies that yield the Nash bargaining solution relative to the set of incremental utilities possible under any policy. We show that on the one hand, the 'price of fairness' under such policies is limited, while on the other hand, regret optimal policies are arbitrarily unfair under generic conditions. Our theoretical development is complemented by a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups.
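The objective behind the Nash bargaining solution can be illustrated with a small Python sketch. It is not one of the policies derived in the abstract above: it simply selects, from a finite list of candidate utility profiles (one utility per group, measured relative to a disagreement point of zero), the profile that maximizes the product of utilities. The function name `nash_bargaining_index` and the finite candidate set are assumptions for illustration; the solution proper is defined over a convex feasible set.

```python
import math
from typing import Sequence


def nash_bargaining_index(utility_profiles: Sequence[Sequence[float]]) -> int:
    """Return the index of the profile with the largest product of group utilities.

    Profiles in which some group gains nothing (utility <= 0) are skipped,
    since every group must strictly benefit relative to disagreement.
    """
    best_index = -1
    best_score = -math.inf
    for i, profile in enumerate(utility_profiles):
        if any(u <= 0 for u in profile):
            continue
        # Maximizing the sum of logs is equivalent to maximizing the product.
        score = sum(math.log(u) for u in profile)
        if score > best_score:
            best_index, best_score = i, score
    if best_index < 0:
        raise ValueError("no profile gives every group a positive utility")
    return best_index
```

For the profiles `[(9.0, 0.5), (5.0, 4.0), (2.0, 6.0)]` the function returns `1`: the first profile has the largest total utility (9.5) but the smallest product (4.5), whereas the second has the largest product (20).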
An edge stream is a common representation of dynamic networks. It can evolve with time, with new types of nodes or edges being continuously added. Existing methods for anomaly detection rely on edge occurrence counts or compare pattern snippets found in historical records. In this work, we propose Isconna, which focuses on both the frequency and the pattern of edge records. The burst detection component targets anomalies between individual timestamps, while the pattern detection component highlights anomalies across segments of timestamps. These two components together produce three intermediate scores, which are aggregated into the final anomaly score. Isconna does not actively explore or maintain pattern snippets; it instead measures the consecutive presence and absence of edge records. Isconna is an online algorithm: it does not keep the original edge records, and only statistical values are maintained in a few count-min sketches (CMSs). Isconna's space complexity, $O(rc)$, is determined by two user-specified parameters that set the size of the CMSs. In the worst case, Isconna's time complexity can be up to $O(rc)$, but it can be amortized in practice. Experiments show that Isconna outperforms five state-of-the-art frequency- and/or pattern-based baselines on six real-world datasets with up to 20 million edge records.
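As a minimal illustration of the storage primitive mentioned above, the following Python sketch implements a generic count-min sketch. It is not Isconna's implementation: the class name, the salted BLAKE2b hashing scheme, and the reading of $r$ and $c$ as the number of rows and columns are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Generic count-min sketch with r rows and c columns, i.e. O(rc) space."""

    def __init__(self, r: int, c: int) -> None:
        if r <= 0 or c <= 0:
            raise ValueError("r and c must be positive")
        self.r = r
        self.c = c
        self.table = [[0] * c for _ in range(r)]

    def _index(self, row: int, key: str) -> int:
        # Hypothetical hash scheme: one BLAKE2b digest per row, salted with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.c

    def add(self, key: str, count: int = 1) -> None:
        """Increment one counter per row for this key."""
        for row in range(self.r):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        """Take the minimum over rows; with non-negative updates it never underestimates."""
        return min(self.table[row][self._index(row, key)] for row in range(self.r))
```

An edge record could be stored under a key such as `"u->v"`; the original record is then discarded and only the counters remain, which is what keeps memory bounded regardless of stream length.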
This paper presents a closed-form approach to constrain a flow within a given volume and around objects. The flow is guaranteed to converge and to stop at a single fixed point. We show that the obstacle avoidance problem can be inverted to enforce that the flow remains enclosed within a volume defined by a polygonal surface. We formally guarantee that such a flow will never contact the boundaries of the enclosing volume or the obstacles, and will asymptotically converge towards an attractor. We further create smooth motion fields around obstacles with edges (e.g. tables). Both obstacles and enclosures may be time-varying, i.e. moving, expanding and shrinking. The technique enables a robot to navigate within an enclosed corridor while avoiding static and moving obstacles. It was applied to an autonomous robot (QOLO) in a complex static indoor environment, and also tested in simulations with dense crowds. The final proof of concept was performed in an outdoor environment in Lausanne. The QOLO robot successfully traversed a marketplace in the center of town in the presence of a diverse crowd with a non-uniform motion pattern.
Simultaneous Localization And Mapping (SLAM) is a fundamental problem in mobile robotics. While sparse point-based SLAM methods provide accurate camera localization, the generated maps lack semantic information. On the other hand, state-of-the-art object detection methods provide rich information about entities present in the scene from a single image. This work incorporates a real-time deep-learned object detector into a monocular SLAM framework, representing generic objects as quadrics that permit detections to be seamlessly integrated while maintaining real-time performance. A finer reconstruction of each object, learned by a CNN, is also incorporated and provides a shape prior for the quadric, leading to further refinement. To capture the dominant structure of the scene, additional planar landmarks are detected by a CNN-based plane detector and modelled as landmarks in the map. Experiments show that the introduced plane and object landmarks and the associated constraints, using the proposed monocular plane detector and incorporated object detector, significantly improve camera localization and lead to a richer, semantically more meaningful map. The performance of our SLAM system is demonstrated in https://youtu.be/UMWXd4sHONw .
Robotic assembly planning has the potential to profoundly change how buildings can be designed and created. It enables architects to explicitly account for the assembly process as early as the design phase, and enables efficient building methods that profit from the robots' different capabilities. Previous work has addressed planning of robot assembly sequences and identifying the feasibility of architectural designs. This paper extends previous work by enabling assembly planning with large, heterogeneous teams of robots. We present a scalable planning system which enables parallelization of complex task and motion planning problems by iteratively solving smaller sub-problems. Combining optimization methods to solve for manipulation constraints with a sampling-based bi-directional space-time path planner enables us to plan cooperative multi-robot manipulation with unknown arrival times. Thus, our solver allows for completing sub-problems and tasks with differing timescales and synchronizes them effectively. We demonstrate the approach on multiple case studies and on two long-horizon building assembly scenarios to show the robustness and scalability of our algorithm.
The detection of traffic anomalies is a critical component of intelligent city transportation management systems. Previous works have proposed a variety of notable insights and taken a step forward in this field; however, dealing with the complex traffic environment remains a challenge. Moreover, the lack of high-quality data and the complexity of the traffic scene motivate us to study this problem from a hand-crafted perspective. In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic tracking module, and post-processing. With video stabilization, background modeling, and vehicle detection, the pre-processing phase aims to generate candidate anomalies. The dynamic tracking module seeks and locates the start time of anomalies by utilizing vehicle motion patterns and spatiotemporal status. Finally, we use post-processing to fine-tune the temporal boundary of anomalies. Our proposed framework ranked $1^{st}$ on the NVIDIA AI CITY 2021 leaderboard for traffic anomaly detection. The code is available at: https://github.com/Endeavour10020/AICity2021-Anomaly-Detection .