"Information": models, code, and papers

dPMP-Deep Probabilistic Motion Planning: A use case in Strawberry Picking Robot

Aug 18, 2022
Alessandra Tafuro, Bappaditya Debnath, Andrea M. Zanchettin, Amir Ghalamzan E

This paper presents a novel probabilistic approach to deep robot learning from demonstrations (LfD). Deep movement primitives (DMPs) are a deterministic LfD model that maps visual information directly into a robot trajectory. This paper extends DMPs and presents a deep probabilistic model that maps the visual information into a distribution of effective robot trajectories. The architecture that leads to the highest level of trajectory accuracy is presented and compared with the existing methods. Moreover, this paper introduces a novel training method for learning domain-specific latent features. We show the superiority of the proposed probabilistic approach and novel latent space learning in the lab's real-robot task of strawberry harvesting. The experimental results demonstrate that latent space learning can significantly improve model prediction performance. The proposed approach allows trajectories to be sampled from the distribution and the robot trajectory to be optimised to meet a secondary objective, e.g. collision avoidance.

* To appear in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022
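
The idea of sampling trajectories from a predicted distribution and then choosing one that meets a secondary objective can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a movement-primitive-style parameterisation in which a trajectory is a weighted sum of Gaussian basis functions, and it replaces the network's predicted mean and covariance with hand-picked stand-ins. The names `rbf_basis`, `sample_trajectories` and `collision_cost`, the 1-D trajectory, and the obstacle position are all hypothetical.

```python
import numpy as np

def rbf_basis(n_steps, n_basis, width=0.02):
    """Normalised Gaussian basis functions on a phase in [0, 1].

    `width` acts as the variance of each basis function (an assumed value).
    """
    phase = np.linspace(0.0, 1.0, n_steps)[:, None]      # (T, 1)
    centres = np.linspace(0.0, 1.0, n_basis)[None, :]    # (1, K)
    phi = np.exp(-((phase - centres) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)          # (T, K)

def sample_trajectories(mean_w, cov_w, phi, n_samples, rng):
    """Draw weight vectors from N(mean_w, cov_w) and map them to trajectories."""
    weights = rng.multivariate_normal(mean_w, cov_w, size=n_samples)  # (S, K)
    return weights @ phi.T                                            # (S, T)

def collision_cost(traj, obstacle, margin):
    """Hypothetical secondary objective: penalise points closer than `margin`."""
    return np.clip(margin - np.abs(traj - obstacle), 0.0, None).sum(axis=-1)

rng = np.random.default_rng(0)
phi = rbf_basis(n_steps=50, n_basis=8)
mean_w = np.linspace(0.0, 1.0, 8)   # stand-in for a network's predicted mean
cov_w = 0.05 * np.eye(8)            # stand-in for a predicted covariance
samples = sample_trajectories(mean_w, cov_w, phi, n_samples=100, rng=rng)
costs = collision_cost(samples, obstacle=0.5, margin=0.1)
best = samples[np.argmin(costs)]    # sampled trajectory with the lowest cost
```

Picking the lowest-cost sample is the simplest possible selection rule; the abstract does not say how the paper performs the optimisation, so this step in particular should be read as a placeholder.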

ERASE-Net: Efficient Segmentation Networks for Automotive Radar Signals

Sep 26, 2022
Shihong Fang, Haoran Zhu, Devansh Bisla, Anna Choromanska, Satish Ravindran, Dongyin Ren, Ryan Wu

Among various sensors for assisted and autonomous driving systems, automotive radar has been considered a robust and low-cost solution even in adverse weather or lighting conditions. With the recent development of radar technologies and open-sourced annotated data sets, semantic segmentation with radar signals has become very promising. However, existing methods are either computationally expensive or discard significant amounts of valuable information from raw 3D radar signals by reducing them to 2D planes via averaging. In this work, we introduce ERASE-Net, an Efficient RAdar SEgmentation Network to segment the raw radar signals semantically. The core of our approach is the novel detect-then-segment method for raw radar signals. It first detects the center point of each object, then extracts a compact radar signal representation, and finally performs semantic segmentation. We show that our method can achieve superior performance on the radar semantic segmentation task compared to the state-of-the-art (SOTA) technique. Furthermore, our approach requires up to 20x fewer computational resources. Finally, we show that the proposed ERASE-Net can be compressed by 40% without significant loss in performance, significantly more than the SOTA network, which makes it a more promising candidate for practical automotive applications.
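
The first two stages of the detect-then-segment idea (find object centres, then extract a compact representation around each) can be sketched roughly as follows. This is an illustration under stated assumptions, not ERASE-Net itself: the learned centre detector is replaced by a simple 3x3 local-maximum test on a stand-in heatmap, the tensor axes (range, angle, Doppler) are assumed, and `detect_centres` and `crop_around` are hypothetical helper names. The segmentation stage is omitted.

```python
import numpy as np

def detect_centres(heatmap, threshold):
    """Return (row, col) of cells above `threshold` that are 3x3 local maxima."""
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    h, w = heatmap.shape
    # Stack the 9 shifted views of the map to get each cell's 3x3 neighbourhood.
    neigh = np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])
    is_peak = (heatmap >= neigh.max(axis=0)) & (heatmap > threshold)
    return [tuple(int(v) for v in rc) for rc in np.argwhere(is_peak)]

def crop_around(cube, centre, half):
    """Cut a (2*half+1)^2 spatial window around `centre`, keeping the full third axis."""
    r, c = centre
    # Zero-pad the two spatial axes so windows near the border keep their size.
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)))
    return padded[r:r + 2 * half + 1, c:c + 2 * half + 1, :]

cube = np.zeros((16, 16, 4))   # toy tensor; axes assumed to be range x angle x Doppler
cube[4, 5, :] = 1.0
cube[10, 12, :] = 2.0
heatmap = cube.sum(axis=2)     # stand-in for a learned centre heatmap
centres = detect_centres(heatmap, threshold=0.5)
crops = [crop_around(cube, c, half=2) for c in centres]
```

Each crop keeps the whole third axis of the raw tensor, which is the point of the sketch: only the regions around detected centres are passed on, rather than a 2D average of the full signal.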

It Takes Two: Learning to Plan for Human-Robot Cooperative Carrying

Sep 26, 2022
Eley Ng, Ziang Liu, Monroe Kennedy III

Collaborative table-carrying is a complex task due to the continuous nature of the action and state spaces, multimodality of strategies, existence of obstacles in the environment, and the need for instantaneous adaptation to other agents. In this work, we present a method for predicting realistic motion plans for cooperative human-robot teams on a table-carrying task. Using a Variational Recurrent Neural Network (VRNN) to model the variation in the trajectory of a human-robot team over time, we are able to capture the distribution over the team's future states while leveraging information from interaction history. The key to our approach is our model's ability to leverage human demonstration data and generate trajectories that synergize well with humans during test time. We show that the model generates more human-like motion compared to a baseline, centralized sampling-based planner, Rapidly-exploring Random Trees (RRT). Furthermore, we evaluate the VRNN planner with a human partner and show its ability to both generate more human-like paths and achieve a higher task success rate than RRT can while planning with a human. Finally, we demonstrate that a LoCoBot using the VRNN planner can complete the task successfully with a human controlling another LoCoBot.

* 6 pages, 6 figures, 3 tables

Dynamic Hybrid Beamforming Design for Dual-Function Radar-Communication Systems

Sep 11, 2022
Bowen Wang, Hongyu Li, Ziyang Cheng

This paper investigates dynamic hybrid beamforming (HBF) for a dual-function radar-communication (DFRC) system, where the DFRC base station (BS) simultaneously serves multiple single-antenna users and senses a target in the presence of multiple clutter sources. In particular, we apply an HBF architecture with dynamic subarrays and double phase shifters in the DFRC BS. Aiming to maximize the radar mutual information, we consider jointly designing the dynamic HBF of the DFRC system, subject to constraints on communication quality of service (QoS), transmit power, and the analog beamformer. To solve the complicated non-convex optimization problem, an efficient alternating optimization algorithm based on the majorization-minimization method is developed. Simulation results verify the advantages of the considered HBF architecture and the effectiveness of the proposed design method.

Towards Frame Rate Agnostic Multi-Object Tracking

Oct 07, 2022
Weitao Feng, Lei Bai, Yongqiang Yao, Fengwei Yu, Wanli Ouyang

Multi-Object Tracking (MOT) is one of the most fundamental computer vision tasks, contributing to a variety of video analysis applications. Despite recent promising progress, current MOT research is still limited to a fixed sampling frame rate of the input stream. In fact, we empirically find that the accuracy of all recent state-of-the-art trackers drops dramatically when the input frame rate changes. For a more intelligent tracking solution, we shift the attention of our research work to the problem of Frame Rate Agnostic MOT (FraMOT). In this paper, we propose a Frame Rate Agnostic MOT framework with Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time. Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information to aid identity matching across multi-frame-rate inputs, improving the capability of the learned model in handling complex motion-appearance relations in FraMOT. In addition, the association gap between training and inference is enlarged in FraMOT because the post-processing steps not included in training make a larger difference in lower frame rate scenarios. To address this, we propose a Periodic Training Scheme (PTS) to reflect all post-processing steps in training via tracking pattern matching and fusion. Along with the proposed approaches, we make the first attempt to establish an evaluation method for this new task of FraMOT in two different modes, i.e., known frame rate and unknown frame rate, aiming to handle a more complex situation. The quantitative experiments on the challenging MOT datasets (FraMOT version) clearly demonstrate that the proposed approaches can handle different frame rates better and thus improve the robustness against complicated scenarios.

* 21 pages; Author version

Multimodal Across Domains Gaze Target Detection

Aug 23, 2022
Francesco Tonini, Cigdem Beyan, Elisa Ricci

This paper addresses the gaze target detection problem in single images captured from the third-person perspective. We present a multimodal deep architecture to infer where a person in a scene is looking. This spatial model is trained on the head images of the person-of-interest, scene and depth maps representing rich context information. Our model, unlike several prior methods, does not require supervision of the gaze angles and does not rely on head orientation information and/or the location of the eyes of the person-of-interest. Extensive experiments demonstrate the stronger performance of our method on multiple benchmark datasets. We also investigated several variations of our method by altering the joint learning of multimodal data. Some variations outperform a few prior methods as well. For the first time, this paper investigates domain adaptation for gaze target detection, and we empower our multimodal network to effectively handle the domain gap across datasets. The code of the proposed method is available at https://github.com/francescotonini/multimodal-across-domains-gaze-target-detection.

* Accepted to 24th ACM International Conference on Multimodal Interaction (ICMI 2022)

Out-of-Distribution Detection with Hilbert-Schmidt Independence Optimization

Sep 26, 2022
Jingyang Lin, Yu Wang, Qi Cai, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei

Outlier detection tasks have been playing a critical role in AI safety. Dealing with this task remains a great challenge. Observations show that deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence. Existing works attempt to solve the problem by explicitly imposing uncertainty on classifiers when OOD inputs are exposed to the classifier during training. In this paper, we propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for OOD detection tasks. In particular, we impose statistical independence between inlier and outlier data during training, in order to ensure that inlier data reveals little information about OOD data to the deep estimator during training. Specifically, we estimate the statistical dependence between inlier and outlier data through the Hilbert-Schmidt Independence Criterion (HSIC), and we penalize this metric during training. We also associate our approach with a novel statistical test at inference time, coupled with our principled motivation. Empirical results show that our method is effective and robust for OOD detection on various benchmarks. In comparison to SOTA models, our approach achieves significant improvement regarding the FPR95, AUROC, and AUPR metrics. Code is available at https://github.com/jylins/hood.

* Source code is available at https://github.com/jylins/hood
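
For readers unfamiliar with HSIC, the standard biased empirical estimator, trace(KHLH) / (n - 1)^2 with Gram matrices K and L and centring matrix H, can be sketched in a few lines. This is a generic textbook estimator, not the code in the linked repository: the Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions, and the abstract does not say how inlier and outlier samples are paired or which features the criterion is computed on.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    # Clamp tiny negative distances caused by floating-point rounding.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y (n rows each)."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n      # centring matrix H
    k = gaussian_gram(x, sigma)
    l = gaussian_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 3))
b = rng.normal(size=(200, 3))                # drawn independently of a
hsic_dependent = hsic(a, a + 0.1 * rng.normal(size=a.shape))
hsic_independent = hsic(a, b)
```

A value near zero suggests little statistical dependence, so a training loss that adds this quantity as a penalty pushes the two sets of representations towards independence. The estimator costs O(n^2) memory per batch, which is one reason it is usually computed on mini-batches.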

LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation

Sep 21, 2022
Isack Lee, Jun-Seok Yun, Hee Hyeon Kim, Youngju Na, Seok Bong Yoo

Although recent gaze estimation methods lay great emphasis on attentively extracting gaze-relevant features from facial or eye images, how to define features that include gaze-relevant components has been ambiguous. This obscurity makes the model learn not only gaze-relevant features but also irrelevant ones. In particular, this is fatal for cross-dataset performance. To overcome this challenging issue, we propose a gaze-aware analytic manipulation method, based on a data-driven approach that uses the disentanglement characteristics of generative adversarial network (GAN) inversion, to selectively utilize gaze-relevant features in a latent code. Furthermore, by utilizing a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, of which a gaze estimator is sufficiently aware. In addition, we propose a gaze distortion loss in the encoder that prevents the distortion of gaze information. The experimental results demonstrate that our method achieves state-of-the-art gaze estimation accuracy in cross-domain gaze estimation tasks. The code is available at https://github.com/leeisack/LatentGaze/.

TRUST: An Accurate and End-to-End Table structure Recognizer Using Splitting-based Transformers

Aug 31, 2022
Zengyuan Guo, Yuechen Yu, Pengyuan Lv, Chengquan Zhang, Haojie Li, Zhihui Wang, Kun Yao, Jingtuo Liu, Jingdong Wang

Table structure recognition is a crucial part of the document image analysis domain. Its difficulty lies in the need to parse the physical coordinates and logical indices of each cell at the same time. However, existing methods struggle to achieve both goals, especially when the table splitting lines are blurred or tilted. In this paper, we propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST. Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation. By introducing a novel Transformer-based Query-based Splitting Module and Vertex-based Merging Module, the table structure recognition problem is decoupled into two joint optimization sub-tasks: multi-oriented table row/column splitting and table grid merging. The Query-based Splitting Module learns strong context information from long dependencies via Transformer networks, accurately predicts the multi-oriented table row/column separators, and obtains the basic grids of the table accordingly. The Vertex-based Merging Module is capable of aggregating local contextual information between adjacent basic grids, providing the ability to accurately merge basic grids that belong to the same spanning cell. We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, on which our method achieves new state-of-the-art results. In particular, TRUST runs at 10 FPS on PubTabNet, surpassing the previous methods by a large margin.

Analyzing social media with crowdsourcing in Crowd4SDG

Aug 04, 2022
Carlo Bono, Mehmet Oğuz Mülâyim, Cinzia Cappiello, Mark Carman, Jesus Cerquides, Jose Luis Fernandez-Marquez, Rosy Mondardini, Edoardo Ramalli, Barbara Pernici

Social media have the potential to provide timely information about emergency situations and sudden events. However, finding relevant information among the millions of posts published every day can be difficult, and developing a data analysis project usually requires time and technical skills. This study presents an approach that provides flexible support for analyzing social media, particularly during emergencies. Different use cases in which social media analysis can be adopted are introduced, and the challenges of retrieving information from large sets of posts are discussed. The focus is on analyzing images and text contained in social media posts, and on a set of automatic data processing tools for filtering, classification, and geolocation of content, with a human-in-the-loop approach to support the data analyst. Such support includes both feedback and suggestions to configure automated tools, and crowdsourcing to gather inputs from citizens. The results are validated by discussing three case studies developed within the Crowd4SDG H2020 European project.
