"Information": models, code, and papers

Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment

May 31, 2023
Quoc-Huy Tran, Ahmed Mehmood, Muhammad Ahmed, Muhammad Naufil, Anas Zafar, Andrey Konin, M. Zeeshan Zia

This paper presents a novel transformer-based framework for unsupervised activity segmentation that leverages not only frame-level cues but also segment-level cues, in contrast with previous methods, which often rely on frame-level information only. Our approach begins with a frame-level prediction module that estimates framewise action classes via a transformer encoder and is trained in an unsupervised manner via temporal optimal transport. To exploit segment-level information, we introduce a segment-level prediction module and a frame-to-segment alignment module. The former includes a transformer decoder for estimating video transcripts, while the latter matches frame-level features with segment-level features, yielding permutation-aware segmentation results. Moreover, inspired by temporal optimal transport, we develop simple yet effective pseudo labels for unsupervised training of these modules. Our experiments on four public datasets (50 Salads, YouTube Instructions, Breakfast, and Desktop Assembly) show that our approach achieves performance comparable to or better than previous methods in unsupervised activity segmentation.
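
The frame-level module above is trained with pseudo labels derived from temporal optimal transport. As a rough illustration only, the sketch below computes soft frame-to-action assignments with entropy-regularized optimal transport (Sinkhorn iterations) in NumPy. The function name `sinkhorn_pseudo_labels`, the equal-share action marginal, and the Gaussian diagonal prior standing in for the temporal term are all assumptions, not the paper's exact formulation.

```python
import numpy as np


def sinkhorn_pseudo_labels(scores, epsilon=0.25, n_iters=1000, temporal_sigma=None):
    """Soft frame-to-action assignment via entropy-regularised optimal transport.

    scores: (N, K) similarities between N frames and K action prototypes.
    Returns an (N, K) transport plan whose rows sum to 1/N and columns to 1/K.
    """
    n, k = scores.shape
    # Gibbs kernel: higher similarity -> more mass (shifted by the max for stability)
    kernel = np.exp((scores - scores.max()) / epsilon)
    if temporal_sigma is not None:
        # Hypothetical temporal prior: favour early frames -> early actions (diagonal)
        t = (np.arange(n) + 0.5) / n  # normalised frame positions
        a = (np.arange(k) + 0.5) / k  # normalised action positions
        kernel = kernel * np.exp(-((t[:, None] - a[None, :]) ** 2) / (2 * temporal_sigma**2))
    r = np.full(n, 1.0 / n)  # every frame carries equal mass
    c = np.full(k, 1.0 / k)  # assumption: each action covers an equal share of frames
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):
        u = r / (kernel @ v)    # rescale rows to match r
        v = c / (kernel.T @ u)  # rescale columns to match c
    return u[:, None] * kernel * v[None, :]


rng = np.random.default_rng(0)
frame_scores = rng.random((12, 3))  # stand-in for encoder frame-to-prototype similarities
plan = sinkhorn_pseudo_labels(frame_scores)
ordered = sinkhorn_pseudo_labels(np.zeros((12, 3)), temporal_sigma=0.3).argmax(axis=1)
print(plan.sum(axis=0), ordered)
```

Taking `argmax(axis=1)` of the plan gives hard per-frame pseudo labels. With a strong temporal prior, those labels follow one fixed action order for every video, whereas the frame-to-segment alignment module described above is what yields permutation-aware results.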

A rule-general abductive learning by rough sets

May 31, 2023
Xu-chang Guo, Hou-biao Li

In real-world tasks, there is usually a large amount of unlabeled data alongside labeled data, and learning from the two together is known as semi-supervised learning. Experts can use logical rules to label unlabeled data, but doing so is costly. Combining perception with reasoning works well for such semi-supervised tasks when domain knowledge is available. However, acquiring domain knowledge and correcting, reducing, and generating rules remain complex open problems. Rough set theory is an important tool for knowledge processing in information systems. In this paper, we propose rule-general abductive learning by rough sets (RS-ABL). By transforming the target concept and the sub-concepts of rules into information tables, RS-ABL uses rough set theory to acquire domain knowledge and to correct, reduce, and generate rules at lower cost. The framework can also generate a broader set of negative rules, widening the coverage of the knowledge base. Compared with traditional semi-supervised learning methods, RS-ABL achieves higher accuracy on semi-supervised tasks.
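
Rough set theory rests on lower and upper approximations of a target concept in an information table. The sketch below computes Pawlak's classical approximations in plain Python; it is a generic illustration of that rough-set building block, not RS-ABL's rule-correction pipeline, and the toy table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group objects that take identical values on every chosen attribute."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Pawlak lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:  # block lies entirely inside the concept
            lower |= block
        if block & target:   # block overlaps the concept
            upper |= block
    return lower, upper


# Hypothetical information table: objects x1..x4 described by attributes a, b
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1},
    "x4": {"a": 0, "b": 0},
}
concept = {"x1", "x3"}
lower, upper = approximations(table, ["a", "b"], concept)
print(sorted(lower), sorted(upper), sorted(upper - lower))
# ['x3'] ['x1', 'x2', 'x3'] ['x1', 'x2']
```

Objects in the lower approximation certainly belong to the concept given the chosen attributes; the boundary `upper - lower` is where those attributes cannot decide. Using a coarser attribute set (here, `a` alone) widens that boundary until the lower approximation is empty.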

Super-Resolution Radar Imaging with Sparse Arrays Using a Deep Neural Network Trained with Enhanced Virtual Data

Jun 16, 2023
Christian Schuessler, Marcel Hoffmann, Martin Vossiek

This paper introduces a method based on a deep neural network (DNN) that can process radar data from extremely thinned radar apertures. The proposed DNN processing provides both aliasing-free radar imaging and super-resolution. The results are validated by measuring detection performance on realistic simulation data and by evaluating the point spread function (PSF) and target-separation performance on measured point-like targets. A qualitative evaluation of a typical automotive scene is also conducted. The approach is shown to outperform state-of-the-art subspace algorithms as well as other existing machine learning solutions. The results suggest that machine learning approaches trained with sufficiently sophisticated virtual input data are a very promising alternative to compressed sensing and subspace approaches in radar signal processing. The key to this performance is that the DNN is trained on realistic simulation data that perfectly mimic the given sparse-array radar hardware as input, while ultra-high-resolution data from an enhanced virtual radar are simulated as ground truth. Unlike other work, the DNN uses the complete radar cube rather than only the antenna channel information at certain range-Doppler detections. After training, the DNN produces images free of sidelobes and ambiguities while delivering nearly the same resolution and image quality as a fully occupied array.

* 15 pages, 12 figures, Accepted to IEEE Journal of Microwaves
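
To see why thinned apertures are hard to image, the sketch below evaluates the conventional delay-and-sum beam pattern (array factor) of a linear array in NumPy. It is background for the aliasing problem the DNN is trained to remove, not the paper's method; the element layouts and the function name `array_factor` are illustrative assumptions.

```python
import numpy as np


def array_factor(positions_wl, sin_theta):
    """Normalised delay-and-sum beam pattern of a linear array steered to broadside.

    positions_wl: element positions in wavelengths.
    sin_theta: look directions, given as sin(angle from broadside).
    """
    phases = 2j * np.pi * np.outer(sin_theta, positions_wl)
    return np.abs(np.exp(phases).sum(axis=1)) / len(positions_wl)


full = np.arange(29) * 0.5  # filled aperture: 29 elements at lambda/2 spacing (14 wavelengths)
thinned = full[::4]         # thinned: every 4th element, 8 elements at 2-lambda spacing

look = np.array([0.0, 0.5])  # broadside and sin(theta) = 0.5
af_full = array_factor(full, look)     # main lobe at 0, near-null at 0.5
af_thin = array_factor(thinned, look)  # main lobe at 0 and a full-strength grating lobe at 0.5
print(af_full, af_thin)
```

The thinned layout spans the same 14-wavelength aperture with 2λ spacing, so a grating lobe as strong as the main lobe appears at sin θ = 0.5: a target there is indistinguishable from one at broadside, which is the ambiguity the filled array avoids.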

Convolutional and Deep Learning based techniques for Time Series Ordinal Classification

Jun 16, 2023
Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro Antonio Gutiérrez, Anthony Bagnall, César Hervás-Martínez

Time Series Classification (TSC) is the supervised learning problem in which the input data are series of values observed through repeated measurements over time, and the objective is to predict the category each series belongs to. When the class values are ordinal, classifiers that take the ordering into account can perform better than nominal classifiers. Time Series Ordinal Classification (TSOC) is the field that addresses this case, and it remains unexplored in the literature. A wide range of time series problems exhibit an ordered label structure, and TSC techniques that ignore the order relationship discard useful information. Hence, this paper presents the first benchmark of TSOC methodologies, exploiting the ordering of the target labels to improve on the current state of the art in TSC. Both convolution-based and deep learning-based methodologies (among the best-performing alternatives for nominal TSC) are adapted for TSOC. The experiments use 18 ordinal problems selected from two well-known archives. In this way, the paper helps establish the state of the art in TSOC. The ordinal versions perform significantly better than current nominal TSC techniques in terms of ordinal performance metrics, underlining the importance of considering label ordering in this kind of problem.

* 13 pages, 9 figures, 3 tables
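
One common way to let a classifier exploit label order is to decompose K ordered classes into K-1 cumulative binary problems ("is y greater than k?") and to score predictions with an ordinal metric such as mean absolute error (MAE) on class ranks. The sketch below shows both in NumPy; it is a generic illustration, not necessarily how the paper adapts its convolutional and deep learning models, and the function names are hypothetical.

```python
import numpy as np


def to_cumulative_targets(y, n_classes):
    """Encode ordinal labels 0..K-1 as K-1 binary targets: column k is [y > k]."""
    thresholds = np.arange(n_classes - 1)
    return (np.asarray(y)[:, None] > thresholds[None, :]).astype(int)


def from_cumulative_probs(probs):
    """Decode estimates of P(y > k) by counting thresholds passed with p >= 0.5."""
    return (np.asarray(probs) >= 0.5).sum(axis=1)


def ordinal_mae(y_true, y_pred):
    """Mean absolute error on class ranks, a common ordinal performance metric."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


y = [0, 2, 3]
targets = to_cumulative_targets(y, n_classes=4)
print(targets.tolist())                # [[0, 0, 0], [1, 1, 0], [1, 1, 1]]
print(from_cumulative_probs(targets))  # [0 2 3]
print(ordinal_mae([0, 0], [1, 1]), ordinal_mae([0, 0], [3, 3]))  # 1.0 3.0
```

Predicting class 1 or class 3 for a true class 0 is equally wrong under accuracy, but MAE penalizes the second error three times as heavily, which is exactly the order information a nominal classifier discards.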

Tactile-Reactive Roller Grasper

Jun 16, 2023
Shenli Yuan, Shaoxiong Wang, Radhen Patel, Megha Tippur, Connor Yako, Edward Adelson, Kenneth Salisbury

Manipulating objects within a robot's hand is one of the most important challenges in achieving robot dexterity. The term "Roller Graspers" refers to a family of non-anthropomorphic hands that use motorized, rolling fingertips to achieve in-hand manipulation. These graspers move grasped objects by commanding the rollers to exert forces that propel the object in the desired directions. In this paper, we explore robot in-hand manipulation through tactile-guided rolling. To do so, we develop the Tactile-Reactive Roller Grasper (TRRG), which combines camera-based tactile sensing with compliant, steerable cylindrical fingertips, together with accompanying sensor-processing and control strategies. We demonstrate that combining tactile feedback with actively rolling surfaces enables a variety of robust in-hand manipulation applications. We also demonstrate object reconstruction using tactile-guided rolling. A controlled experiment provides insight into the benefits of tactile-reactive rollers for manipulation. We consider two manipulation cases: fingers that manipulate purely through rolling, and fingers that periodically break and re-establish contact, as in regrasping. We find that tactile-guided rolling improves manipulation robustness in both cases by allowing the grasper to make the necessary fine grip adjustments, suggesting that hybrid designs combining rolling fingertips with finger gaiting may be a promising research direction.

Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target

May 29, 2023
Guan-Wei Wu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee

Spoken Language Understanding (SLU) aims to extract semantic information from spoken utterances. Previous research has made progress in end-to-end SLU by using paired speech-text data, such as pre-trained Automatic Speech Recognition (ASR) models or paired text as intermediate targets. However, acquiring paired transcripts is expensive, and impractical for unwritten languages. Textless SLU, by contrast, extracts semantic information from speech without paired transcripts, but the absence of intermediate targets and training guidance often leads to suboptimal performance. In this work, inspired by the content-disentangled discrete units produced by self-supervised speech models, we propose using discrete units as intermediate guidance to improve textless SLU performance. Our method surpasses the baseline on five SLU benchmark corpora. Additionally, we find that unit guidance facilitates few-shot learning and improves the model's robustness to noise.

* Accepted at Interspeech 2023
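
Discrete units from self-supervised speech models are commonly obtained by clustering frame-level features with k-means and collapsing consecutive repeats. The sketch below shows that quantization step in NumPy, assuming the centroids have already been fitted; the function name, the toy 2-D features, and the choice to deduplicate are assumptions rather than details confirmed by the paper.

```python
import numpy as np


def features_to_units(features, centroids, deduplicate=True):
    """Quantise frame features (T, D) to unit IDs via the nearest centroid (K, D).

    If deduplicate is True, consecutive repeated IDs are collapsed into one.
    """
    # Squared Euclidean distance from every frame to every centroid: (T, K)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    units = dists.argmin(axis=1)
    if deduplicate and len(units) > 0:
        keep = np.ones(len(units), dtype=bool)
        keep[1:] = units[1:] != units[:-1]  # keep the first frame of each run
        units = units[keep]
    return units.tolist()


centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # assumed pre-fitted k-means
frames = np.array([[0.1, 0.2], [0.3, -0.1], [9.8, 0.1], [10.2, 0.3], [0.2, 9.9], [0.0, 0.1]])
print(features_to_units(frames, centroids, deduplicate=False))  # [0, 0, 1, 1, 2, 0]
print(features_to_units(frames, centroids))                     # [0, 1, 2, 0]
```

The resulting unit sequence can then serve as an intermediate target alongside the final SLU objective, playing the role that a text transcript plays in text-based pipelines.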

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Jun 09, 2023
Yida Chen, Fernanda Viégas, Martin Wattenberg

Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet their inner workings remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the LDM's internal activations encode linear representations of both 3D depth data and a salient-object/background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate that these representations play a causal role in image synthesis and may be used for simple high-level editing of an LDM's output.

* 17 pages, 13 figures
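
A linear probe is simply a linear model trained to predict a property (here, per-pixel depth) from a network's internal activations; a high held-out score indicates that the property is linearly decodable. The sketch below fits a ridge-regression probe in NumPy on synthetic activations that stand in for real LDM activations; the data, regularization strength, and function names are assumptions.

```python
import numpy as np


def _with_bias(x):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([x, np.ones((len(x), 1))])


def fit_linear_probe(activations, targets, l2=1e-3):
    """Ridge-regression probe: argmin_w ||[X, 1] w - y||^2 + l2 * ||w_no_bias||^2."""
    xb = _with_bias(activations)
    reg = l2 * np.eye(xb.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalised
    return np.linalg.solve(xb.T @ xb + reg, xb.T @ targets)


def probe_r2(activations, targets, params):
    """Coefficient of determination of the probe's predictions."""
    pred = _with_bias(activations) @ params
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in: 500 "pixels" with 16-dim activations and a depth value
# that is, by construction, a noisy linear function of them.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 16))
depth = acts @ rng.normal(size=16) + 0.5 + 0.01 * rng.normal(size=500)

params = fit_linear_probe(acts[:400], depth[:400])
print(probe_r2(acts[400:], depth[400:], params))  # held-out R^2
```

Scoring on held-out rows, as above, guards against a probe that merely memorizes its training data; a probe fitted to targets unrelated to the activations scores near zero on the same held-out rows.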

Fast and Effective GNN Training with Linearized Random Spanning Trees

Jun 09, 2023
Francesco Bonchi, Claudio Gentile, André Panisson, Fabio Vitale

We present a new effective and scalable framework for training GNNs on supervised node classification tasks over graph-structured data. Our approach progressively refines the weight-update operations on a sequence of path graphs obtained by linearizing random spanning trees extracted from the input network. The path graphs are designed to retain essential topological and node information from the original graph. At the same time, their sparsity makes GNN training much lighter, which, besides improving scalability, helps mitigate classical training issues such as over-squashing and over-smoothing. We carry out an extensive experimental investigation on a number of real-world graph benchmarks, applying our framework to graph convolutional networks, and observe simultaneous improvements in both training speed and test accuracy compared with well-known baselines.
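
The sketch below illustrates the linearization idea in plain Python: it draws a random spanning tree and orders its nodes by depth-first traversal, so that consecutive nodes form the edges of a path graph. Both choices are assumptions for illustration; Kruskal's algorithm on a shuffled edge list does not sample spanning trees uniformly (Wilson's algorithm would), and the paper's exact linearization procedure may differ. The function names and the toy graph are hypothetical.

```python
import random


def random_spanning_tree(n_nodes, edges, seed=None):
    """Random spanning tree via Kruskal's algorithm on a shuffled edge list.

    Assumes nodes are 0..n_nodes-1 and the graph is connected. Note: this does
    not sample trees uniformly (Wilson's algorithm would).
    """
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = {v: [] for v in range(n_nodes)}
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it cannot form a cycle
            parent[ru] = rv
            tree[u].append(v)
            tree[v].append(u)
    return tree


def linearize(tree, root=0):
    """Depth-first preorder of the tree; consecutive nodes become path-graph edges."""
    order, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(n for n in reversed(tree[node]) if n not in seen)
    path_edges = list(zip(order, order[1:]))
    return order, path_edges


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4)]
tree = random_spanning_tree(5, edges, seed=7)
order, path_edges = linearize(tree)
print(order, path_edges)
```

Each call with a different seed yields a different tree and therefore a different path, so a sequence of such path graphs exposes the model to varied sparse views of the same network, each with only n - 1 edges.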

SR-OOD: Out-of-Distribution Detection via Sample Repairing

May 26, 2023
Rui Sun, Andi Zhang, Haiming Zhang, Yao Zhu, Ruimao Zhang, Zhen Li

Deep generative models have been widely reported to classify out-of-distribution (OOD) samples as in-distribution with high confidence. In this work, we hypothesize that this phenomenon stems from the reconstruction task, which can cause the generative model to focus too much on low-level features and not enough on semantic information. To address this issue, we introduce SR-OOD, an OOD detection framework that uses sample repairing to encourage the generative model to learn more than just an identity map. By focusing on semantics, our framework improves OOD detection performance without requiring external data or label information. Our experimental results demonstrate that the approach is competitive in detecting OOD samples.
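
The repair-based scoring idea can be sketched as: corrupt an input, let a model trained on in-distribution data repair it, and use the repair error as the OOD score. In the NumPy sketch below, a PCA projection stands in for the paper's generative repair model, and the masking corruption, class name, and function names are hypothetical.

```python
import numpy as np


class PCARepairer:
    """Stand-in repair model: projects a (corrupted) sample onto the principal
    subspace of in-distribution training data."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, x):
        self.mean = x.mean(axis=0)
        _, _, vt = np.linalg.svd(x - self.mean, full_matrices=False)
        self.basis = vt[: self.n_components]  # (n_components, D) orthonormal rows
        return self

    def repair(self, x):
        coords = (x - self.mean) @ self.basis.T
        return coords @ self.basis + self.mean


def repair_score(model, x, mask_ratio=0.3, seed=0):
    """Hypothetical OOD score: zero out a random subset of features, repair,
    and return the per-sample mean squared error against the clean input."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < mask_ratio
    corrupted = np.where(mask, 0.0, x)
    return np.mean((model.repair(corrupted) - x) ** 2, axis=1)


rng = np.random.default_rng(42)
mixing = rng.normal(size=(2, 20))
train_id = rng.normal(size=(500, 2)) @ mixing  # in-distribution: a 2-D subspace of R^20
test_id = rng.normal(size=(100, 2)) @ mixing
test_ood = rng.normal(scale=np.sqrt(2.0), size=(100, 20))  # similar energy, no structure

model = PCARepairer(n_components=2).fit(train_id)
print(repair_score(model, test_id).mean(), repair_score(model, test_ood).mean())
```

Because the repair model can only produce outputs that look like its training data, in-distribution inputs are restored almost exactly while OOD inputs are pulled toward the learned manifold and incur a much larger error.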

Estimation of control area in badminton doubles with pose information from top and back view drone videos

May 07, 2023
Ning Ding, Kazuya Takeda, Wenhui Jin, Yingjiu Bei, Keisuke Fujii

Applying visual tracking to analyze the performance of players in dynamic competitions is vital for effective coaching. In racket sports, most previous studies have focused on analyzing and assessing singles players, without occlusion, in broadcast videos, using discrete representations (e.g., strokes) that ignore meaningful spatial distributions. In this work, we present the first annotated drone dataset of badminton doubles captured from top and back views, and propose a framework to estimate the control area probability map, which can be used to evaluate teamwork performance. Our efficient deep neural network framework computes full probability surfaces by combining an embedding of a Gaussian mixture map of the players' positions with graph convolution over their poses. In our experiments, we validate the approach by comparing it against various baselines and by examining the correlation between score and control area. Furthermore, we propose a practical application: assessing optimal positioning to provide instructions during a game. Our approach can visually and quantitatively evaluate players' movements, providing valuable insights into doubles teamwork.
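
As a simplified illustration of turning player positions into a court-wide surface, the sketch below places an isotropic Gaussian on each player and reports, per grid cell, the share of density belonging to team A. This hand-crafted rule is only a stand-in: the paper learns the control-area map with a deep network that also uses pose information, and the grid resolution, σ, player coordinates, and function name here are assumptions.

```python
import numpy as np


def control_probability(team_a, team_b, grid_x, grid_y, sigma=1.0):
    """Per-cell share of Gaussian 'influence' belonging to team A.

    team_a, team_b: lists of (x, y) player positions in metres.
    Returns an array of shape (len(grid_y), len(grid_x)) with values in [0, 1].
    """
    xx, yy = np.meshgrid(grid_x, grid_y)

    def mixture(players):
        # Equal-weight isotropic Gaussian mixture centred on the players
        total = np.zeros_like(xx, dtype=float)
        for px, py in players:
            total += np.exp(-((xx - px) ** 2 + (yy - py) ** 2) / (2 * sigma**2))
        return total

    density_a, density_b = mixture(team_a), mixture(team_b)
    # With sigma = 1 m and players on court, both densities stay far from underflow
    return density_a / (density_a + density_b)


# Doubles court is 6.1 m wide and 13.4 m long; 0.1 m grid
grid_x = np.linspace(0.0, 6.1, 62)
grid_y = np.linspace(0.0, 13.4, 135)
team_a = [(1.5, 2.0), (4.5, 4.0)]    # hypothetical positions, near half
team_b = [(3.0, 10.0), (3.0, 12.0)]  # hypothetical positions, far half
prob_a = control_probability(team_a, team_b, grid_x, grid_y)
print(prob_a.shape, prob_a[20, 15], prob_a[100, 30])
```

Summing `prob_a` over the cells of a region gives a crude "area controlled by team A" figure that can be compared across rallies; a learned model can additionally account for where players are facing and reaching, which a position-only rule ignores.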
