Previous video modeling methods use cubic 3D convolution filters or their decomposed variants to exploit motion cues for precise action recognition, and these filters tend to be applied to video features symmetrically along the temporal and spatial axes. This implicitly assumes that actions are recognized at the cubic voxel level and neglects the essential spatial-temporal shape diversity across different actions. In this paper, we propose a novel video representation method that fuses features spatially and temporally in an asymmetric way to model atomic actions spanning multiple spatial-temporal scales. To make the feature fusion procedure efficient and effective, we also design an optimized feature interaction layer, which covers most feature fusion techniques as special cases, e.g., channel shuffling and channel concatenation. We instantiate our method as a \textit{plug-and-play} block, termed Multi-Scale Efficient Asymmetric Spatial-Temporal Block. Our method can easily adapt traditional 2D CNNs to video understanding tasks such as action recognition. We verify our method on several recent large-scale video datasets requiring strong temporal reasoning or appearance discrimination, e.g., Something-Something v1, Kinetics and Diving48, and demonstrate new state-of-the-art results without bells and whistles.
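As a point of reference for the two fusion techniques named above, the following minimal sketch shows channel shuffling and channel concatenation as plain operations on a list of channels. It illustrates the generic operations only, not the feature interaction layer of the paper; the function names `channel_shuffle` and `channel_concat` and the list-based channel representation are assumptions made for illustration.

```python
def channel_shuffle(channels, groups):
    """Generic channel shuffle: interleave channels across groups.

    `channels` is a list with one entry per channel (hypothetical
    representation; a real network would hold feature maps here).
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # View the channels as a (groups, per_group) grid, transpose it,
    # and flatten the result back into a single list.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


def channel_concat(a, b):
    """Generic channel concatenation: stack two channel lists end to end."""
    return list(a) + list(b)


if __name__ == "__main__":
    spatial = ["s0", "s1", "s2"]    # hypothetical spatial-branch channels
    temporal = ["t0", "t1", "t2"]   # hypothetical temporal-branch channels
    fused = channel_concat(spatial, temporal)
    print(channel_shuffle(fused, groups=2))
```

Both operations are fixed permutations or stackings of channels, which is consistent with the claim that a more general interaction layer can cover them as special cases.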
In this paper, we introduce a new reinforcement learning (RL) based neural architecture search (NAS) methodology for effective and efficient generative adversarial network (GAN) architecture search. The key idea is to formulate the GAN architecture search problem as a Markov decision process (MDP) for smoother architecture sampling, which enables a more effective RL-based search algorithm by targeting the potentially globally optimal architecture. To improve efficiency, we exploit an off-policy GAN architecture search algorithm that makes efficient use of the samples generated by previous policies. Evaluation on two standard benchmark datasets (i.e., CIFAR-10 and STL-10) demonstrates that the proposed method discovers highly competitive architectures that yield generally better image generation results with a considerably reduced computational burden: 7 GPU hours. Our code is available at https://github.com/Yuantian013/E2GAN.
Machine learning systems that rely on training data collected from untrusted sources are vulnerable to poisoning attacks, in which adversaries controlling some of the collected data are able to induce a corrupted model. In this paper, we consider poisoning attacks in which the adversary has a particular target classifier in mind and aims to induce a classifier close to that target by adding as few poisoning points as possible. We propose an efficient poisoning attack based on online convex optimization. Unlike previous model-targeted poisoning attacks, our attack comes with provable convergence to any achievable target classifier. The distance from the induced classifier to the target classifier is inversely proportional to the square root of the number of poisoning points. We also provide a certified lower bound on the minimum number of poisoning points needed to achieve a given target classifier. We report on experiments showing that our attack performs similarly to or better than state-of-the-art attacks in terms of attack success rate and distance to the target model, while providing the advantages of provable convergence and the efficiency benefits of an online attack that can determine near-optimal poisoning points incrementally.
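The stated convergence rate can be written schematically as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: $\theta_n$ denotes the classifier induced after $n$ poisoning points, $\theta_p$ the target classifier, and $d(\cdot,\cdot)$ the distance measure in use.

```latex
d(\theta_n, \theta_p) \;\propto\; \frac{1}{\sqrt{n}}
\qquad\Longrightarrow\qquad
\frac{d(\theta_{4n}, \theta_p)}{d(\theta_n, \theta_p)} = \frac{\sqrt{n}}{\sqrt{4n}} = \frac{1}{2}
```

Under this reading, halving the distance to the target classifier requires roughly four times as many poisoning points.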
The dynamic, real-time, and accurate inference of model parameters from empirical data is of great importance in many scientific and engineering disciplines that use computational models (such as a digital twin) for the analysis and prediction of complex physical processes. However, fast and accurate inference for processes with large, high-dimensional datasets cannot easily be achieved with state-of-the-art methods under noisy real-world conditions. The primary reason is that the inference of model parameters with traditional techniques based on optimisation or sampling often suffers from computational and statistical challenges, resulting in a trade-off between accuracy and deployment time. In this paper, we propose a novel framework for inference of model parameters based on reinforcement learning. The contribution of the paper is twofold: 1) We reformulate the inference problem as a tracking problem with the objective of learning a policy that forces the response of the physics-based model to follow the observations; 2) We propose the constrained Lyapunov-based actor-critic (CLAC) algorithm to enable the robust and accurate inference of physics-based model parameters in real time under noisy real-world conditions. The proposed methodology is demonstrated and evaluated on two model-based diagnostics test cases utilizing two different physics-based models of turbofan engines. The performance of the methodology is compared to that of two alternative approaches: a state update method (unscented Kalman filter) and a supervised end-to-end mapping with deep neural networks. The experimental results demonstrate that the proposed methodology outperforms all other tested methods in terms of speed and robustness, while retaining high inference accuracy.
Robotic automation has the potential to assist human surgeons in performing suturing tasks in microsurgery. To do so, a robot must be able to guide a needle with sub-millimeter precision through soft tissue. This paper presents a robotic suturing system that uses a 3D optical coherence tomography (OCT) system for imaging feedback. Calibration of the robot-OCT and robot-needle transforms, wound detection, keypoint identification, and path planning are all performed automatically. The calibration method uses a variant of the iterative closest point algorithm to handle pose uncertainty when the needle is grasped. The path planner uses the identified wound shape to calculate needle entry and exit points that yield an evenly matched wound shape after closure. Experiments on tissue phantoms and animal tissue demonstrate that the system can pass a suture needle through wounds with 0.27 mm overall accuracy in achieving the planned entry and exit points.
Graph-structured data has wide applicability in various domains such as physics, chemistry, biology, computer vision, and social networks, to name a few. Recently, graph neural networks (GNNs) have been shown to represent graph-structured data effectively because of their good performance and generalization ability. A GNN is a deep-learning-based method that learns a node representation by combining specific nodes and the structural/topological information of a graph. However, as with other deep models, explaining the effectiveness of GNN models is a challenging task because of the complex nonlinear transformations made over the iterations. In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method. GraphLIME is a generic GNN-model explanation framework that learns a nonlinear interpretable model locally in the subgraph of the node being explained. More specifically, to explain a node, we generate a nonlinear interpretable model from its $N$-hop neighborhood and then compute the $K$ most representative features as the explanations of its prediction using HSIC Lasso. Through experiments on two real-world datasets, the explanations of GraphLIME are found to be considerably more descriptive than those of existing explanation methods.
We investigate the problem of reconstructing an $n$-by-$n$ structured matrix signal $X$ via convex programming, where each column $x_j$ is an $s$-sparse vector and all columns have the same $\ell_1$-norm. The regularizer in use is the matrix norm $|||X|||_1 = \max_j |x_j|_1$. The contribution of this paper has two parts. The first part concerns conditions for stability and robustness in signal reconstruction by solving the convex program from noise-free or noisy measurements. We establish uniform sufficient conditions that are very close to necessary conditions, and non-uniform conditions are also discussed. As in the traditional compressive sensing theory for reconstructing vector signals, a related RIP condition is established. In addition, stronger conditions are investigated to guarantee the reconstructed signal's support stability, sign stability and approximation-error robustness. The second part establishes upper and lower bounds on the number of measurements for robust reconstruction in noise. We take the convex geometric approach in the random measurement setting, and one of the critical ingredients in this approach is to estimate the related width bounds in the case of Gaussian and non-Gaussian distributions. These bounds are explicitly controlled by the signal's structural parameters $s$ and $r$, which determine the matrix signal's column-wise sparsity and $\ell_1$-column-flatness, respectively.
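To make the regularizer concrete, the sketch below evaluates the norm $|||X|||_1 = \max_j |x_j|_1$ on a small example matrix whose columns are sparse and share the same $\ell_1$-norm. The helper names and the example matrix are hypothetical and chosen only for illustration; the reconstruction program itself is not shown.

```python
def column_l1_norms(X):
    """Return the l1-norm of every column of X (X is a list of rows)."""
    rows, cols = len(X), len(X[0])
    return [sum(abs(X[i][j]) for i in range(rows)) for j in range(cols)]


def max_column_l1_norm(X):
    """The regularizer |||X|||_1: the largest column l1-norm."""
    return max(column_l1_norms(X))


def column_sparsities(X):
    """Number of nonzero entries in every column of X."""
    rows, cols = len(X), len(X[0])
    return [sum(1 for i in range(rows) if X[i][j] != 0) for j in range(cols)]


if __name__ == "__main__":
    # Hypothetical 3-by-3 signal: every column has at most 2 nonzeros
    # and every column has l1-norm 3.
    X = [
        [1, 0, -2],
        [0, 3, 0],
        [-2, 0, 1],
    ]
    print(column_l1_norms(X))     # l1-norm of each column
    print(max_column_l1_norm(X))  # value of the regularizer
    print(column_sparsities(X))   # nonzeros per column
```

For this example all column norms coincide, so the maximum is attained by every column, which matches the structural assumption that all columns have the same $\ell_1$-norm.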