We consider a category-level perception problem, where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where -- for an object category -- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework to prune outliers, which uses compatibility hypergraphs to model measurements' compatibility. We show that in category-level perception problems these hypergraphs can be built from winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and many outliers can be pruned via maximum hyperclique computation. The last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL dataset, we combine our solver with a deep keypoint detector, and show that PACE3D# improves over the state of the art in vehicle pose estimation in the ApolloScape datasets, and its runtime is compatible with practical applications.
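As a rough illustration of the 2D winding-order idea mentioned in the abstract above, the sketch below computes the winding order of three image keypoints and checks it against a set of permitted orders. The function names and the `allowed_orders` input are hypothetical; how the permitted orders would be derived from the CAD library is not described in the abstract and is simply assumed to be given here.

```python
def winding_order(p, q, r):
    """Return +1 if p -> q -> r turns counter-clockwise, -1 if clockwise, 0 if collinear."""
    # Sign of the z-component of the cross product (q - p) x (r - p).
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def triplet_is_compatible(detected, allowed_orders):
    """Check a triplet of detected 2D keypoints against permitted winding orders.

    `allowed_orders` is a hypothetical input: the set of winding orders that
    the shape library is assumed to permit for this keypoint triplet.
    """
    return winding_order(*detected) in allowed_orders
```

A triplet that fails such a test would be a candidate for removal; in a hypergraph view, each passing triplet would correspond to one hyperedge over three measurements.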
Web application firewalls (WAFs) play an integral role in protecting web applications from malicious injection attacks such as SQL injection, XML injection, and PHP injection, to name a few. However, given the evolving sophistication of injection attacks and the increasing complexity of tuning a WAF, it is challenging to ensure that a WAF is free of injection vulnerabilities, that is, that it blocks all malicious injection attacks without wrongly affecting legitimate messages. Automatically testing WAFs is, therefore, a timely and essential task. In this paper, we propose DaNuoYi, an automatic injection testing tool that simultaneously generates test inputs for multiple types of injection attacks on a WAF. Our basic idea derives from cross-lingual translation in the natural language processing domain. In particular, test inputs for different types of injection attacks are syntactically different but may be semantically similar. Sharing semantic knowledge across multiple programming languages can thus stimulate the generation of more sophisticated test inputs and the discovery of injection vulnerabilities of the WAF that are otherwise difficult to find. To this end, in DaNuoYi, we train several injection translation models using multi-task learning; these models translate test inputs between any pair of injection attacks. The models are then used by a novel multi-task evolutionary algorithm to co-evolve test inputs for different types of injection attacks, facilitated by a shared mating pool and domain-specific mutation operators at each generation. We conduct experiments on three real-world open-source WAFs and six types of injection attacks. The results reveal that DaNuoYi generates up to 3.8x and 5.78x more valid test inputs (i.e., inputs that bypass the underlying WAF) than its state-of-the-art single-task counterparts and the context-free grammar-based injection construction, respectively.
Aspect-based Sentiment Classification (ABSC) is a challenging sub-task of traditional sentiment analysis. Because it is difficult to handle the potential correlations among the sentiment polarities of multiple aspects, i.e., sentiment dependency, recent popular works tend to exploit syntactic information to guide sentiment dependency parsing. However, syntactic information (e.g., syntactic dependency trees) usually consumes substantial computational resources because of the operations on the adjacency matrix. Instead, having observed that most sentiment dependency occurs between adjacent aspects, we define a run of consecutive aspects with the same sentiment as a sentiment cluster. Motivated by this finding, we propose sentiment patterns (SP) to guide the model's dependency learning. We then introduce the local sentiment aggregating (LSA) mechanism, which focuses on learning the sentiment dependency within a sentiment cluster. LSA is more efficient than existing dependency tree-based models because it does not need to construct or model an additional dependency matrix. Furthermore, we propose differential weighting for building the aggregation window to measure the importance of sentiment dependency. Experiments on four public datasets show that our models achieve state-of-the-art performance, with particularly strong improvements in learning sentiment clusters.
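The definition of a sentiment cluster given above (consecutive aspects that share the same sentiment) can be made concrete with a small sketch. The function name and the `(aspect, polarity)` input format are assumptions made for illustration; the abstract does not specify a data representation, and this is not the LSA mechanism itself.

```python
from itertools import groupby


def sentiment_clusters(aspects):
    """Group consecutive aspects that share a polarity.

    `aspects` is a list of (aspect_term, polarity) pairs in sentence order
    (a hypothetical input format chosen for this example).
    """
    # groupby merges only adjacent items with equal keys, which matches the
    # "consecutive aspects with the same sentiment" definition.
    return [
        [term for term, _ in group]
        for _, group in groupby(aspects, key=lambda pair: pair[1])
    ]


example = [
    ("food", "positive"),
    ("service", "positive"),
    ("price", "negative"),
    ("decor", "positive"),
]
clusters = sentiment_clusters(example)
```

Here `clusters` contains three groups: `food` and `service` form one cluster, while `price` and `decor` each form their own, because `decor` is not adjacent to the other positive aspects.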
We propose the first general and scalable framework to design certifiable algorithms for robust geometric perception in the presence of outliers. Our first contribution is to show that estimation using common robust costs, such as truncated least squares (TLS), maximum consensus, Geman-McClure, Tukey's biweight, among others, can be reformulated as polynomial optimization problems (POPs). By focusing on the TLS cost, our second contribution is to exploit sparsity in the POP and propose a sparse semidefinite programming (SDP) relaxation that is much smaller than the standard Lasserre's hierarchy while preserving exactness, i.e., the SDP recovers the optimizer of the nonconvex POP with an optimality certificate. Our third contribution is to solve the SDP relaxations at an unprecedented scale and accuracy by presenting STRIDE, a solver that blends global descent on the convex SDP with fast local search on the nonconvex POP. Our fourth contribution is an evaluation of the proposed framework on six geometric perception problems including single and multiple rotation averaging, point cloud and mesh registration, absolute pose estimation, and category-level object pose and shape estimation. Our experiments demonstrate that (i) our sparse SDP relaxation is exact with up to 60%-90% outliers across applications; (ii) while still being far from real-time, STRIDE is up to 100 times faster than existing SDP solvers on medium-scale problems, and is the only solver that can solve large-scale SDPs with hundreds of thousands of constraints to high accuracy; (iii) STRIDE provides a safeguard to existing fast heuristics for robust estimation (e.g., RANSAC or Graduated Non-Convexity), i.e., it certifies global optimality if the heuristic estimates are optimal, or detects and allows escaping local optima when the heuristic estimates are suboptimal.
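To make the truncated least squares (TLS) cost mentioned above concrete, the following sketch solves a toy one-dimensional TLS problem by brute force. It relies on the identity min(r^2, c^2) = min over theta in {0, 1} of theta * r^2 + (1 - theta) * c^2, so the global optimum can be found by enumerating inlier sets. The function names and the scalar-averaging setup are illustrative assumptions; this exhaustive search is not the sparse SDP relaxation or the STRIDE solver described in the abstract, and it only scales to a handful of measurements.

```python
from itertools import combinations


def tls_cost(x, measurements, c):
    """Truncated least squares cost: each squared residual is capped at c**2."""
    return sum(min((x - a) ** 2, c ** 2) for a in measurements)


def tls_brute_force(measurements, c):
    """Globally minimize the 1D TLS cost by enumerating candidate inlier sets."""
    n = len(measurements)
    # Start from the "everything is an outlier" hypothesis.
    best_cost, best_x, best_inliers = n * c ** 2, None, ()
    for size in range(1, n + 1):
        for inliers in combinations(range(n), size):
            # For a fixed inlier set, the least squares estimate is the mean.
            x = sum(measurements[i] for i in inliers) / size
            cost = sum((x - measurements[i]) ** 2 for i in inliers)
            cost += (n - size) * c ** 2  # each rejected measurement pays c**2
            if cost < best_cost:
                best_cost, best_x, best_inliers = cost, x, inliers
    return best_x, best_inliers, best_cost


data = [1.0, 1.1, 0.9, 10.0]  # the last measurement is a gross outlier
estimate, inliers, cost = tls_brute_force(data, c=0.5)
```

On this toy input the outlier at 10.0 is rejected and the estimate is the mean of the remaining three measurements. The exponential enumeration is exactly what makes larger instances hard and motivates relaxation-based approaches.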
Face recognition has made significant progress in recent years due to deep convolutional neural networks (CNNs). In many face recognition (FR) scenarios, face images are acquired from a sequence with huge intra-variations. These intra-variations, which stem mainly from low-quality face images, destabilize recognition performance. Previous works have focused on ad-hoc methods to select frames from a video, or have used face image quality assessment (FIQA) methods that consider only a particular distortion or a combination of several distortions. In this work, we present an efficient no-reference image quality assessment for FR that directly links image quality assessment (IQA) and FR. More specifically, we propose a new measurement to evaluate image quality without any reference. Based on the proposed quality measurement, we propose a deep Tiny Face Quality network (tinyFQnet) to learn a quality prediction function from data. We evaluate the proposed method with different powerful FR models on two classical video-based (or template-based) benchmarks: IJB-B and YTF. Extensive experiments show that, although tinyFQnet is much smaller than the other networks, the proposed method outperforms state-of-the-art quality assessment methods in terms of both effectiveness and efficiency.
We consider solving high-order semidefinite programming (SDP) relaxations of nonconvex polynomial optimization problems (POPs) that admit rank-one optimal solutions. Existing approaches, which solve the SDP independently from the POP, either cannot scale to large problems or suffer from slow convergence due to the typical degeneracy of such SDPs. We propose a new algorithmic framework, called SpecTrahedral pRoximal gradIent Descent along vErtices (STRIDE), that blends fast local search on the nonconvex POP with global descent on the convex SDP. Specifically, STRIDE follows a globally convergent trajectory driven by a proximal gradient method (PGM) for solving the SDP, while simultaneously probing long, but safeguarded, rank-one "strides", generated by fast nonlinear programming algorithms on the POP, to seek rapid descent. We prove STRIDE has global convergence. To solve the subproblem of projecting a given point onto the feasible set of the SDP, we reformulate the projection step as a continuously differentiable unconstrained optimization and apply a limited-memory BFGS method to achieve both scalability and accuracy. We conduct numerical experiments on solving second-order SDP relaxations arising from two important applications in machine learning and computer vision. STRIDE dominates a diverse set of five existing SDP solvers and is the only solver that can solve degenerate rank-one SDPs to high accuracy (e.g., KKT residuals below 1e-9), even in the presence of millions of equality constraints.
We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category (e.g. a car), and has to reconstruct the pose and shape of the object despite intra-class variability (i.e. different car models have different shapes). We consider an active shape model, where -- for an object category -- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape estimation are formulated as a non-convex optimization. Our first contribution is to provide the first certifiably optimal solver for pose and shape estimation. In particular, we show that rotation estimation can be decoupled from the estimation of the object translation and shape, and we demonstrate that (i) the optimal object rotation can be computed via a tight (small-size) semidefinite relaxation, and (ii) the translation and shape parameters can be computed in closed-form given the rotation. Our second contribution is to add an outlier rejection layer to our solver, hence making it robust to a large number of misdetections. Towards this goal, we wrap our optimal solver in a robust estimation scheme based on graduated non-convexity. To further enhance robustness to outliers, we also develop the first graph-theoretic formulation to prune outliers in category-level perception, which removes outliers via convex hull and maximum clique computations; the resulting approach is robust to 70%-90% outliers. Our third contribution is an extensive experimental evaluation. Besides providing an ablation study on a simulated dataset and on the PASCAL3D+ dataset, we combine our solver with a deep-learned keypoint detector, and show that the resulting approach improves over the state of the art in vehicle pose estimation in the ApolloScape datasets.
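The graph-theoretic pruning step mentioned above can be sketched, in a heavily simplified form, as a pairwise distance test followed by a maximum clique search. The sketch assumes that the true shape is a convex combination of the library models, so the distance between two keypoints can never exceed the largest distance among the library models plus twice the noise bound (rigid motions preserve distances). Only this upper bound is checked; a lower bound would need a point-to-convex-hull distance and is omitted. All names are hypothetical, and the exhaustive clique search is only practical for a few keypoints.

```python
from itertools import combinations
from math import dist


def compatibility_edges(keypoints, library, noise_bound):
    """Connect keypoints i, j whose distance is feasible for the shape library.

    `library` is a list of CAD models, each a list of 3D keypoints in the same
    order as `keypoints` (an assumed data layout for this example).
    """
    edges = set()
    for i, j in combinations(range(len(keypoints)), 2):
        # The norm is convex, so its maximum over the convex hull of the
        # library is attained at one of the library models.
        upper = max(dist(model[i], model[j]) for model in library)
        if dist(keypoints[i], keypoints[j]) <= upper + 2 * noise_bound:
            edges.add((i, j))
    return edges


def maximum_clique(n, edges):
    """Return the largest set of mutually compatible nodes (exhaustive search)."""
    for size in range(n, 0, -1):
        for nodes in combinations(range(n), size):
            if all(pair in edges for pair in combinations(nodes, 2)):
                return nodes
    return ()


library = [
    [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)],
    [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)],
]
keypoints = [(0, 0, 0), (1.5, 0, 0), (0, 1.5, 0), (50, 50, 50)]  # last is an outlier
edges = compatibility_edges(keypoints, library, noise_bound=0.1)
kept = maximum_clique(len(keypoints), edges)
```

In this toy case the far-away fourth keypoint is incompatible with every other keypoint, so the maximum clique keeps only the first three. Measurements outside the clique would be discarded before running the robust solver.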
We study the problem of aligning two sets of 3D geometric primitives given known correspondences. Our first contribution is to show that this primitive alignment framework unifies five perception problems: point cloud registration, primitive (mesh) registration, category-level 3D registration, absolute pose estimation (APE), and category-level APE. Our second contribution is to propose DynAMical Pose estimation (DAMP), the first general and practical algorithm to solve the primitive alignment problem by simulating rigid body dynamics arising from virtual springs and damping, where the springs span the shortest distances between corresponding primitives. Our third contribution is to apply DAMP to the five perception problems in simulated and real datasets and demonstrate that (i) DAMP always converges to the globally optimal solution in the first three problems with 3D-3D correspondences; (ii) although DAMP sometimes converges to suboptimal solutions in the last two problems with 2D-3D correspondences, with a simple scheme for escaping local minima, DAMP almost always succeeds. Our last contribution is to demystify the surprising empirical performance of DAMP and formally prove a global convergence result in the case of point cloud registration by characterizing the local stability of the equilibrium points of the underlying dynamical system.
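The spring-and-damping idea above can be illustrated with a deliberately reduced sketch that estimates only a translation between two point sets with known correspondences. Each source point is treated as a unit mass pulled toward its target by a virtual spring and slowed by viscous damping, and the motion is integrated with semi-implicit Euler steps. The function name, parameter values, and the translation-only restriction are assumptions made for this example; the algorithm described in the abstract simulates full rigid body dynamics, including rotation, which is not attempted here.

```python
def damp_translation(source, target, stiffness=1.0, damping=2.0, dt=0.01, steps=5000):
    """Estimate a translation by simulating damped springs between matched points."""
    dim = len(source[0])
    n = len(source)
    position = [0.0] * dim  # current translation applied to every source point
    velocity = [0.0] * dim
    for _ in range(steps):
        for d in range(dim):
            # Total spring force: each spring pulls a moved source point to its target.
            spring = sum(
                stiffness * (t[d] - (s[d] + position[d]))
                for s, t in zip(source, target)
            )
            # Every unit-mass point feels the same viscous damping force.
            drag = n * damping * velocity[d]
            acceleration = (spring - drag) / n  # total mass is n
            velocity[d] += dt * acceleration    # semi-implicit Euler: velocity first
            position[d] += dt * velocity[d]
    return position


source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shift = (1.0, -2.0, 0.5)
target = [tuple(c + o for c, o in zip(p, shift)) for p in source]
estimate = damp_translation(source, target)
```

With these parameters the system is critically damped, and the simulated translation settles at the mean offset between corresponding points, which is also the least squares solution for this translation-only case.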