Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving tabletop or small objects and require constraint-augmented datasets for training, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries and generates dense grasps on target regions. CGDF uses a part-guided diffusion approach that achieves high sample efficiency in constrained grasping without explicit training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings, to show that our method generates stable grasps on complex objects, which is especially useful in dual-arm manipulation settings where existing methods struggle.
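To make the part-guided sampling idea concrete, here is a minimal Langevin-style sketch in NumPy; the 6-D grasp parameterization, the `score_fn` interface, the noise schedule, and the toy demo score are all illustrative assumptions, not the paper's implementation (CGDF operates on SE(3) grasp poses with a learned energy model):

```python
import numpy as np

def part_guided_sample(score_fn, part_points, n_steps=100, dim=6, seed=0):
    """Annealed Langevin-style sampling of a grasp parameter vector.

    score_fn(x, t, part_points) is assumed to return the learned score
    (gradient of the log-density) conditioned on the target-region
    geometry, so every denoising step is steered toward grasps that
    lie on the specified constrained region.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)                       # start from pure noise
    for t in np.linspace(1.0, 1e-3, n_steps):          # annealed noise schedule
        step = 0.5 * t * t                             # step size shrinks with t
        x = x + step * score_fn(x, t, part_points)     # drift toward the data manifold
        x = x + np.sqrt(step) * rng.standard_normal(dim)  # exploration noise
    return x

# Toy demo: a quadratic score that pulls samples toward a fixed vector.
target = np.ones(6)
demo_score = lambda x, t, pts: (target - x) / max(t, 1e-3)
print(part_guided_sample(demo_score, part_points=None))
```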
Neural Radiance Fields (NeRF) have become an increasingly popular representation for capturing high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of the network weight space. To address the limitations of existing work on generalization and multi-view consistency, and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To improve quality even further, we incorporate a denoise-and-finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes the NeRF while retaining multi-view consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks, including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating state-of-the-art results.
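As a rough illustration of the core design choice, the sketch below (PyTorch) shows a hypernetwork emitting both an MLP weight vector and a multi-resolution-style hash table from a latent code; all layer sizes, the single-level table, and the flat weight vector are placeholder assumptions, not HyP-NeRF's actual architecture:

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Illustrative hypernetwork: maps a per-object latent code to BOTH
    the NeRF MLP weights and its hash-encoding table entries."""
    def __init__(self, z_dim=128, mlp_params=4096, table_size=2**14, feat_dim=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU())
        self.weight_head = nn.Linear(512, mlp_params)            # NeRF MLP weights
        self.hash_head = nn.Linear(512, table_size * feat_dim)   # hash-encoding entries
        self.table_size, self.feat_dim = table_size, feat_dim

    def forward(self, z):
        h = self.backbone(z)
        weights = self.weight_head(h)                            # flat weight vector
        table = self.hash_head(h).view(-1, self.table_size, self.feat_dim)
        return weights, table

net = HyperNet()
z = torch.randn(1, 128)            # category-level latent code for one object
weights, hash_table = net(z)       # both are estimated, not just the weights
```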
Power quality (PQ) events are recorded by PQ meters whenever anomalous events are detected on the power grid. Machine learning with neural networks can aid in accurately classifying the recorded waveforms and help power system engineers diagnose and rectify the root causes of problems. However, supervised learning requires the many waveforms captured during power system disturbances to be labeled, leaving a large number of recordings for engineers to process manually or go unseen. This paper presents an unsupervised technique based on an autoencoder and K-means clustering that clusters PQ events into categories such as sag, interruption, transient, normal, and harmonic distortion, enabling anomalous waveforms to be filtered from recurring or normal ones. The method is demonstrated using three-phase, field-obtained voltage waveforms recorded in a distribution grid. First, a convolutional autoencoder compresses the input signals into a lower-dimensional feature set which, after further processing, is passed to the K-means algorithm to identify data clusters. Using a small labeled dataset, numerical labels are then assigned to events based on a cosine similarity analysis. Finally, the study analyzes the clusters using the t-distributed stochastic neighbor embedding (t-SNE) visualization tool, demonstrating that the technique can help investigate a large number of captured events quickly.
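A minimal sketch of the clustering-and-labeling stage using scikit-learn, with random placeholder features standing in for the convolutional autoencoder's bottleneck output; the feature dimension, cluster count, and reference set are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Assume `latent` holds autoencoder bottleneck features for N recorded
# waveforms (the convolutional autoencoder itself is omitted here), and
# `ref_latent`/`ref_labels` come from the small labeled dataset.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 16))        # placeholder features
ref_latent = rng.standard_normal((25, 16))
ref_labels = np.array(["sag", "interruption", "transient",
                       "normal", "harmonic"]).repeat(5)

# Step 1: cluster the compressed events.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(latent)

# Step 2: name each cluster by its most cosine-similar labeled example.
for c in range(km.n_clusters):
    centroid = km.cluster_centers_[c].reshape(1, -1)
    nearest = cosine_similarity(centroid, ref_latent).argmax()
    print(f"cluster {c} -> {ref_labels[nearest]}")
```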
In this work, we present the first dataset, \dataset, for performing event extraction from conversational email threads. To this end, we first propose a new taxonomy covering 10 event types and 76 arguments in the email domain. Our final dataset includes $\sim$4K emails annotated with $\sim$9K event instances. To understand the task's challenges, we conduct a series of experiments comparing two common lines of approaches for event extraction, i.e., sequence labeling and generative end-to-end extraction (including few-shot GPT-3.5). Our results show that the task of email event extraction is far from being solved, due to challenges such as extracting non-continuous, shared trigger spans, extracting non-named-entity arguments, and modeling the email conversational history. Our work thus calls for more investigation into this domain-specific event extraction task in the future.\footnote{The source code and dataset can be obtained from \url{https://github.com/salokr/Email-Event-Extraction}.}
Recovering full 3D shapes from partial observations is a challenging task that has been extensively addressed in the computer vision community. Many deep learning methods tackle this problem by training 3D shape generation networks to learn a prior over the full 3D shapes. In this training regime, the methods expect the inputs to be in a fixed canonical form, without which they fail to learn a valid prior over the 3D shapes. We propose SCARP, a model that performs Shape Completion in ARbitrary Poses. Given a partial point cloud of an object, SCARP learns a disentangled feature representation of pose and shape by relying on rotationally equivariant pose features and geometric shape features trained using a multi-task objective. Unlike existing methods that depend on an external canonicalization step, SCARP performs canonicalization, pose estimation, and shape completion in a single network, improving performance by 45% over existing baselines. In this work, we use SCARP to improve grasp proposals on tabletop objects. By completing partial tabletop objects directly in their observed poses, SCARP enables a SOTA grasp proposal network to improve its proposals by 71.2% on partial shapes. Project page: https://bipashasen.github.io/scarp
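A simplified sketch of what such a multi-task objective could look like in PyTorch, with a brute-force Chamfer term for the completed shape and a Frobenius-norm term for the regressed rotation; the loss weights and heads are assumptions, not SCARP's exact formulation:

```python
import torch

def multitask_loss(pred_shape, gt_shape, pred_rot, gt_rot, w_pose=0.1):
    """Joint objective: shape completion term + pose regression term.

    pred_shape, gt_shape: (N, 3) and (M, 3) point clouds.
    pred_rot, gt_rot: (3, 3) rotation matrices.
    """
    # Chamfer distance stand-in: symmetric nearest-neighbor distance.
    d = torch.cdist(pred_shape, gt_shape)              # (N, M) pairwise distances
    chamfer = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
    pose = torch.norm(pred_rot - gt_rot)               # rotation regression term
    return chamfer + w_pose * pose

# Toy usage: both terms backpropagate through a single network's outputs.
pred = torch.rand(128, 3, requires_grad=True)
gt = torch.rand(256, 3)
loss = multitask_loss(pred, gt, torch.eye(3), torch.eye(3))
loss.backward()
```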
Determining accurate bird's eye view (BEV) positions of objects and tracks in a scene is vital for various perception tasks, including object-interaction mapping and scenario extraction; however, the level of supervision required to accomplish this is extremely challenging to procure. We propose a lightweight, weakly supervised method to estimate the 3D positions of objects by jointly learning 2D object detection and scene depth prediction in a single feed-forward pass of a network. Our proposed method extends a center-point-based single-shot object detector and introduces a novel object representation in which each object is modeled spatio-temporally as a BEV point, without the need for any 3D or BEV annotations during training or LiDAR data at query time. The approach leverages readily available 2D object supervision along with LiDAR point clouds (used only during training) to jointly train a single network that learns to predict 2D object detections alongside the whole scene's depth, in order to spatio-temporally model object tracks as points in BEV. The proposed method is over $\sim$10x more computationally efficient than recent SOTA approaches while achieving comparable accuracy on the KITTI tracking benchmark.
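To illustrate the lifting step, the sketch below back-projects a detected 2D object center through the predicted dense depth to a BEV point under a standard pinhole model; the intrinsics and depth values are placeholders, and the paper's exact geometry may differ:

```python
import numpy as np

def lift_center_to_bev(u, v, depth_map, K):
    """Back-project a detected 2D object center (u, v) to a BEV point
    using the network's dense depth prediction and camera intrinsics K."""
    z = depth_map[int(v), int(u)]            # predicted depth at the center pixel
    x = (u - K[0, 2]) * z / K[0, 0]          # lateral offset in the camera frame
    # BEV keeps the ground-plane coordinates (x: right, z: forward).
    return np.array([x, z])

K = np.array([[721.5, 0, 609.5],             # KITTI-like intrinsics (illustrative)
              [0, 721.5, 172.8],
              [0, 0, 1.0]])
depth = np.full((375, 1242), 12.0)           # placeholder dense depth prediction
print(lift_center_to_bev(700, 200, depth, K))  # -> BEV point [x, z]
```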
Modular Active Cell Robots (MACROs) are a design approach in which a large number of linear actuators and passive compliant joints are assembled to create an active structure with a repeating unit cell. Such a mesh-like robotic structure can be actuated to achieve large deformation and shape change. In this two-part paper, we use Finite Element Analysis (FEA) to model the deformation behavior of different MACRO mesh topologies and evaluate their passive and active mechanical characteristics. In Part 1, we presented the passive stiffness characteristics of different MACRO meshes. Now, in Part 2, we investigate the active strain characteristics of planar MACRO meshes. Using FEA, we quantify and compare the strains generated for a specific choice of MACRO mesh topology and, further, for the specific set of actuators actuated within that mesh. We simulate a series of actuation modes based on the angular orientation of the actuators within the mesh and show that such actuation modes result in deformation that is independent of the size of the mesh. We also show that there exists a subset of such actuation modes that spans the range of deformation behavior. Finally, we compare the actuation effort required to actuate different MACRO meshes and show that the actuation effort is related to the nodal connectivity of the mesh.
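The sketch below illustrates one plausible reading of orientation-based actuation modes: actuators (mesh edges) are grouped by their angular orientation, and a mode actuates every member of one group; the toy triangular patch and the grouping rule are illustrative assumptions, not the paper's FEA setup:

```python
import numpy as np

# Toy planar mesh patch: nodes and actuator edges of a triangular tiling.
nodes = np.array([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2], [1.5, np.sqrt(3) / 2]])
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

# Group actuators into modes by their angular orientation (mod 180 deg),
# so a mode is defined by direction, not by mesh size.
modes = {}
for i, j in edges:
    d = nodes[j] - nodes[i]
    angle = round(np.degrees(np.arctan2(d[1], d[0])) % 180)
    modes.setdefault(angle, []).append((i, j))

for angle, members in sorted(modes.items()):
    print(f"mode {angle:>3} deg actuates edges {members}")
```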
Modular Active Cell Robots (MACROs) are a design paradigm for modular robotic hardware that uses only two components, namely actuators and passive compliant joints. Under the MACRO approach, a large number of actuators and joints are connected to create mesh-like cellular robotic structures that can be actuated to achieve large deformation and shape change. In this two-part paper, we study the importance of different possible mesh topologies within the MACRO framework. Regular and semi-regular tilings of the plane are used as the candidate mesh topologies and simulated using Finite Element Analysis (FEA). In Part 1, we use FEA to evaluate their passive stiffness characteristics. Using a strain energy method, the homogenized material properties (Young's modulus, shear modulus, and Poisson's ratio) of the different mesh topologies are computed and compared. The results show that stiffness increases with increasing nodal connectivity and that stretching-dominated topologies have higher stiffness than bending-dominated ones. We also investigate the role of relative actuator-node stiffness on the overall mesh characteristics. This analysis shows that the stiffness of stretching-dominated topologies scales directly with their cross-sectional area, whereas the bending-dominated ones have no such direct relationship.
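As a concrete instance of a strain energy method (stated here in a simplified uniaxial form, which may differ from the paper's exact formulation), the homogenized Young's modulus can be recovered from the energy stored under a uniform applied strain $\bar{\varepsilon}$:

\[
U = \frac{1}{2}\int_V \boldsymbol{\sigma} : \boldsymbol{\varepsilon}\,\mathrm{d}V,
\qquad
E_{\mathrm{eff}} = \frac{2U}{V\,\bar{\varepsilon}^{2}},
\]

where $U$ is the total strain energy of the mesh cell and $V$ is its volume; an analogous pure-shear load case yields the effective shear modulus.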
Due to its high delay resolution, the ultra-wideband (UWB) technique has been widely adopted for fine-grained indoor localization. Instead of active positioning, we explore multi-static UWB radar-based passive human tracking using commercial off-the-shelf (COTS) devices. To extract the time-of-flight (ToF) of the signal reflected by the moving person, channel impulse responses (CIR) and their variances are used to train convolutional neural network (CNN) models. A particle filter algorithm then tracks the moving person based on the ToFs extracted from all pairs of links. Experimental results show that the proposed CIR- and variance-based CNN models achieve root-mean-square errors (RMSEs) of 30.12 cm and 29.04 cm, respectively. In particular, the variance-based CNN model is robust to scenario changes and promising for practical applications.
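A minimal PyTorch sketch of the variance-based idea: the variance of the CIR magnitude across a short window of frames highlights the moving person's reflection, and a small 1D CNN regresses the reflected ToF; the architecture and sizes are illustrative, not the paper's model:

```python
import torch
import torch.nn as nn

class ToFNet(nn.Module):
    """Regress a scalar ToF from the per-tap variance of a CIR window."""
    def __init__(self, taps=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 16, 5, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * taps, 1))    # scalar ToF estimate

    def forward(self, cir_window):                    # (B, frames, taps) CIR magnitudes
        var = cir_window.var(dim=1, keepdim=True)     # variance over the time window
        return self.net(var)

cir = torch.rand(2, 32, 64)            # batch of 32-frame CIR magnitude windows
print(ToFNet()(cir).shape)             # -> torch.Size([2, 1])
```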
Relation Extraction (RE) from tables is the task of identifying relations between pairs of columns of a table. Generally, RE models for this task require labelled tables for training. These labelled tables can also be generated artificially from a Knowledge Graph (KG), which makes them much cheaper to acquire than manual annotations. However, unlike real tables, these synthetic tables lack associated metadata such as column headers and captions; this is because they are created from KGs that do not store such metadata. Meanwhile, previous works have shown that metadata is important for accurate RE from tables. To address this issue, we propose methods to artificially create some of this metadata for synthetic tables. Afterward, we experiment with a BERT-based model, in line with recently published works, that takes as input a combination of the proposed artificial metadata and table content. Our empirical results show that this leads to an improvement of 9\%-45\% in F1 score, in absolute terms, over two tabular datasets.
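A small sketch of how a synthetic table plus artificially created metadata might be serialized into a single BERT-style input; the special tokens and metadata fields here are assumptions, not the paper's exact format:

```python
def serialize(table, caption, headers):
    """Flatten a table and its (artificial) metadata into one string."""
    parts = [f"[CAPTION] {caption}"]
    parts.append("[HEADER] " + " | ".join(headers))
    for row in table:
        parts.append("[ROW] " + " | ".join(row))
    return " ".join(parts)

table = [["Douglas Adams", "Cambridge"], ["Alan Turing", "London"]]
text = serialize(table,
                 caption="people and their birthplaces",   # artificial caption
                 headers=["person", "birthplace"])         # artificial column headers
print(text)
# A BERT-based classifier would then predict the relation between the
# two columns (e.g., a birth-place relation) from this serialized input.
```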