Classification and probability estimation have broad applications in modern machine learning and data science applications, including biology, medicine, engineering, and computer science. The recent development of a class of weighted Support Vector Machines (wSVMs) has shown great values in robustly predicting the class probability and classification for various problems with high accuracy. The current framework is based on the $\ell^2$-norm regularized binary wSVMs optimization problem, which only works with dense features and has poor performance at sparse features with redundant noise in most real applications. The sparse learning process requires a prescreen of the important variables for each binary wSVMs for accurately estimating pairwise conditional probability. In this paper, we proposed novel wSVMs frameworks that incorporate automatic variable selection with accurate probability estimation for sparse learning problems. We developed efficient algorithms for effective variable selection for solving either the $\ell^1$-norm or elastic net regularized binary wSVMs optimization problems. The binary class probability is then estimated either by the $\ell^2$-norm regularized wSVMs framework with selected variables or by elastic net regularized wSVMs directly. The two-step approach of $\ell^1$-norm followed by $\ell^2$-norm wSVMs show a great advantage in both automatic variable selection and reliable probability estimators with the most efficient time. The elastic net regularized wSVMs offer the best performance in terms of variable selection and probability estimation with the additional advantage of variable grouping in the compensation of more computation time for high dimensional problems. The proposed wSVMs-based sparse learning methods have wide applications and can be further extended to $K$-class problems through ensemble learning.
In this paper, we introduce the problem of rewriting finite formal languages using syntactic macros such that the rewriting is minimal in size. We present polynomial-time algorithms to solve variants of this problem and show their correctness. To demonstrate the practical relevance of the proposed problems and the feasibility and effectiveness of our algorithms in practice, we apply these to biomedical ontologies authored in OWL. We find that such rewritings can significantly reduce the size of ontologies by capturing repeated expressions with macros. In addition to offering valuable assistance in enhancing ontology quality and comprehension, the presented approach introduces a systematic way of analysing and evaluating features of rewriting systems (including syntactic macros, templates, or other forms of rewriting rules) in terms of their influence on computational problems.
This paper presents a novel vision-based proprioception approach for a soft robotic finger capable of estimating and reconstructing tactile interactions in terrestrial and aquatic environments. The key to this system lies in the finger's unique metamaterial structure, which facilitates omni-directional passive adaptation during grasping, protecting delicate objects across diverse scenarios. A compact in-finger camera captures high-framerate images of the finger's deformation during contact, extracting crucial tactile data in real time. We present a method of the volumetric discretized model of the soft finger and use the geometry constraints captured by the camera to find the optimal estimation of the deformed shape. The approach is benchmarked with a motion-tracking system with sparse markers and a haptic device with dense measurements. Both results show state-of-the-art accuracies, with a median error of 1.96 mm for overall body deformation, corresponding to 2.1$\%$ of the finger's length. More importantly, the state estimation is robust in both on-land and underwater environments as we demonstrate its usage for underwater object shape sensing. This combination of passive adaptation and real-time tactile sensing paves the way for amphibious robotic grasping applications.
Humans' internal states play a key role in human-machine interaction, leading to the rise of human state estimation as a prominent field. Compared to swift state changes such as surprise and irritation, modeling gradual states like trust and satisfaction are further challenged by label sparsity: long time-series signals are usually associated with a single label, making it difficult to identify the critical span of state shifts. Windowing has been one widely-used technique to enable localized analysis of long time-series data. However, the performance of downstream models can be sensitive to the window size, and determining the optimal window size demands domain expertise and extensive search. To address this challenge, we propose a Selective Windowing Attention Network (SWAN), which employs window prompts and masked attention transformation to enable the selection of attended intervals with flexible lengths. We evaluate SWAN on the task of trust prediction on a new multimodal driving simulation dataset. Experiments show that SWAN significantly outperforms an existing empirical window selection baseline and neural network baselines including CNN-LSTM and Transformer. Furthermore, it shows robustness across a wide span of windowing ranges, compared to the traditional windowing approach.
Low power deep learning accelerators on the speech processing enable real-time applications on edge devices. However, most of the existing accelerators suffer from high power consumption and focus on image applications only. This paper presents a low power accelerator for speech separation through algorithm and hardware optimizations. At the algorithm level, the model is compressed with structured sensitivity as well as unstructured pruning, and further quantized to the shifted 8-bit floating-point format instead of the 32-bit floating-point format. The computations with the zero kernel and zero activation values are skipped by decomposition of the dilated and transposed convolutions. At the hardware level, the compressed model is then supported by an architecture with eight independent multipliers and accumulators (MACs) with a simple zero-skipping hardware to take advantage of the activation sparsity and low power processing. The proposed approach reduces the model size by 95.44\% and computation complexity by 93.88\%. The final implementation with the TSMC 40 $nm$ process can achieve real-time speech separation and consumes 1.6 mW power when operated at 150 MHz. The normalized energy efficiency and area efficiency are 2.344 TOPS/W and 14.42 GOPS/mm$^2$, respectively.
We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.
Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noises similar to PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Nevertheless, the traditional segmentation method of PE can not fully consider the hierarchical structure of features, local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder. And fuses features of different scales in the decoder subnetwork to compensate for spatial information loss caused by the inevitable downsampling in Swin-UNet or other state-of-the-art methods, effectively solving the above problem. We provide a theoretical analysis of this method in detail and validate it on publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic segmentation of PE and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.
Despite their remarkable advancement in locomotion and manipulation, humanoid robots remain challenged by a lack of synchronized loco-manipulation control, hindering their full dynamic potential. In this work, we introduce a versatile and effective approach to controlling and generalizing dynamic locomotion and loco-manipulation on humanoid robots via a Force-and-moment-based Model Predictive Control (MPC). Specifically, we proposed a simplified rigid body dynamics (SRBD) model to take into account both humanoid and object dynamics for humanoid loco-manipulation. This linear dynamics model allows us to directly solve for ground reaction forces and moments via an MPC problem to achieve highly dynamic real-time control. Our proposed framework is highly versatile and generalizable. We introduce HECTOR (Humanoid for Enhanced ConTrol and Open-source Research) platform to demonstrate its effectiveness in hardware experiments. With the proposed framework, HECTOR can maintain exceptional balance during double-leg stance mode, even when subjected to external force disturbances to the body or foot location. In addition, it can execute 3-D dynamic walking on a variety of uneven terrains, including wet grassy surfaces, slopes, randomly placed wood slats, and stacked wood slats up to 6 cm high with the speed of 0.6 m/s. In addition, we have demonstrated dynamic humanoid loco-manipulation over uneven terrain, carrying 2.5 kg load. HECTOR simulations, along with the proposed control framework, are made available as an open-source project. (https://github.com/DRCL-USC/Hector_Simulation).
Surface electromyography (sEMG) is a widely used muscle activity monitoring technique. sEMG measures muscle activity through monopolar and bipolar, multi-electrode electrodes. The surface electrodes are placed on the surface of the skin above the target muscle and the received signal can be used to infer the state of the muscle - active, inactive or fatigued - which serves as vital information during neurological and orthopaedic rehabilitation. Additionally, the sEMG signal can also be used for the control of prostheses. sEMG requires contact with the participant's skin and is thus a potentially uncomfortable method for the measurement of muscle activity. Moreover, the setup procedure has been termed time-consuming by sEMG experts and is listed as one of the main barriers to the clinical employment of the technique. Previous studies have shown that architectural changes, particularly muscle deformation, can provide information about the activity of the muscle, providing an alternative to sEMG. In all these studies, the muscle deformation signal is acquired using ultrasound imaging, an approach known as sonomyography (SMG). Despite its advantages, such as improved spatial resolution, SMG is still a contact based approach. In this paper, we propose a non-contact muscle activity monitoring approach that measures the muscle deformation signal using a Frequency Modulated Continuous Wave (FMCW) mmWave radar which we call radiomyography (RMG). In future, this system will enable muscle activation to be measured in an unconstrained and less cumbersome manner for both the person conducting the test and the individual being tested.
Federated learning is a decentralized learning paradigm wherein a central server trains a global model iteratively by utilizing clients who possess a certain amount of private datasets. The challenge lies in the fact that the client side private data may not be identically and independently distributed, significantly impacting the accuracy of the global model. Existing methods commonly address the Non-IID challenge by focusing on optimization, client selection and data complement. However, most approaches tend to overlook the perspective of the private data itself due to privacy constraints.Intuitively, statistical distinctions among private data on the client side can help mitigate the Non-IID degree. Besides, the recent advancements in dataset condensation technology have inspired us to investigate its potential applicability in addressing Non-IID issues while maintaining privacy. Motivated by this, we propose DCFL which divides clients into groups by using the Centered Kernel Alignment (CKA) method, then uses dataset condensation methods with non-IID awareness to complete clients. The private data from clients within the same group is complementary and their condensed data is accessible to all clients in the group. Additionally, CKA-guided client selection strategy, filtering mechanisms, and data enhancement techniques are incorporated to efficiently and precisely utilize the condensed data, enhance model performance, and minimize communication time. Experimental results demonstrate that DCFL achieves competitive performance on popular federated learning benchmarks including MNIST, FashionMNIST, SVHN, and CIFAR-10 with existing FL protocol.