Yang Chen

Department of Statistics, University of Michigan, Ann Arbor; Michigan Institute for Data Science, University of Michigan, Ann Arbor

DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection

Jun 05, 2024

Ruituo Wu, Yang Chen, Jian Xiao, Bing Li, Jicong Fan, Frédéric Dufaux, Ce Zhu, Yipeng Liu

Abstract: Cooperation between temporal convolutional networks (TCN) and graph convolutional networks (GCN) as a processing module has shown promising results in skeleton-based video anomaly detection (SVAD). However, keeping the model lightweight, with low computational and storage complexity, requires shallow GCN and TCN blocks, which are constrained by small receptive fields and cannot capture cross-dimension interactions. To tackle this limitation, we propose a lightweight module called the Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data. It employs a frame attention mechanism to identify the most significant frames and a skeleton attention mechanism to capture broader relationships across fixed partitions, with minimal parameters and FLOPs. Furthermore, the proposed Dual Attention Normalizing Flow (DA-Flow) integrates the DAM as a post-processing unit after the GCN within the normalizing flow framework. Simulations show that the proposed model is robust against noise and negative samples. Experimental results show that DA-Flow achieves performance competitive with or better than existing state-of-the-art (SOTA) methods in terms of micro AUC, with the fewest parameters. Moreover, we find that even without training, simply applying a random projection (without dimensionality reduction) to skeleton data yields substantial anomaly detection capability.
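
The random-projection observation can be illustrated with a small untrained baseline. This is a minimal sketch under stated assumptions, not the DA-Flow pipeline: it flattens each pose sequence, multiplies it by a fixed square Gaussian random matrix (so the dimension is not reduced), fits a Gaussian to the projected "normal" training features, and scores test samples by squared Mahalanobis distance. The Gaussian scorer stands in for the normalizing flow, and the array shapes, synthetic data, and function names are hypothetical.

```python
import numpy as np

def random_projection(x, seed=0):
    """Project flattened skeleton sequences with a fixed random Gaussian matrix.

    x: (n_samples, d). The output keeps dimension d (no dimensionality reduction).
    The same seed always yields the same matrix, so train and test share it.
    """
    d = x.shape[1]
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((d, d)) / np.sqrt(d)  # untrained, scaled to keep norms comparable
    return x @ proj

def fit_gaussian(z, eps=1e-6):
    """Hypothetical stand-in density model: mean and regularized inverse covariance."""
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False) + eps * np.eye(z.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_score(z, mu, prec):
    """Squared Mahalanobis distance per sample; larger means more anomalous."""
    diff = z - mu
    return np.einsum("ij,jk,ik->i", diff, prec, diff)

# Synthetic illustration: 4 frames x 3 joints x 2 coordinates = 24 features per sample.
rng = np.random.default_rng(1)
train = rng.standard_normal((500, 24))               # hypothetical "normal" sequences
normal_test = rng.standard_normal((5, 24))
abnormal_test = rng.standard_normal((5, 24)) + 5.0   # shifted poses as stand-in anomalies

mu, prec = fit_gaussian(random_projection(train))
scores_normal = anomaly_score(random_projection(normal_test), mu, prec)
scores_abnormal = anomaly_score(random_projection(abnormal_test), mu, prec)
```

Because the projection is square and (almost surely) invertible, no information is discarded; the point of the sketch is only that an untrained feature map plus a simple density model can already separate clearly abnormal inputs.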

Tensor Polynomial Additive Model

Jun 05, 2024

Yang Chen, Ce Zhu, Jiani Liu, Yipeng Liu

Abstract: Additive models can be used for interpretable machine learning because of their clarity and simplicity. However, in classical models for high-order data, the vectorization operation disrupts the data structure, which may degrade accuracy and increase computational complexity. To deal with these problems, we propose the tensor polynomial additive model (TPAM). It retains the multidimensional structure information of high-order inputs through a tensor representation. Model parameters are compressed using a hierarchical, low-order symmetric tensor approximation, so complex high-order feature interactions can be captured with fewer parameters. Moreover, TPAM preserves the inherent interpretability of additive models, facilitating transparent decision-making and the extraction of meaningful feature values. Additionally, leveraging its transparency and ability to handle higher-order features, TPAM is used as a post-processing module for other interpretation models through two variants for class activation maps. Experimental results on a series of datasets demonstrate that TPAM can improve accuracy by up to 30% and the compression rate by up to 5 times while maintaining good interpretability.
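
The parameter saving from a symmetric low-rank approximation can be seen on a single second-order interaction term. The sketch below is an illustrative assumption, not TPAM itself: it works on a vectorized input (TPAM keeps the tensor structure) and uses a plain rank-R symmetric decomposition W = Σ_r λ_r u_r u_rᵀ rather than the paper's hierarchical scheme. A dense term xᵀWx needs d² weights; the low-rank form needs R(d + 1) and is evaluated as Σ_r λ_r (u_rᵀx)² without ever forming W. All names are hypothetical.

```python
import numpy as np

def quadratic_term_full(x, W):
    """Second-order interaction x^T W x with a dense d x d weight matrix."""
    return x @ W @ x

def quadratic_term_lowrank(x, lam, U):
    """Same term with symmetric W = sum_r lam[r] * U[:, r] U[:, r]^T.

    Evaluated as sum_r lam[r] * (U[:, r] . x)^2, so W is never materialized.
    """
    return np.sum(lam * (U.T @ x) ** 2)

def additive_poly_model(x, bias, w1, lam, U):
    """Hypothetical degree-2 polynomial additive model: bias + linear + low-rank quadratic.

    Each summand is a separate, inspectable contribution to the prediction.
    """
    return bias + w1 @ x + quadratic_term_lowrank(x, lam, U)

d, R = 50, 4
rng = np.random.default_rng(0)
x = rng.standard_normal(d)
U = rng.standard_normal((d, R))
lam = rng.standard_normal(R)
W = (U * lam) @ U.T            # dense equivalent, built here only for checking

full_params = d * d            # 2500 weights for the dense quadratic term
lowrank_params = R * (d + 1)   # 204 weights for the rank-4 symmetric form
```

Here the rank-4 form reproduces the dense term exactly while storing roughly 12 times fewer weights; for real data the rank would be a tunable approximation rather than exact.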

Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning

May 31, 2024

Yang Chen, Tian He, Junfeng Fu, Ling Wang, Jingcai Guo, Hong Cheng

Abstract: Supervised and self-supervised learning are the two main training paradigms for skeleton-based human action recognition. However, the former relies on one-hot classification, which requires labor-intensive annotation of predefined action categories, while the latter involves skeleton transformations (e.g., cropping) in its pretext tasks that may impair the skeleton structure. To address these challenges, we introduce a novel skeleton-based training framework (C²VL) based on Cross-modal Contrastive learning that uses progressive distillation to learn task-agnostic human skeleton action representations from Vision-Language knowledge prompts. Specifically, we establish a vision-language action concept space through vision-language knowledge prompts generated by pre-trained large multimodal models (LMMs), which enrich the fine-grained details that the skeleton action space lacks. Moreover, we propose intra-modal self-similarity and inter-modal cross-consistency softened targets in the cross-modal contrastive process to progressively control and guide how strongly vision-language knowledge prompts and their corresponding skeletons are pulled together. These soft instance discrimination and self-knowledge distillation strategies help learn better skeleton-based action representations from noisy skeleton-vision-language pairs. During inference, our method requires only skeleton data as input for action recognition; vision-language prompts are no longer needed. Extensive experiments show that our method achieves state-of-the-art results on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. The code will be released in the future.
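
To make "softened targets" concrete, here is a minimal sketch of a skeleton-to-text contrastive loss in which the usual one-hot targets are blended with an intra-modal self-similarity distribution over the text prompts. The blending form, the mixing weight `alpha`, and the temperature `tau` are assumptions for illustration; the paper's exact definitions of its self-similarity and cross-consistency targets, and their progressive schedule, are not reproduced here.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def soft_contrastive_loss(skel, text, tau=0.1, alpha=0.5):
    """Skeleton-to-text contrastive loss with softened targets (hypothetical form).

    skel, text: (batch, dim) embeddings of paired skeletons and prompts.
    alpha = 1 recovers standard one-hot InfoNCE targets.
    """
    s, t = l2_normalize(skel), l2_normalize(text)
    logits = s @ t.T / tau                        # cross-modal similarity logits
    self_sim = softmax(t @ t.T / tau)             # intra-modal similarity among prompts
    n = s.shape[0]
    targets = alpha * np.eye(n) + (1 - alpha) * self_sim  # each row still sums to 1
    return -(targets * log_softmax(logits)).sum(axis=1).mean()

# Synthetic check: embeddings aligned with their prompts should give a lower loss.
rng = np.random.default_rng(0)
text = rng.standard_normal((8, 16))
skel_aligned = text + 0.05 * rng.standard_normal((8, 16))
skel_random = rng.standard_normal((8, 16))
loss_aligned = soft_contrastive_loss(skel_aligned, text)
loss_random = soft_contrastive_loss(skel_random, text)
```

Softening lets a skeleton be partially "correct" when matched to a semantically similar prompt, which is one way to tolerate noisy skeleton-vision-language pairs.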

HOIN: High-Order Implicit Neural Representations

Apr 23, 2024

Yang Chen, Ruituo Wu, Yipeng Liu, Ce Zhu

Abstract: Implicit neural representations (INR) suffer from worsening spectral bias, which results in overly smooth solutions to inverse problems. To deal with this problem, we propose a universal framework for solving inverse problems called High-Order Implicit Neural Representations (HOIN). By refining the traditional cascade structure to foster high-order interactions among features, HOIN enhances the model's expressive power and mitigates spectral bias through the strong diagonal properties of its neural tangent kernel (NTK), accelerating and improving the solution of inverse problems. By analyzing the model's expression space, high-order derivatives, and NTK matrix, we theoretically validate the feasibility of HOIN. HOIN achieves 1 to 3 dB improvements on most inverse problems, establishing a new state of the art in recovery quality and training efficiency, thus providing a new general paradigm for INR and paving the way for INRs to solve inverse problems.

Microwave photonic short-time Fourier transform based on stabilized period-one nonlinear laser dynamics and stimulated Brillouin scattering

Apr 17, 2024

Sunan Zhang, Taixia Shi, Lizhong Jiang, Yang Chen

Abstract: A microwave photonic short-time Fourier transform (STFT) system based on stabilized period-one (P1) nonlinear laser dynamics and stimulated Brillouin scattering (SBS) is proposed. An optoelectronic feedback loop stabilizes the frequency-sweep optical signal generated by the P1 nonlinear laser dynamics; this signal is then used together with an optical bandpass filter implemented by SBS to achieve frequency-to-time mapping of microwave signals and, ultimately, the STFT. Comparing experimental results with and without optoelectronic feedback shows that, with feedback, the time-frequency diagram of the signal under test (SUT) obtained by the STFT is clearer and more regular, and the SUT frequency measured in each frequency-sweep period is more accurate. At the optimal filter bandwidth, the mean absolute error is reduced by 50%.

* 9 pages, 6 figures
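
As a digital point of reference for what the photonic system measures, the sketch below computes a magnitude STFT of a frequency-hopping test tone with NumPy, reads off the peak frequency in each window (analogous to the per-sweep frequency estimate), and computes the mean absolute error against the true frequency. It is an illustration only: the sample rate, window length, and test signal are hypothetical, and the system in the abstract performs the frequency-to-time mapping optically rather than with an FFT.

```python
import numpy as np

def stft(x, fs, win_len, hop):
    """Magnitude STFT with a Hann window. Returns (window-centre times, freqs, |X|)."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    times = (starts + win_len / 2) / fs
    return times, freqs, spec

fs = 1000.0                                   # hypothetical sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)                 # 1000 samples
f_true = np.where(t < 0.5, 100.0, 250.0)      # tone hops from 100 Hz to 250 Hz at t = 0.5 s
x = np.sin(2 * np.pi * np.cumsum(f_true) / fs)  # phase-continuous signal under test

times, freqs, spec = stft(x, fs, win_len=100, hop=100)  # 10 windows, 10 Hz bins
f_est = freqs[np.argmax(spec, axis=1)]                  # peak frequency per window
mae = np.mean(np.abs(f_est - np.where(times < 0.5, 100.0, 250.0)))
```

The test tones sit exactly on FFT bins here, so the error is zero; with off-bin tones or noise, the window length trades time resolution against frequency resolution, which plays a role similar to the filter bandwidth in the photonic setup.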

Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition

Apr 15, 2024

Yang Chen, Jingcai Guo, Tian He, Ling Wang

Abstract: Skeleton-based zero-shot action recognition aims to recognize unknown human actions based on priors learned from known skeleton-based actions and a semantic descriptor space shared by both known and unknown categories. However, previous works focus on bridging the known skeleton representation space and the semantic description space at a coarse-grained level to recognize unknown action categories, ignoring the fine-grained alignment of these two spaces; this results in suboptimal performance when distinguishing highly similar action categories. To address these challenges, we propose a novel method for skeleton-based zero-shot action recognition at the fine-grained level via Side information and dual-prompts learning (STAR). Specifically, 1) we decompose the skeleton into several parts based on its topology and introduce side information, namely multi-part descriptions of human body movements, to align the skeleton and semantic spaces at the fine-grained level; 2) we design visual-attribute and semantic-part prompts to improve intra-class compactness within the skeleton space and inter-class separability within the semantic space, respectively, in order to distinguish highly similar actions. Extensive experiments show that our method achieves state-of-the-art performance in the zero-shot learning (ZSL) and generalized zero-shot learning (GZSL) settings on the NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.

* 11 pages, 5 figures

Seamlessly merging radar ranging/imaging, wireless communications, and spectrum sensing, for 6G empowered by microwave photonics

Apr 08, 2024

Taixia Shi, Yang Chen, Jianping Yao

Abstract: Integration of radar, wireless communications, and spectrum sensing is being investigated for 6G to increase spectral efficiency. Microwave photonics (MWP), which combines microwave engineering and photonic technology to exploit the wide bandwidth of photonics for microwave signal generation and processing, is considered an effective way to implement this integration. In this paper, an MWP-assisted joint radar, wireless communications, and spectrum sensing (JRCSS) system is proposed and demonstrated; it enables precise perception of the surrounding physical and electromagnetic environments while maintaining high-speed data communication. Communication signals and frequency-sweep signals are merged in the optical domain to achieve high-speed radar ranging and imaging, high-data-rate wireless communications, and wideband spectrum sensing. In the experimental demonstration, the JRCSS system achieves radar ranging with a measurement error within ±4 cm, two-dimensional imaging with a resolution of 25 × 24.7 mm, wireless communications at 2 Gbaud, and spectrum sensing with a frequency measurement error within ±10 MHz over a 6-GHz bandwidth.

* 18 pages, 10 figures
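
Radar ranging with frequency-sweep signals is commonly done by de-chirping: mixing the echo with a copy of the transmitted linear sweep produces a beat frequency proportional to the round-trip delay. The sketch below assumes a linear sweep and de-chirp processing, which the abstract does not spell out, and uses hypothetical bandwidth and sweep-duration values. It also computes the textbook range resolution c/(2B) of a chirp radar.

```python
C = 299_792_458.0  # speed of light, m/s

def beat_to_range(f_beat_hz, bandwidth_hz, sweep_time_s):
    """Range from the de-chirped beat frequency of a linear sweep: R = c * f_b / (2 * slope)."""
    slope = bandwidth_hz / sweep_time_s  # sweep rate, Hz/s
    return C * f_beat_hz / (2.0 * slope)

def range_to_beat(range_m, bandwidth_hz, sweep_time_s):
    """Inverse mapping: f_b = 2 * R * B / (c * T)."""
    return 2.0 * range_m * bandwidth_hz / (C * sweep_time_s)

def range_resolution(bandwidth_hz):
    """Textbook range resolution of a chirp radar: c / (2B)."""
    return C / (2.0 * bandwidth_hz)

B = 6e9      # hypothetical sweep bandwidth, Hz
T = 10e-6    # hypothetical sweep duration, s
fb = range_to_beat(1.5, B, T)  # a target at 1.5 m gives a beat of about 6.0 MHz
```

The linear relation between beat frequency and range is what turns ranging into a frequency measurement; the wide sweep bandwidth that photonics makes available is what drives the c/(2B) resolution down to the centimetre scale (about 2.5 cm for B = 6 GHz).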

A multi-stage semi-supervised learning for ankle fracture classification on CT images

Mar 29, 2024

Hongzhi Liu, Guicheng Li, Jiacheng Nie, Hui Tang, Chunfeng Yang, Qianjin Feng, Hailin Xu, Yang Chen

Abstract: Because the mechanism of ankle injury is complicated, ankle fractures are very difficult to diagnose in clinical practice. To simplify fracture diagnosis, an automatic ankle fracture diagnosis model is proposed. First, a tibia-fibula segmentation network is proposed for the tibiofibular region of the ankle joint, and a corresponding segmentation dataset is built from fracture data. Second, image registration is used to register the bone segmentation mask with a normal bone mask. Finally, a semi-supervised classifier is constructed to make full use of a large amount of unlabeled data for classifying ankle fractures. Experiments show that the proposed method accurately segments fractures with fracture lines and performs better than the general method. In addition, the method outperforms a classification network on several metrics.

RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Mar 25, 2024

Ziheng Deng, Hua Chen, Haibo Hu, Zhiyong Xu, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao

Abstract: Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. Because raw projection data are sorted into multiple respiratory phases, only a limited number of cone-beam projections is available for reconstructing each phase. Consequently, 4D CBCT images are corrupted by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most employ generic network models and neglect the intrinsic structural prior within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images. Specifically, we find that streak artifacts exhibit a periodic rotational motion along with the patient's respiration. This unique motion pattern inspires us to distinguish the artifacts from the desired anatomical structures in the spatiotemporal domain. We then propose a spatiotemporal neural network named RSTAR-Net, with separable and circular convolutions, for Rotational Streak Artifact Reduction. The specially designed model effectively encodes dynamic image features, facilitating the recovery of 4D CBCT images. Moreover, RSTAR-Net is lightweight and computationally efficient. Extensive experiments substantiate the effectiveness of the proposed method, and RSTAR-Net outperforms the comparison methods.
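
Because the respiratory cycle is periodic, the last phase is adjacent to the first, which is the natural motivation for circular (wrap-around) convolution along the phase axis. The sketch below shows one separable spatiotemporal filter in NumPy under stated assumptions: a 2D spatial filter applied to each phase independently, followed by a 1D circular filter along the phase axis. The kernel sizes, the factorization order, and the function names are hypothetical; this is not the RSTAR-Net architecture.

```python
import numpy as np

def circular_conv_phase(vol, kernel):
    """1D circular cross-correlation along axis 0 (respiratory phase).

    vol: (P, H, W) phase-resolved images; kernel: odd-length 1D weights.
    Output phase p mixes input phases (p + k - half) mod P, so phase P-1
    is treated as adjacent to phase 0.
    """
    half = len(kernel) // 2
    out = np.zeros_like(vol, dtype=float)
    for k, w in enumerate(kernel):
        # np.roll(vol, s)[p] == vol[(p - s) mod P]; with s = half - k this is vol[(p + k - half) mod P]
        out += w * np.roll(vol, shift=half - k, axis=0)
    return out

def spatial_conv_same(vol, kernel2d):
    """2D 'same' cross-correlation with zero padding, applied to every phase independently."""
    kh, kw = kernel2d.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(vol, ((0, 0), (ph, ph), (pw, pw)))  # pad only the spatial axes
    H, W = vol.shape[1:]
    out = np.zeros_like(vol, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel2d[i, j] * padded[:, i:i + H, j:j + W]
    return out

def separable_block(vol, kernel2d, kernel1d):
    """Spatial filtering per phase, then circular filtering across phases."""
    return circular_conv_phase(spatial_conv_same(vol, kernel2d), kernel1d)

rng = np.random.default_rng(0)
vol = rng.standard_normal((10, 8, 8))   # hypothetical: 10 phases of 8x8 images
k2 = rng.standard_normal((3, 3))
k1 = np.array([0.25, 0.5, 0.25])
y = separable_block(vol, k2, k1)

# Wrap-around check: with kernel [0, 0, 1], output phase p reads input phase p + 1,
# so an impulse at phase 0 appears at the last phase.
impulse = np.zeros((10, 1, 1))
impulse[0] = 1.0
wrapped = circular_conv_phase(impulse, np.array([0.0, 0.0, 1.0]))
```

Splitting the filter into a spatial part and a phase part keeps the parameter count low (9 + 3 weights here instead of 27 for a full 3x3x3 kernel), and the circular padding makes the block equivariant to cyclic shifts of the phase index.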

VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

Mar 25, 2024

Yang Chen, Yingwei Pan, Haibo Yang, Ting Yao, Tao Mei

Abstract: Recent innovations in text-to-3D generation have featured Score Distillation Sampling (SDS), which enables zero-shot learning of implicit 3D models (NeRF) by directly distilling prior knowledge from 2D diffusion models. However, current SDS-based models still struggle with intricate text prompts and commonly produce distorted 3D models with unrealistic textures or cross-view inconsistency. In this work, we introduce a novel Visual Prompt-guided text-to-3D diffusion model (VP3D) that explicitly unleashes the visual appearance knowledge in a 2D visual prompt to boost text-to-3D generation. Instead of supervising SDS solely with the text prompt, VP3D first uses a 2D diffusion model to generate a high-quality image from the input text, which then acts as a visual prompt that strengthens SDS optimization with explicit visual appearance. Meanwhile, we couple the SDS optimization with an additional differentiable reward function that encourages rendered images of the 3D model to align better visually with the 2D visual prompt and semantically with the text prompt. Through extensive experiments, we show that the 2D visual prompt in VP3D significantly eases learning the visual appearance of 3D models and thus leads to higher visual fidelity with more detailed textures. Appealingly, when the self-generated visual prompt is replaced with a given reference image, VP3D enables a new task: stylized text-to-3D generation. Our project page is available at https://vp3d-cvpr24.github.io.

* CVPR 2024; Project page: https://vp3d-cvpr24.github.io
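
For reference, the standard text-only SDS gradient that SDS-based methods build on is usually written (following DreamFusion) as below, where θ are the 3D model parameters, x = g(θ) is a rendered image, x_t is its noised version at timestep t, y is the text prompt, ε is the injected Gaussian noise, ε̂_φ is the frozen 2D diffusion model's noise prediction, α_t and σ_t are the noise-schedule coefficients, and w(t) is a timestep weighting. VP3D's additional visual-prompt conditioning and differentiable reward term are not written out here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\big(\phi, x = g(\theta)\big)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
```

The diffusion model's Jacobian is omitted from this gradient, which is what makes SDS cheap: only the renderer g is differentiated.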
