Yifan Huang

Multimodal Magic Elevating Depression Detection with a Fusion of Text and Audio Intelligence

Jan 31, 2025

Lindy Gan, Yifan Huang, Xiaoyang Gao, Jiaming Tan, Fujun Zhao, Tao Yang

Abstract:This study proposes an innovative multimodal fusion model based on a teacher-student architecture to enhance the accuracy of depression classification. Our designed model addresses the limitations of traditional methods in feature fusion and modality weight allocation by introducing multi-head attention mechanisms and weighted multimodal transfer learning. Leveraging the DAIC-WOZ dataset, the student fusion model, guided by textual and auditory teacher models, achieves significant improvements in classification accuracy. Ablation experiments demonstrate that the proposed model attains an F1 score of 99.1% on the test set, significantly outperforming unimodal and conventional approaches. Our method effectively captures the complementarity between textual and audio features while dynamically adjusting the contributions of the teacher models to enhance generalization capabilities. The experimental results highlight the robustness and adaptability of the proposed framework in handling complex multimodal data. This research provides a novel technical framework for multimodal large model learning in depression analysis, offering new insights into addressing the limitations of existing methods in modality fusion and feature extraction.

* 21 pages, 7 figures, 1 table

Optical Wireless Communications: Enabling the Next Generation Network of Networks

Dec 21, 2024

Aravindh Krishnamoorthy, Hossein Safi, Othman Younus, Hossein Kazemi, Isaac N. O. Osahon, Mingqing Liu, Yi Liu, Sina Babadi, Rizwana Ahmad, Asim Ihsan(+13 more)

Abstract:Optical wireless communication (OWC) is a promising technology anticipated to play a key role in the next-generation network of networks. To this end, this paper details the potential of OWC, as a complementary technology to traditional radio frequency communications, in enhancing networking capabilities beyond conventional terrestrial networks. Several usage scenarios and the current state of development are presented. Furthermore, a summary of existing challenges and opportunities is provided. Emerging technologies aimed at further enhancing future OWC capabilities are introduced. Additionally, value-added OWC-based technologies that leverage the unique properties of light are discussed, including applications such as positioning and gesture recognition. The paper concludes with the reflection that OWC provides unique functionalities that can play a crucial role in building a convergent and resilient future network of networks.

* This work has been submitted to the IEEE for possible publication. 15pp, 16 figures, and one table

Flexible and Scalable Deep Dendritic Spiking Neural Networks with Multiple Nonlinear Branching

Dec 09, 2024

Yifan Huang, Wei Fang, Zhengyu Ma, Guoqi Li, Yonghong Tian

Abstract:Recent advances in spiking neural networks (SNNs) have a predominant focus on network architectures, while relatively little attention has been paid to the underlying neuron model. The point neuron models, a cornerstone of deep SNNs, pose a bottleneck on the network-level expressivity since they depict somatic dynamics only. In contrast, the multi-compartment models in neuroscience offer remarkable expressivity by introducing dendritic morphology and dynamics, but remain underexplored in deep learning due to their unaffordable computational cost and inflexibility. To combine the advantages of both sides for a flexible, efficient yet more powerful model, we propose the dendritic spiking neuron (DendSN) incorporating multiple dendritic branches with nonlinear dynamics. Compared to the point spiking neurons, DendSN exhibits significantly higher expressivity. DendSN's flexibility enables its seamless integration into diverse deep SNN architectures. To accelerate dendritic SNNs (DendSNNs), we parallelize dendritic state updates across time steps, and develop Triton kernels for GPU-level acceleration. As a result, we can construct large-scale DendSNNs with depth comparable to their point SNN counterparts. Next, we comprehensively evaluate DendSNNs' performance on various demanding tasks. By modulating dendritic branch strengths using a context signal, catastrophic forgetting of DendSNNs is substantially mitigated. Moreover, DendSNNs demonstrate enhanced robustness against noise and adversarial attacks compared to point SNNs, and excel in few-shot learning settings. Our work is the first to demonstrate the possibility of training bio-plausible dendritic SNNs with depths and scales comparable to traditional point SNNs, and reveals superior expressivity and robustness of reduced dendritic neuron models in deep learning, thereby offering a fresh perspective on advancing neural network design.

Comprehensive Online Training and Deployment for Spiking Neural Networks

Oct 10, 2024

Zecheng Hao, Yifan Huang, Zijie Xu, Zhaofei Yu, Tiejun Huang

Abstract:Spiking Neural Networks (SNNs) are considered to have enormous potential in the future development of Artificial Intelligence (AI) due to their brain-inspired and energy-efficient properties. In the current supervised learning domain of SNNs, compared to vanilla Spatial-Temporal Back-propagation (STBP) training, online training can effectively overcome the risk of GPU memory explosion and has received widespread academic attention. However, currently proposed online training methods cannot tackle the inseparability problem of temporally dependent gradients and merely aim to optimize the training memory, resulting in no performance advantages compared to the STBP training models in the inference phase. To address the aforementioned challenges, we propose the Efficient Multi-Precision Firing (EM-PF) model, a family of advanced spiking models based on floating-point spikes and binary synaptic weights. We point out that the EM-PF model can effectively separate temporal gradients and achieve full-stage optimization towards computation speed and memory footprint. Experimental results have demonstrated that the EM-PF model can be flexibly combined with various techniques, including random back-propagation, parallel computation and channel attention mechanism, to achieve state-of-the-art performance with extremely low computational overhead in the field of online learning.

SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation

Jan 03, 2024

Yifei Chen, Binfeng Zou, Zhaoxin Guo, Yiyu Huang, Yifan Huang, Feiwei Qin, Qinhai Li, Changmiao Wang

Abstract:Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noise similar to PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Moreover, traditional PE segmentation methods cannot fully consider the hierarchical structure of features or the local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder, and fuses features of different scales in the decoder subnetwork to compensate for spatial information loss caused by the inevitable downsampling in Swin-UNet or other state-of-the-art methods, effectively solving the above problem. We provide a theoretical analysis of this method in detail and validate it on publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic segmentation of PE and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.

* WACV 2024
* 10 pages, 7 figures, accepted at WACV 2024
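
For readers unfamiliar with the Dice similarity coefficient (DSC) reported in the SCUNet++ abstract, the following is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), for two binary masks. It is not taken from the authors' repository; the function name `dice_coefficient`, the flat 0/1 list representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")

    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground count across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# One overlapping voxel, two predicted, one in the reference mask.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

A score of 1.0 means the masks coincide and 0.0 means they share no foreground voxels; the percentages in the abstract are this quantity multiplied by 100.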

Data Augmentation for Environmental Sound Classification Using Diffusion Probabilistic Model with Top-k Selection Discriminator

Apr 04, 2023

Yunhao Chen, Yunjie Zhu, Zihui Yan, Jianlu Shen, Zhen Ren, Yifan Huang

Abstract:Despite consistent advancement in powerful deep learning techniques in recent years, large amounts of training data are still necessary for the models to avoid overfitting. Synthetic datasets using generative adversarial networks (GAN) have recently been generated to overcome this problem. Nevertheless, despite advancements, GAN-based methods are usually hard to train or fail to generate high-quality data samples. In this paper, we propose an environmental sound classification augmentation technique based on the diffusion probabilistic model with DPM-Solver++ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we train a top-k selection discriminator on the dataset. According to the experiment results, the synthesized spectrograms have similar features to the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The public code is available at https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
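
The abstract does not spell out how the top-k selection step works, so the sketch below only illustrates the general idea under stated assumptions: each generated sample receives a quality score from some scoring function, and only the k highest-scoring samples are kept. The names `select_top_k` and `score_fn` are hypothetical and do not come from the linked repository; the actual discriminator and selection rule may differ.

```python
import heapq


def select_top_k(samples, score_fn, k):
    """Keep the k samples with the highest score.

    `score_fn` stands in for a trained discriminator that maps a
    generated sample to a scalar quality score (an assumption of
    this sketch, not the paper's implementation).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # heapq.nlargest returns the k largest items, highest score first.
    return heapq.nlargest(k, samples, key=score_fn)


# Toy usage: the "samples" are already scores, so the scorer is the identity.
kept = select_top_k([0.2, 0.9, 0.5, 0.7], lambda s: s, 2)
```

In practice the samples would be generated spectrograms and the scorer a classifier's confidence, but that wiring is left out here because the abstract does not describe it.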

Triadic Temporal Exponential Random Graph Models (TTERGM)

Nov 29, 2022

Yifan Huang, Clayton Barham, Eric Page, Pamela K Douglas

Abstract:Temporal exponential random graph models (TERGM) are powerful statistical models that can be used to infer the temporal pattern of edge formation and elimination in complex networks (e.g., social networks). TERGMs can also be used in a generative capacity to predict longitudinal time series data in these evolving graphs. However, parameter estimation within this framework fails to capture many real-world properties of social networks, including: triadic relationships, small world characteristics, and social learning theories which could be used to constrain the probabilistic estimation of dyadic covariates. Here, we propose triadic temporal exponential random graph models (TTERGM) to fill this void, which includes these hierarchical network relationships within the graph model. We represent social network learning theory as an additional probability distribution that optimizes Markov chains in the graph vector space. The new parameters are then approximated via Monte Carlo maximum likelihood estimation. We show that our TTERGM model achieves improved fidelity and more accurate predictions compared to several benchmark methods on GitHub network data.

* 36th Conference on Neural Information Processing Systems (NeurIPS) 2022 Temporal Graph Learning Workshop

Tracking Fast Neural Adaptation by Globally Adaptive Point Process Estimation for Brain-Machine Interface

Jul 27, 2021

Shuhang Chen, Xiang Zhang, Xiang Shen, Yifan Huang, Yiwen Wang

Abstract:Brain-machine interfaces (BMIs) help the disabled restore body functions by translating neural activity into digital commands to control external devices. Neural adaptation, where the brain signals change in response to external stimuli or movements, plays an important role in BMIs. When subjects purely use neural activity to brain-control a prosthesis, some neurons will actively explore a new tuning property to accomplish the movement task. The prediction of this neural tuning property can help subjects adapt more efficiently to brain control and maintain good decoding performance. Existing prediction methods track the slow change of the tuning property in the manual control, which is not suitable for the fast neural adaptation in brain control. In order to identify the active neurons in brain control and track their tuning property changes, we propose a globally adaptive point process method (GaPP) to estimate the neural modulation state from spike trains, decompose the states into the hyper preferred direction and reconstruct the kinematics in a dual-model framework. We implement the method on real data from rats performing a two-lever discrimination task under manual control and brain control. The results show our method successfully predicts the neural modulation state and identifies the neurons that become active in brain control. Compared to existing methods, ours tracks the fast changes of the hyper preferred direction from manual control to brain control more accurately and efficiently and reconstructs the kinematics better and faster.
