Yan Wu

Clinical NLP with Attention-Based Deep Learning for Multi-Disease Prediction

Jul 02, 2025

Ting Xu, Xiaoxiao Deng, Xiandong Meng, Haifeng Yang, Yan Wu

Abstract: This paper addresses the challenges posed by the unstructured nature and high-dimensional semantic complexity of electronic health record texts. A deep learning method based on attention mechanisms is proposed to achieve unified modeling for information extraction and multi-label disease prediction. The study is conducted on the MIMIC-IV dataset. A Transformer-based architecture is used to perform representation learning over clinical text. Multi-layer self-attention mechanisms are employed to capture key medical entities and their contextual relationships. A Sigmoid-based multi-label classifier is then applied to predict multiple disease labels. The model incorporates a context-aware semantic alignment mechanism, enhancing its representational capacity in typical medical scenarios such as label co-occurrence and sparse information. To comprehensively evaluate model performance, a series of experiments were conducted, including baseline comparisons, hyperparameter sensitivity analysis, data perturbation studies, and noise injection tests. Results demonstrate that the proposed method consistently outperforms representative existing approaches across multiple performance metrics. The model maintains strong generalization under varying data scales, interference levels, and model depth configurations. The framework developed in this study offers an efficient algorithmic foundation for processing real-world clinical texts and presents practical significance for multi-label medical text modeling tasks.
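
As an illustration of the sigmoid-based multi-label step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the encoder has already produced one logit per disease label; the label names, the logits, and the 0.5 threshold are hypothetical values chosen for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, label_names, threshold=0.5):
    """Score each label independently and keep those at or above the threshold.

    Unlike softmax, the per-label sigmoid does not force the scores to sum
    to one, so several labels can be predicted for the same document.
    """
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(label_names, probs) if p >= threshold]


# Hypothetical labels and logits, for illustration only.
labels = ["diabetes", "hypertension", "sepsis"]
logits = [2.0, -1.0, 0.3]
predicted = predict_labels(logits, labels)
print(predicted)
```

Because each label is scored on its own, co-occurring diseases do not compete for probability mass, which is the usual reason for choosing a sigmoid output in multi-label settings.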

Seed1.5-VL Technical Report

May 11, 2025

Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang(+187 more)

Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 out of 60 public benchmarks. Moreover, in agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond visual and video understanding, it also demonstrates strong reasoning abilities, making it particularly effective for multimodal reasoning challenges such as visual puzzles. We believe these capabilities will empower broader applications across diverse tasks. In this report, we mainly provide a comprehensive review of our experiences in building Seed1.5-VL across model design, data construction, and training at various stages, hoping that this report can inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine Model ID: doubao-1-5-thinking-vision-pro-250428).

Fast2comm: Collaborative perception combined with prior knowledge

Apr 30, 2025

Zhengbin Zhang, Yan Wu, Hongkun Zhang

Abstract: Collaborative perception has the potential to significantly enhance perceptual accuracy through the sharing of complementary information among agents. However, real-world collaborative perception faces persistent challenges, particularly in balancing perception performance and bandwidth limitations, as well as coping with localization errors. To address these challenges, we propose Fast2comm, a prior knowledge-based collaborative perception framework. Specifically, (1) we propose a prior-supervised confidence feature generation method that effectively distinguishes foreground from background by producing highly discriminative confidence features; (2) we propose a GT bounding box-based spatial prior feature selection strategy to ensure that only the most informative prior-knowledge features are selected and shared, thereby minimizing background noise and optimizing bandwidth efficiency while enhancing adaptability to localization inaccuracies; (3) we decouple the feature fusion strategies between model training and testing phases, enabling dynamic bandwidth adaptation. To comprehensively validate our framework, we conduct extensive experiments on both real-world and simulated datasets. The results demonstrate the superior performance of our model and highlight the necessity of the proposed methods. Our code is available at https://github.com/Zhangzhengbin-TJ/Fast2comm.

* 8 pages, 8 figures

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

Apr 17, 2025

Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang

Abstract: Generating natural and physically plausible character motion remains challenging, particularly for long-horizon control with diverse guidance signals. While prior work combines high-level diffusion-based motion planners with low-level physics controllers, these systems suffer from domain gaps that degrade motion quality and require task-specific fine-tuning. To tackle this problem, we introduce UniPhys, a diffusion-based behavior cloning framework that unifies motion planning and control into a single model. UniPhys enables flexible, expressive character motion conditioned on multi-modal inputs such as text, trajectories, and goals. To address accumulated prediction errors over long sequences, UniPhys is trained with the Diffusion Forcing paradigm, learning to denoise noisy motion histories and handle discrepancies introduced by the physics simulator. This design allows UniPhys to robustly generate physically plausible, long-horizon motions. Through guided sampling, UniPhys generalizes to a wide range of control signals, including unseen ones, without requiring task-specific fine-tuning. Experiments show that UniPhys outperforms prior methods in motion naturalness, generalization, and robustness across diverse control tasks.

* Project page: https://wuyan01.github.io/uniphys-project/

Physics-Aware Initialization Refinement in Code-Aided EM for Blind Channel Estimation

Apr 15, 2025

Chin-Hung Chen, Ivana Nikoloska, Wim van Houtum, Yan Wu, Alex Alvarado

Abstract: This paper addresses the well-known local maximum problem of the expectation-maximization (EM) algorithm in blind intersymbol interference (ISI) channel estimation. This problem primarily results from phase and shift ambiguity during initialization, which blind estimation is inherently unable to distinguish. We propose an effective initialization refinement algorithm that utilizes the decoder output as a model selection metric, incorporating a technique to detect phase and shift ambiguity. Our results show that the proposed algorithm significantly reduces the number of local maximum cases to nearly one-third for a 3-tap ISI channel under highly uncertain initial conditions. The improvement becomes more pronounced as initial errors increase and the channel memory grows. When used in a turbo equalizer, the proposed algorithm is required only in the first turbo iteration, which limits any complexity increase with subsequent iterations.

* This work has been submitted to the IEEE for possible publication

Modified Baum-Welch Algorithm for Joint Blind Channel Estimation and Turbo Equalization

Dec 10, 2024

Chin-Hung Chen, Boris Karanov, Ivana Nikoloska, Wim van Houtum, Yan Wu, Alex Alvarado

Abstract: Blind estimation of intersymbol interference channels based on the Baum-Welch (BW) algorithm, a specific implementation of the expectation-maximization (EM) algorithm for training hidden Markov models, is robust and does not require labeled data. However, it is known for its extensive computational cost, slow convergence, and frequent convergence to a local maximum. In this paper, we modify the trellis structure of the BW algorithm by associating the channel parameters with two consecutive states. This modification enables us to reduce the number of required states by half while maintaining the same performance. Moreover, to improve the convergence rate and the estimation performance, we construct a joint turbo-BW-equalization system by exploiting the extrinsic information produced by the turbo decoder to refine the BW-based estimator at each EM iteration. Our experiments demonstrate that the joint system achieves convergence in just 4 EM iterations, which is 8 iterations fewer than a separate system design for a signal-to-noise ratio (SNR) of 6 dB. Additionally, the joint system provides improved estimation accuracy with a mean square error (MSE) of $10^{-4}$. We also identify scenarios where a joint design is not preferable, especially when the channel is noisy (e.g., SNR=2 dB) and the turbo decoder is unable to provide reliable extrinsic information for a BW-based estimator.

* 6 pages, 5 figures

Turbo Receiver Design with Joint Detection and Demapping for Coded Differential BPSK in Bursty Impulsive Noise Channels

Dec 10, 2024

Chin-Hung Chen, Boris Karanov, Wim van Houtum, Yan Wu, Alex Alvarado

Abstract: It has been recognized that the impulsive noise (IN) generated by power devices poses significant challenges to wireless receivers in practice. In this paper, we assess the achievable information rate (AIR) and the performance of practical turbo receiver designs for a well-established Markov-Middleton IN model. We utilize a commonly used commercial transmission setup consisting of a convolutional encoder, bit-level interleaver, and a differential binary phase-shift keying (DBPSK) symbol mapper. Firstly, we conduct a comprehensive assessment of the AIRs of the underlying channel model using DBPSK transmitted symbols across various channel conditions. Additionally, we introduce two robust turbo-like receiver designs. The first design features a separate IN detector and a turbo-demapper-decoder. The second design employs a joint approach, where the extrinsic information of both the detector and demapper is simultaneously updated, forming a turbo-detector-demapper-decoder structure. We show that the joint design consistently outperforms the separate design across all channel conditions, particularly in low AIR situations. However, the maximum performance gain for the channel conditions considered in this paper is merely 0.2 dB, and the joint system incurs significantly greater computational complexity, especially for a high number of turbo iterations. The performance of the two proposed turbo receiver designs is demonstrated to be close to the estimated AIR, with a performance gap dependent on the channel parameters.
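
For readers unfamiliar with the DBPSK symbol mapper named in the abstract, here is a minimal noiseless sketch of differential encoding and hard-decision differential detection. It is not the paper's receiver: the convention that bit 1 flips the phase, the reference symbol, and the function names are assumptions made for this example, and no impulsive noise, coding, or interleaving is modeled.

```python
def dbpsk_encode(bits, reference=1):
    """Map bits to +1/-1 symbols by encoding each bit as a phase change.

    Assumed convention: bit 1 flips the sign of the previous symbol,
    bit 0 keeps it. The first output symbol is the reference.
    """
    symbols = [reference]
    for b in bits:
        symbols.append(-symbols[-1] if b else symbols[-1])
    return symbols


def dbpsk_decode(symbols):
    """Recover bits by comparing each symbol with its predecessor."""
    return [1 if cur * prev < 0 else 0 for prev, cur in zip(symbols, symbols[1:])]


bits = [1, 0, 1, 1, 0]
tx = dbpsk_encode(bits)

# A 180-degree phase ambiguity inverts every symbol, yet the decoded bits
# are unchanged because only symbol-to-symbol changes carry information.
rx_inverted = [-s for s in tx]

print(tx)
print(dbpsk_decode(tx), dbpsk_decode(rx_inverted))
```

The information sits in the transition between consecutive symbols, so the demapper depends on pairs of received symbols rather than on each symbol alone.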

* 12 pages, 13 figures

PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

Aug 20, 2024

Yan Wu, Esther Wershof, Sebastian M Schmon, Marcel Nassar, Błażej Osiński, Ridvan Eksi, Kun Zhang, Thore Graepel

Abstract: We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field. Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis. Extensive evaluations of published and baseline models reveal limitations like mode or posterior collapse, and underscore the importance of rank metrics that assess the ordering of perturbations alongside traditional measures like RMSE. Our findings show that simple models can outperform more complex approaches. This benchmarking exercise sets new standards for model evaluation, supports robust model development, and advances the potential of these models to use high-throughput and high-content genetic and chemical screens for disease target discovery.

* 9 pages plus 19 pages supplementary material. Code is available at https://github.com/altoslabs/perturbench

Learning Stable Robot Grasping with Transformer-based Tactile Control Policies

Jul 30, 2024

En Yen Puang, Zechen Li, Chee Meng Chew, Shan Luo, Yan Wu

Abstract: Measuring grasp stability, which can be inferred from haptic information with a tactile sensor, is an important skill for dexterous robot manipulation tasks. Control policies have to detect rotational displacement and slippage from tactile feedback, and determine a re-grasp strategy in terms of location and force. The classic stable grasp task only trains control policies to solve for re-grasp location with objects of a fixed center of gravity. In this work, we propose a revamped version of the stable grasp task that optimises both re-grasp location and gripping force for objects with an unknown and moving center of gravity. We tackle this task with a model-free, end-to-end Transformer-based reinforcement learning framework. We show that our approach is able to solve both objectives after training in both simulation and a real-world setup with zero-shot transfer. We also provide performance analysis of different models to understand the dynamics of optimizing two opposing objectives.

* Accepted by ICIEA 2024

Data-Driven Symbol Detection for Intersymbol Interference Channels with Bursty Impulsive Noise

May 17, 2024

Boris Karanov, Chin-Hung Chen, Yan Wu, Alex Young, Wim van Houtum

Abstract: We developed machine learning approaches for data-driven trellis-based soft symbol detection in coded transmission over intersymbol interference (ISI) channels in the presence of bursty impulsive noise (IN), as encountered, for example, in wireless digital broadcasting systems and vehicular communications. This enabled us to obtain optimized detectors based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm while circumventing the use of full channel state information (CSI) for computing likelihoods and trellis state transition probabilities. First, we extended the application of the neural network (NN)-aided BCJR, recently proposed for ISI channels with additive white Gaussian noise (AWGN). Although suitable for estimating likelihoods via labeling of transmission sequences, the BCJR-NN method does not provide a framework for learning the trellis state transitions. In addition to detection over the joint ISI and IN states, we also focused on another scenario where trellis transitions are not trivial: detection for the ISI channel with AWGN with inaccurate knowledge of the channel memory at the receiver. Without access to the accurate state transition matrix, the BCJR-NN performance significantly degrades in both settings. To this end, we devised an alternative approach for data-driven BCJR detection based on the unsupervised learning of a hidden Markov model (HMM). The BCJR-HMM allowed us to optimize both the likelihood function and the state transition matrix without labeling. Moreover, we demonstrated the viability of a hybrid NN and HMM BCJR detection where the NN is used for learning the likelihoods, while the state transitions are optimized via the HMM. While reducing the required prior channel knowledge, the examined data-driven detectors with learned trellis state transitions achieve bit error rates close to the optimal full CSI-based BCJR, significantly outperforming detection with inaccurate CSI.

* This work has been submitted to the IEEE for possible publication
