Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yu Zhu

Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

May 04, 2025

Yu Zhu, Zehang Richard Li

Figure 1 for Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

Figure 2 for Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

Figure 3 for Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

Figure 4 for Bayesian Federated Cause-of-Death Classification and Quantification Under Distribution Shift

Abstract:In regions lacking medically certified causes of death, verbal autopsy (VA) is a critical and widely used tool to ascertain the cause of death through interviews with caregivers. Data collected by VAs are often analyzed using probabilistic algorithms. The performance of these algorithms often degrades due to distributional shift across populations. Most existing VA algorithms rely on centralized training, requiring full access to training data for joint modeling. This is often infeasible due to privacy and logistical constraints. In this paper, we propose a novel Bayesian Federated Learning (BFL) framework that avoids data sharing across multiple training sources. Our method enables reliable individual-level cause-of-death classification and population-level quantification of cause-specific mortality fractions (CSMFs), in a target domain with limited or no local labeled data. The proposed framework is modular, computationally efficient, and compatible with a wide range of existing VA algorithms as candidate models, facilitating flexible deployment in real-world mortality surveillance systems. We validate the performance of BFL through extensive experiments on two real-world VA datasets under varying levels of distribution shift. Our results show that BFL significantly outperforms the base models built on a single domain and achieves comparable or better performance compared to joint modeling.

* 11 figures

Via

Access Paper or Ask Questions

Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views

Apr 29, 2025

Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang

Abstract:We present a Gaussian Splatting method for surface reconstruction using sparse input views. Previous methods relying on dense views struggle with extremely sparse Structure-from-Motion points for initialization. While learning-based Multi-view Stereo (MVS) provides dense 3D points, directly combining it with Gaussian Splatting leads to suboptimal results due to the ill-posed nature of sparse-view geometric optimization. We propose Sparse2DGS, an MVS-initialized Gaussian Splatting pipeline for complete and accurate reconstruction. Our key insight is to incorporate the geometric-prioritized enhancement schemes, allowing for direct and robust geometric learning under ill-posed conditions. Sparse2DGS outperforms existing methods by notable margins while being ${2}\times$ faster than the NeRF-based fine-tuning approach.

* CVPR 2025

Via

Access Paper or Ask Questions

C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Feb 27, 2025

Yuhao Li, Mirana Claire Angel, Salman Khan, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Figure 1 for C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Figure 2 for C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Figure 3 for C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Figure 4 for C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation

Abstract:Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation. However, the existing trajectory-based approaches are usually limited to only generating the motion trajectory of the controlled object and ignoring the dynamic interactions between the controlled object and its surroundings. To address this limitation, we propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag. Instead of directly generating the motion of some objects, our C-Drag first performs object perception and then reasons the dynamic interactions between different objects according to the given motion control of the objects. Specifically, our method includes an object perception module and a Chain-of-Thought-based motion reasoning module. The object perception module employs visual language models to capture the position and category information of various objects within the image. The Chain-of-Thought-based motion reasoning module takes this information as input and conducts a stage-wise reasoning process to generate motion trajectories for each of the affected objects, which are subsequently fed to the diffusion model for video synthesis. Furthermore, we introduce a new video object interaction (VOI) dataset to evaluate the generation quality of motion controlled video generation methods. Our VOI dataset contains three typical types of interactions and provides the motion trajectories of objects that can be used for accurate performance evaluation. Experimental results show that C-Drag achieves promising performance across multiple metrics, excelling in object motion control. Our benchmark, codes, and models will be available at https://github.com/WesLee88524/C-Drag-Official-Repo.

Via

Access Paper or Ask Questions

In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

Jan 21, 2025

Yu Zhu, Wenqi Jiang, Gustavo Alonso

Figure 1 for In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

Figure 2 for In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

Figure 3 for In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

Figure 4 for In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

Abstract:Keeping ML-based recommender models up-to-date as data drifts and evolves is essential to maintain accuracy. As a result, online data preprocessing plays an increasingly important role in serving recommender systems. Existing solutions employ multiple CPU workers to saturate the input bandwidth of a single training node. Such an approach results in high deployment costs and energy consumption. For instance, a recent report from industrial deployments shows that data storage and ingestion pipelines can account for over 60\% of the power consumption in a recommender system. In this paper, we tackle the issue from a hardware perspective by introducing Piper, a flexible and network-attached accelerator that executes data loading and preprocessing pipelines in a streaming fashion. As part of the design, we define MiniPipe, the smallest pipeline unit enabling multi-pipeline implementation by executing various data preprocessing tasks across the single board, giving Piper the ability to be reconfigured at runtime. Our results, using publicly released commercial pipelines, show that Piper, prototyped on a power-efficient FPGA, achieves a 39$\sim$105$\times$ speedup over a server-grade, 128-core CPU and 3$\sim$17$\times$ speedup over GPUs like RTX 3090 and A100 in multiple pipelines. The experimental analysis demonstrates that Piper provides advantages in both latency and energy efficiency for preprocessing tasks in recommender systems, providing an alternative design point for systems that today are in very high demand.

Via

Access Paper or Ask Questions

UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Sep 30, 2024

Cheng Zhang, Dong Gong, Jiumei He, Yu Zhu, Jinqiu Sun, Yanning Zhang

Figure 1 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Figure 2 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Figure 3 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Figure 4 for UIR-LoRA: Achieving Universal Image Restoration through Multiple Low-Rank Adaptation

Abstract:Existing unified methods typically treat multi-degradation image restoration as a multi-task learning problem. Despite performing effectively compared to single degradation restoration methods, they overlook the utilization of commonalities and specificities within multi-task restoration, thereby impeding the model's performance. Inspired by the success of deep generative models and fine-tuning techniques, we proposed a universal image restoration framework based on multiple low-rank adapters (LoRA) from multi-domain transfer learning. Our framework leverages the pre-trained generative model as the shared component for multi-degradation restoration and transfers it to specific degradation image restoration tasks using low-rank adaptation. Additionally, we introduce a LoRA composing strategy based on the degradation similarity, which adaptively combines trained LoRAs and enables our model to be applicable for mixed degradation restoration. Extensive experiments on multiple and mixed degradations demonstrate that the proposed universal image restoration method not only achieves higher fidelity and perceptual image quality but also has better generalization ability than other unified image restoration models. Our code is available at https://github.com/Justones/UIR-LoRA.

Via

Access Paper or Ask Questions

Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Aug 03, 2024

Qiucen Wu, Tian Lin, Xianghao Yu, Yu Zhu, Robert Schober

Figure 1 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Figure 2 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Figure 3 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Figure 4 for Beamforming for PIN Diode-Based IRS-Assisted Systems Under a Phase Shift-Dependent Power Consumption Model

Abstract:Intelligent reflecting surfaces (IRSs) have been regarded as a promising enabler for future wireless communication systems. In the literature, IRSs have been considered power-free or assumed to have constant power consumption. However, recent experimental results have shown that for positive-intrinsic-negative (PIN) diode-based IRSs, the power consumption dynamically changes with the phase shift configuration. This phase shift-dependent power consumption (PS-DPC) introduces a challenging power allocation problem between base station (BS) and IRS. To tackle this issue, in this paper, we investigate a rate maximization problem for IRS-assisted systems under a practical PS-DPC model. For the single-user case, we propose a generalized Benders decomposition-based beamforming method to maximize the achievable rate while satisfying a total system power consumption constraint. Moreover, we propose a low-complexity beamforming design, where the powers allocated to BS and IRS are optimized offline based on statistical channel state information. Furthermore, for the multi-user case, we solve an equivalent weighted mean square error minimization problem with two different joint power allocation and phase shift optimization methods. Simulation results indicate that compared to baseline schemes, our proposed methods can flexibly optimize the power allocation between BS and IRS, thus achieving better performance. The optimized power allocation strategy strongly depends on the system power budget. When the system power budget is high, the PS-DPC is not the dominant factor in the system power consumption, allowing the IRS to turn on as many PIN diodes as needed to achieve high beamforming quality. When the system power budget is limited, however, more power tends to be allocated to the BS to enhance the transmit power, resulting in a lower beamforming quality at the IRS due to the reduced PS-DPC budget.

Via

Access Paper or Ask Questions

Multi-Granularity Language-Guided Multi-Object Tracking

Jun 07, 2024

Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Figure 1 for Multi-Granularity Language-Guided Multi-Object Tracking

Figure 2 for Multi-Granularity Language-Guided Multi-Object Tracking

Figure 3 for Multi-Granularity Language-Guided Multi-Object Tracking

Figure 4 for Multi-Granularity Language-Guided Multi-Object Tracking

Abstract:Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging especially in case of environmental interference such as occlusion, blur and domain variance. In this work, we argue that multi-modal language-driven features provide complementary information to classical visual features, thereby aiding in improving the robustness to such environmental interference. To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene-and instance-level) and combines it with standard visual features to obtain discriminative representations. To develop LG-MOT, we annotate existing MOT datasets with scene-and instance-level language descriptions. We then encode both instance-and scene-level language information into high-dimensional embeddings, which are utilized to guide the visual features during training. At inference, our LG-MOT uses the standard visual features without relying on annotated language descriptions. Extensive experiments on three benchmarks, MOT17, DanceTrack and SportsMOT, reveal the merits of the proposed contributions leading to state-of-the-art performance. On the DanceTrack test set, our LG-MOT achieves an absolute gain of 2.2\% in terms of target object association (IDF1 score), compared to the baseline using only visual features. Further, our LG-MOT exhibits strong cross-domain generalizability. The dataset and code will be available at ~\url{https://github.com/WesLee88524/LG-MOT}.

Via

Access Paper or Ask Questions

Reflected Flow Matching

May 26, 2024

Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, Cheng Zhang

Abstract:Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural samples, e.g., oversaturated images, due to both flow matching error and simulation error. To address this, we add a boundary constraint term to CNFs, which leads to reflected CNFs that keep trajectories within the constrained domains. We propose reflected flow matching (RFM) to train the velocity model in reflected CNFs by matching the conditional velocity fields in a simulation-free manner, similar to the vanilla FM. Moreover, the analytical form of conditional velocity fields in RFM avoids potentially biased approximations, making it superior to existing score-based generative models on constrained domains. We demonstrate that RFM achieves comparable or better results on standard image benchmarks and produces high-quality class-conditioned samples under high guidance weight.

* ICML 2024 camera-ready

Via

Access Paper or Ask Questions

Sakuga-42M Dataset: Scaling Up Cartoon Research

May 13, 2024

Zhenglin Pan, Yu Zhu, Yuxuan Mu

Figure 1 for Sakuga-42M Dataset: Scaling Up Cartoon Research

Figure 2 for Sakuga-42M Dataset: Scaling Up Cartoon Research

Figure 3 for Sakuga-42M Dataset: Scaling Up Cartoon Research

Figure 4 for Sakuga-42M Dataset: Scaling Up Cartoon Research

Abstract:Hand-drawn cartoon animation employs sketches and flat-color segments to create the illusion of motion. While recent advancements like CLIP, SVD, and Sora show impressive results in understanding and generating natural video by scaling large models with extensive datasets, they are not as effective for cartoons. Through our empirical experiments, we argue that this ineffectiveness stems from a notable bias in hand-drawn cartoons that diverges from the distribution of natural videos. Can we harness the success of the scaling paradigm to benefit cartoon research? Unfortunately, until now, there has not been a sizable cartoon dataset available for exploration. In this research, we propose the Sakuga-42M Dataset, the first large-scale cartoon animation dataset. Sakuga-42M comprises 42 million keyframes covering various artistic styles, regions, and years, with comprehensive semantic annotations including video-text description pairs, anime tags, content taxonomies, etc. We pioneer the benefits of such a large-scale cartoon dataset on comprehension and generation tasks by finetuning contemporary foundation models like Video CLIP, Video Mamba, and SVD, achieving outstanding performance on cartoon-related tasks. Our motivation is to introduce large-scaling to cartoon research and foster generalization and robustness in future cartoon applications. Dataset, Code, and Pretrained Models will be publicly available.

* Arxiv Pre-print. Work in Progress

Via

Access Paper or Ask Questions

Active Learning Enabled Low-cost Cell Image Segmentation Using Bounding Box Annotation

May 02, 2024

Yu Zhu, Qiang Yang, Li Xu

Abstract:Cell image segmentation is usually implemented using fully supervised deep learning methods, which heavily rely on extensive annotated training data. Yet, due to the complexity of cell morphology and the requirement for specialized knowledge, pixel-level annotation of cell images has become a highly labor-intensive task. To address the above problems, we propose an active learning framework for cell segmentation using bounding box annotations, which greatly reduces the data annotation cost of cell segmentation algorithms. First, we generate a box-supervised learning method (denoted as YOLO-SAM) by combining the YOLOv8 detector with the Segment Anything Model (SAM), which effectively reduces the complexity of data annotation. Furthermore, it is integrated into an active learning framework that employs the MC DropBlock method to train the segmentation model with fewer box-annotated samples. Extensive experiments demonstrate that our model saves more than ninety percent of data annotation time compared to mask-supervised deep learning methods.

Via

Access Paper or Ask Questions