Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ziyu Wang

Jake

FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments

Feb 28, 2025

Shiyong Zhang, Xuebo Zhang, Qianli Dong, Ziyu Wang, Haobo Xi, Jing Yuan

Figure 1 for FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments

Figure 2 for FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments

Figure 3 for FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments

Figure 4 for FSMP: A Frontier-Sampling-Mixed Planner for Fast Autonomous Exploration of Complex and Large 3-D Environments

Abstract:In this paper, we propose a systematic framework for fast exploration of complex and large 3-D environments using micro aerial vehicles (MAVs). The key insight is the organic integration of the frontier-based and sampling-based strategies that can achieve rapid global exploration of the environment. Specifically, a field-of-view-based (FOV) frontier detector with the guarantee of completeness and soundness is devised for identifying 3-D map frontiers. Different from random sampling-based methods, the deterministic sampling technique is employed to build and maintain an incremental road map based on the recorded sensor FOVs and newly detected frontiers. With the resulting road map, we propose a two-stage path planner. First, it quickly computes the global optimal exploration path on the road map using the lazy evaluation strategy. Then, the best exploration path is smoothed for further improving the exploration efficiency. We validate the proposed method both in simulation and real-world experiments. The comparative results demonstrate the promising performance of our planner in terms of exploration efficiency, computational time, and explored volume.

* 13pages, 12 figures, accepted by IEEE Transactions on Instrumentation and Measurement

Via

Access Paper or Ask Questions

Skewed Memorization in Large Language Models: Quantification and Decomposition

Feb 03, 2025

Hao Li, Di Huang, Ziyu Wang, Amir M. Rahmani

Figure 1 for Skewed Memorization in Large Language Models: Quantification and Decomposition

Figure 2 for Skewed Memorization in Large Language Models: Quantification and Decomposition

Figure 3 for Skewed Memorization in Large Language Models: Quantification and Decomposition

Figure 4 for Skewed Memorization in Large Language Models: Quantification and Decomposition

Abstract:Memorization in Large Language Models (LLMs) poses privacy and security risks, as models may unintentionally reproduce sensitive or copyrighted data. Existing analyses focus on average-case scenarios, often neglecting the highly skewed distribution of memorization. This paper examines memorization in LLM supervised fine-tuning (SFT), exploring its relationships with training duration, dataset size, and inter-sample similarity. By analyzing memorization probabilities over sequence lengths, we link this skewness to the token generation process, offering insights for estimating memorization and comparing it to established metrics. Through theoretical analysis and empirical evaluation, we provide a comprehensive understanding of memorization behaviors and propose strategies to detect and mitigate risks, contributing to more privacy-preserving LLMs.

Via

Access Paper or Ask Questions

Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos

Oct 04, 2024

Ziyu Wang, Shuangpeng Han, Mike Zheng Shou, Mengmi Zhang

Abstract:A prior represents a set of beliefs or assumptions about a system, aiding inference and decision-making. In this work, we introduce the challenge of unsupervised prior learning in pose estimation, where AI models learn pose priors of animate objects from videos in a self-supervised manner. These videos present objects performing various actions, providing crucial information about their keypoints and connectivity. While priors are effective in pose estimation, acquiring them can be difficult. We propose a novel method, named Pose Prior Learner (PPL), to learn general pose priors applicable to any object category. PPL uses a hierarchical memory to store compositional parts of prototypical poses, from which we distill a general pose prior. This prior enhances pose estimation accuracy through template transformation and image reconstruction. PPL learns meaningful pose priors without any additional human annotations or interventions, outperforming competitive baselines on both human and animal pose estimation datasets. Notably, our experimental results reveal the effectiveness of PPL using learnt priors for pose estimation on occluded images. Through iterative inference, PPL leverages priors to refine estimated poses, regressing them to any prototypical poses stored in memory. Our code, model, and data will be publicly available.

Via

Access Paper or Ask Questions

HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

Sep 28, 2024

Ziyu Wang, Hao Li, Di Huang, Amir M. Rahmani

Figure 1 for HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

Figure 2 for HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

Figure 3 for HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

Figure 4 for HealthQ: Unveiling Questioning Capabilities of LLM Chains in Healthcare Conversations

Abstract:In digital healthcare, large language models (LLMs) have primarily been utilized to enhance question-answering capabilities and improve patient interactions. However, effective patient care necessitates LLM chains that can actively gather information by posing relevant questions. This paper presents HealthQ, a novel framework designed to evaluate the questioning capabilities of LLM healthcare chains. We implemented several LLM chains, including Retrieval-Augmented Generation (RAG), Chain of Thought (CoT), and reflective chains, and introduced an LLM judge to assess the relevance and informativeness of the generated questions. To validate HealthQ, we employed traditional Natural Language Processing (NLP) metrics such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Named Entity Recognition (NER)-based set comparison, and constructed two custom datasets from public medical note datasets, ChatDoctor and MTS-Dialog. Our contributions are threefold: we provide the first comprehensive study on the questioning capabilities of LLMs in healthcare conversations, develop a novel dataset generation pipeline, and propose a detailed evaluation methodology.

Via

Access Paper or Ask Questions

Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

Aug 27, 2024

Longshen Ou, Jingwei Zhao, Ziyu Wang, Gus Xia, Ye Wang

Figure 1 for Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

Figure 2 for Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

Figure 3 for Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

Figure 4 for Unlocking Potential in Pre-Trained Music Language Models for Versatile Multi-Track Music Arrangement

Abstract:Large language models have shown significant capabilities across various domains, including symbolic music generation. However, leveraging these pre-trained models for controllable music arrangement tasks, each requiring different forms of musical information as control, remains a novel challenge. In this paper, we propose a unified sequence-to-sequence framework that enables the fine-tuning of a symbolic music language model for multiple multi-track arrangement tasks, including band arrangement, piano reduction, drum arrangement, and voice separation. Our experiments demonstrate that the proposed approach consistently achieves higher musical quality compared to task-specific baselines across all four tasks. Furthermore, through additional experiments on probing analysis, we show the pre-training phase equips the model with essential knowledge to understand musical conditions, which is hard to acquired solely through task-specific fine-tuning.

* Submitted to AAAI 2025

Via

Access Paper or Ask Questions

Foundation Models for Music: A Survey

Aug 27, 2024

Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton(+33 more)

Figure 1 for Foundation Models for Music: A Survey

Figure 2 for Foundation Models for Music: A Survey

Figure 3 for Foundation Models for Music: A Survey

Figure 4 for Foundation Models for Music: A Survey

Abstract:In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the significance of music in various industries and trace the evolution of AI in music. By delineating the modalities targeted by foundation models, we discover many of the music representations are underexplored in FM development. Then, emphasis is placed on the lack of versatility of previous methods on diverse music applications, along with the potential of FMs in music understanding, generation and medical application. By comprehensively exploring the details of the model pre-training paradigm, architectural choices, tokenisation, finetuning methodologies and controllability, we emphasise the important topics that should have been well explored, like instruction tuning and in-context learning, scaling law and emergent ability, as well as long-sequence modelling etc. A dedicated section presents insights into music agents, accompanied by a thorough analysis of datasets and evaluations essential for pre-training and downstream tasks. Finally, by underscoring the vital importance of ethical considerations, we advocate that following research on FM for music should focus more on such issues as interpretability, transparency, human responsibility, and copyright issues. The paper offers insights into future challenges and trends on FMs for music, aiming to shape the trajectory of human-AI collaboration in the music realm.

Via

Access Paper or Ask Questions

Fast and Communication-Efficient Multi-UAV Exploration Via Voronoi Partition on Dynamic Topological Graph

Aug 11, 2024

Qianli Dong, Haobo Xi, Shiyong Zhang, Qingchen Bi, Tianyi Li, Ziyu Wang, Xuebo Zhang

Figure 1 for Fast and Communication-Efficient Multi-UAV Exploration Via Voronoi Partition on Dynamic Topological Graph

Figure 2 for Fast and Communication-Efficient Multi-UAV Exploration Via Voronoi Partition on Dynamic Topological Graph

Figure 3 for Fast and Communication-Efficient Multi-UAV Exploration Via Voronoi Partition on Dynamic Topological Graph

Figure 4 for Fast and Communication-Efficient Multi-UAV Exploration Via Voronoi Partition on Dynamic Topological Graph

Abstract:Efficient data transmission and reasonable task allocation are important to improve multi-robot exploration efficiency. However, most communication data types typically contain redundant information and thus require massive communication volume. Moreover, exploration-oriented task allocation is far from trivial and becomes even more challenging for resource-limited unmanned aerial vehicles (UAVs). In this paper, we propose a fast and communication-efficient multi-UAV exploration method for exploring large environments. We first design a multi-robot dynamic topological graph (MR-DTG) consisting of nodes representing the explored and exploring regions and edges connecting nodes. Supported by MR-DTG, our method achieves efficient communication by only transferring the necessary information required by exploration planning. To further improve the exploration efficiency, a hierarchical multi-UAV exploration method is devised using MR-DTG. Specifically, the \emph{graph Voronoi partition} is used to allocate MR-DTG's nodes to the closest UAVs, considering the actual motion cost, thus achieving reasonable task allocation. To our knowledge, this is the first work to address multi-UAV exploration using \emph{graph Voronoi partition}. The proposed method is compared with a state-of-the-art method in simulations. The results show that the proposed method is able to reduce the exploration time and communication volume by up to 38.3\% and 95.5\%, respectively. Finally, the effectiveness of our method is validated in the real-world experiment with 6 UAVs. We will release the source code to benefit the community.

* 8 pages, 8 figures, accepted by IEEE IROS2024, code see https://github.com/NKU-MobFly-Robotics/GVP-MREP

Via

Access Paper or Ask Questions

ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Aug 02, 2024

Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani

Figure 1 for ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Figure 2 for ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Figure 3 for ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Figure 4 for ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Abstract:While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, confining to ad-hoc explainability towards clinicians decision making. In this work, we delve into explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of identity re-identification risks using ECG data from five diverse real-world datasets, encompassing 223 participants. By employing transparent machine learning models, we reveal the diversity among different ECG features in contributing towards re-identification of individuals with an accuracy of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the necessity for robust privacy measures in real-world health applications and offer detailed, actionable insights for enhancing data anonymization techniques.

Via

Access Paper or Ask Questions

Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Jul 11, 2024

Seyed Amir Hossein Aqajari, Ziyu Wang, Ali Tazarv, Sina Labbaf, Salar Jafarlou, Brenda Nguyen, Nikil Dutt, Marco Levorato, Amir M. Rahmani

Figure 1 for Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Figure 2 for Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Figure 3 for Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Figure 4 for Enhancing Performance and User Engagement in Everyday Stress Monitoring: A Context-Aware Active Reinforcement Learning Approach

Abstract:In today's fast-paced world, accurately monitoring stress levels is crucial. Sensor-based stress monitoring systems often need large datasets for training effective models. However, individual-specific models are necessary for personalized and interactive scenarios. Traditional methods like Ecological Momentary Assessments (EMAs) assess stress but struggle with efficient data collection without burdening users. The challenge is to timely send EMAs, especially during stress, balancing monitoring efficiency and user convenience. This paper introduces a novel context-aware active reinforcement learning (RL) algorithm for enhanced stress detection using Photoplethysmography (PPG) data from smartwatches and contextual data from smartphones. Our approach dynamically selects optimal times for deploying EMAs, utilizing the user's immediate context to maximize label accuracy and minimize intrusiveness. Initially, the study was executed in an offline environment to refine the label collection process, aiming to increase accuracy while reducing user burden. Later, we integrated a real-time label collection mechanism, transitioning to an online methodology. This shift resulted in an 11% improvement in stress detection efficiency. Incorporating contextual data improved model accuracy by 4%. Personalization studies indicated a 10% enhancement in AUC-ROC scores, demonstrating better stress level differentiation. This research marks a significant move towards personalized, context-driven real-time stress monitoring methods.

Via

Access Paper or Ask Questions

Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Jul 04, 2024

Yuxuan Wu, Ziyu Wang, Bhiksha Raj, Gus Xia

Figure 1 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Figure 2 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Figure 3 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Figure 4 for Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

Abstract:We contribute an unsupervised method that effectively learns from raw observation and disentangles its latent space into content and style representations. Unlike most disentanglement algorithms that rely on domain-specific labels and knowledge, our method is based on the insight of domain-general statistical differences between content and style -- content varies more among different fragments within a sample but maintains an invariant vocabulary across data samples, whereas style remains relatively invariant within a sample but exhibits more significant variation across different samples. We integrate such inductive bias into an encoder-decoder architecture and name our method after V3 (variance-versus-invariance). Experimental results show that V3 generalizes across two distinct domains in different modalities, music audio and images of written digits, successfully learning pitch-timbre and digit-color disentanglements, respectively. Also, the disentanglement robustness significantly outperforms baseline unsupervised methods and is even comparable to supervised counterparts. Furthermore, symbolic-level interpretability emerges in the learned codebook of content, forging a near one-to-one alignment between machine representation and human knowledge.

Via

Access Paper or Ask Questions