Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yimeng Liu

Hydra-Bench: A Benchmark for Multi-Modal Leaf Wetness Sensing

Jul 30, 2025

Yimeng Liu, Maolin Gan, Yidong Ren, Gen Li, Jingkai Lin, Younsuk Dong, Zhichao Cao

Abstract:Leaf wetness detection is a crucial task in agricultural monitoring, as it directly impacts the prediction and protection of plant diseases. However, existing sensing systems suffer from limitations in robustness, accuracy, and environmental resilience when applied to natural leaves under dynamic real-world conditions. To address these challenges, we introduce a new multi-modal dataset specifically designed for evaluating and advancing machine learning algorithms in leaf wetness detection. Our dataset comprises synchronized mmWave raw data, Synthetic Aperture Radar (SAR) images, and RGB images collected over six months from five diverse plant species in both controlled and outdoor field environments. We provide detailed benchmarks using the Hydra model, including comparisons against single modality baselines and multiple fusion strategies, as well as performance under varying scan distances. Additionally, our dataset can serve as a benchmark for future SAR imaging algorithm optimization, enabling a systematic evaluation of detection accuracy under diverse conditions.

Via

Access Paper or Ask Questions

A Dual-Fusion Cognitive Diagnosis Framework for Open Student Learning Environments

Oct 19, 2024

Yuanhao Liu, Shuo Liu, Yimeng Liu, Jingwen Yang, Hong Qian

Figure 1 for A Dual-Fusion Cognitive Diagnosis Framework for Open Student Learning Environments

Figure 2 for A Dual-Fusion Cognitive Diagnosis Framework for Open Student Learning Environments

Figure 3 for A Dual-Fusion Cognitive Diagnosis Framework for Open Student Learning Environments

Figure 4 for A Dual-Fusion Cognitive Diagnosis Framework for Open Student Learning Environments

Abstract:Cognitive diagnosis model (CDM) is a fundamental and upstream component in intelligent education. It aims to infer students' mastery levels based on historical response logs. However, existing CDMs usually follow the ID-based embedding paradigm, which could often diminish the effectiveness of CDMs in open student learning environments. This is mainly because they can hardly directly infer new students' mastery levels or utilize new exercises or knowledge without retraining. Textual semantic information, due to its unified feature space and easy accessibility, can help alleviate this issue. Unfortunately, directly incorporating semantic information may not benefit CDMs, since it does not capture response-relevant features and thus discards the individual characteristics of each student. To this end, this paper proposes a dual-fusion cognitive diagnosis framework (DFCD) to address the challenge of aligning two different modalities, i.e., textual semantic features and response-relevant features. Specifically, in DFCD, we first propose the exercise-refiner and concept-refiner to make the exercises and knowledge concepts more coherent and reasonable via large language models. Then, DFCD encodes the refined features using text embedding models to obtain the semantic information. For response-related features, we propose a novel response matrix to fully incorporate the information within the response logs. Finally, DFCD designs a dual-fusion module to merge the two modal features. The ultimate representations possess the capability of inference in open student learning environments and can be also plugged in existing CDMs. Extensive experiments across real-world datasets show that DFCD achieves superior performance by integrating different modalities and strong adaptability in open student learning environments.

Via

Access Paper or Ask Questions

LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality

Oct 09, 2024

Kojiro Takeyama, Yimeng Liu, Misha Sra

Abstract:Understanding human locomotion is crucial for AI agents such as robots, particularly in complex indoor home environments. Modeling human trajectories in these spaces requires insight into how individuals maneuver around physical obstacles and manage social navigation dynamics. These dynamics include subtle behaviors influenced by proxemics - the social use of space, such as stepping aside to allow others to pass or choosing longer routes to avoid collisions. Previous research has developed datasets of human motion in indoor scenes, but these are often limited in scale and lack the nuanced social navigation dynamics common in home environments. To address this, we present LocoVR, a dataset of 7000+ two-person trajectories captured in virtual reality from over 130 different indoor home environments. LocoVR provides full body pose data and precise spatial information, along with rich examples of socially-motivated movement behaviors. For example, the dataset captures instances of individuals navigating around each other in narrow spaces, adjusting paths to respect personal boundaries in living areas, and coordinating movements in high-traffic zones like entryways and kitchens. Our evaluation shows that LocoVR significantly enhances model performance in three practical indoor tasks utilizing human trajectories, and demonstrates predicting socially-aware navigation patterns in home environments.

Via

Access Paper or Ask Questions

SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making

Aug 05, 2024

Sherry X. Chen, Alex Christopher Lim, Yimeng Liu, Pradeep Sen, Misha Sra

Figure 1 for SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making

Figure 2 for SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making

Figure 3 for SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making

Figure 4 for SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making

Abstract:Virtual try-on (VTO) applications aim to improve the online shopping experience by allowing users to preview garments, before making purchase decisions. However, many VTO tools fail to consider the crucial relationship between a garment's size and the user's body size, often employing a one-size-fits-all approach when visualizing a clothing item. This results in poor size recommendations and purchase decisions leading to increased return rates. To address this limitation, we introduce SiCo, an online VTO system, where users can upload images of themselves and visualize how different sizes of clothing would look on their body to help make better-informed purchase decisions. Our user study shows SiCo's superiority over baseline VTO. The results indicate that our approach significantly enhances user ability to gauge the appearance of outfits on their bodies and boosts their confidence in selecting clothing sizes that match desired goals. Based on our evaluation, we believe our VTO design has the potential to reduce return rates and enhance the online clothes shopping experience. Our code is available at https://github.com/SherryXTChen/SiCo.

Via

Access Paper or Ask Questions

Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Jun 26, 2021

Pulkit Tandon, Shubham Chandak, Pat Pataranutaporn, Yimeng Liu, Anesu M. Mapuranga, Pattie Maes, Tsachy Weissman, Misha Sra

Figure 1 for Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Figure 2 for Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Figure 3 for Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Figure 4 for Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Abstract:Video represents the majority of internet traffic today leading to a continuous technological arms race between generating higher quality content, transmitting larger file sizes and supporting network infrastructure. Adding to this is the recent COVID-19 pandemic fueled surge in the use of video conferencing tools. Since videos take up substantial bandwidth (~100 Kbps to few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. In this work, we present a novel video compression pipeline, called Txt2Vid, which substantially reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and lip syncing models. Our generative pipeline achieves two to three orders of magnitude reduction in the bitrate as compared to the standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n=242) in an online study. The code for this work is available at https://github.com/tpulkit/txt2vid.git.

* 8 pages, 5 figures, 1 table

Via

Access Paper or Ask Questions