Yifan Jiang

Error-controlled non-additive interaction discovery in machine learning models

Aug 30, 2024

Winston Chen, Yifan Jiang, William Stafford Noble, Yang Young Lu

Abstract: Machine learning (ML) models are powerful tools for detecting complex patterns within data, yet their "black box" nature limits their interpretability, hindering their use in critical domains like healthcare and finance. To address this challenge, interpretable ML methods have been developed to explain how features influence model predictions. However, these methods often focus on univariate feature importance, overlooking the complex interactions between features that ML models are capable of capturing. Recognizing this limitation, recent efforts have aimed to extend these methods to discover feature interactions, but existing approaches struggle with robustness and error control, especially under data perturbations. In this study, we introduce Diamond, a novel method for trustworthy feature interaction discovery. Diamond uniquely integrates the model-X knockoffs framework to control the false discovery rate (FDR), ensuring that the proportion of falsely discovered interactions remains low. We further address the challenges of using off-the-shelf interaction importance measures by proposing a calibration procedure that refines these measures to maintain the desired FDR. Diamond's applicability spans a wide range of ML models, including deep neural networks, tree-based models, and factorization-based models. Our empirical evaluations on both simulated and real datasets across various biomedical studies demonstrate Diamond's utility in enabling more reliable data-driven scientific discoveries. This method represents a significant step forward in the deployment of ML models for scientific innovation and hypothesis generation.
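
As a rough illustration of how knockoff-based FDR control selects discoveries, the sketch below implements the generic knockoff+ thresholding rule on a list of signed importance statistics (large positive values favor the real candidate over its knockoff). It is not Diamond's calibration procedure, which the abstract does not detail; the names `knockoff_plus_threshold` and `knockoff_select` and the example scores are assumptions made for illustration.

```python
import math

def knockoff_plus_threshold(w, q):
    """Return the smallest threshold t whose estimated false discovery
    proportion, (1 + #{w_j <= -t}) / max(1, #{w_j >= t}), is at most q.
    Returns math.inf when no threshold qualifies."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return math.inf

def knockoff_select(w, q):
    """Indices of candidates whose statistic reaches the threshold."""
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]

# Hypothetical statistics for ten candidate interactions.
scores = [5, 4, 3, 2.5, -1, 0.5, 6, 7, 8, 9]
selected = knockoff_select(scores, q=0.2)
```

With these made-up scores and a target FDR of 0.2, the threshold settles at 2.5 and the two weakest candidates are left out.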

Energy-Aware UAV-Enabled Target Tracking: Online Optimization with Location Constraints

Jul 17, 2024

Yifan Jiang, Qingqing Wu, Wen Chen, Hongxun Hui

Abstract: For unmanned aerial vehicle (UAV) trajectory design, the total propulsion energy consumption and initial-final location constraints are practical factors to consider. However, unlike traditional offline designs, these two constraints are non-trivial to concurrently satisfy in online UAV trajectory designs for real-time target tracking, due to the undetermined information. To address this issue, we propose a novel online UAV trajectory optimization approach for the weighted sum-predicted posterior Cramér-Rao bound (PCRB) minimization, which guarantees the feasibility of satisfying the two mentioned constraints. Specifically, our approach designs the UAV trajectory by solving two subproblems: the candidate trajectory optimization problem and the energy-aware backup trajectory optimization problem. Then, an efficient solution to the candidate trajectory optimization problem is proposed based on Dinkelbach's transform and the Lasserre hierarchy, which achieves the global optimal solution under a given sufficient condition. The energy-aware backup trajectory optimization problem is solved by the successive convex approximation method. Numerical results show that our proposed UAV trajectory optimization approach significantly outperforms the benchmark regarding sensing performance and energy utilization flexibility.
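
To give a feel for Dinkelbach's transform, the sketch below applies it to a toy ratio-maximization problem over a finite grid: a ratio f(x)/g(x) with g > 0 is maximized by repeatedly solving the parametric problem max f(x) - lam * g(x) and updating lam to the ratio at the maximizer. This is only the generic transform on an invented objective; it is not the paper's PCRB formulation and does not involve the Lasserre hierarchy. The function name, the grid, and the toy f and g are assumptions.

```python
def dinkelbach_max_ratio(candidates, f, g, tol=1e-9, max_iter=100):
    """Maximize f(x) / g(x) over a finite candidate list, assuming g(x) > 0."""
    lam = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        # Parametric subproblem: maximize f(x) - lam * g(x).
        best = max(candidates, key=lambda x: f(x) - lam * g(x))
        gap = f(best) - lam * g(best)
        if abs(gap) < tol:
            break  # lam equals the optimal ratio (up to tol)
        lam = f(best) / g(best)
    return best, lam

# Toy problem (hypothetical): maximize (x + 1) / (x^2 + 1) on a grid over [0, 3].
grid = [i / 100 for i in range(301)]
numerator = lambda x: x + 1.0
denominator = lambda x: x * x + 1.0
x_star, ratio_star = dinkelbach_max_ratio(grid, numerator, denominator)
```

On this grid the iteration stops at x = 0.41, the grid point closest to the continuous maximizer sqrt(2) - 1.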

Sustainable Wireless Networks via Reconfigurable Intelligent Surfaces (RISs): Overview of the ETSI ISG RIS

Jun 09, 2024

Ruiqi Liu, Shuang Zheng, Qingqing Wu, Yifan Jiang, Nan Zhang, Yuanwei Liu, Marco Di Renzo, George C. Alexandropoulos

Abstract: Reconfigurable Intelligent Surfaces (RISs) are a novel form of ultra-low power devices that are capable of increasing communication data rates as well as cell coverage in a cost- and energy-efficient way. This is attributed to their programmable operation that enables them to dynamically manipulate the wireless propagation environment, a feature that has lately inspired numerous research investigations and applications. To pave the way to the formal standardization of RISs, the European Telecommunications Standards Institute (ETSI) launched the Industry Specification Group (ISG) on the RIS technology in September 2021. This article provides a comprehensive overview of the status of the work conducted by the ETSI ISG RIS, covering typical deployment scenarios of reconfigurable metasurfaces, use cases and operating applications, requirements, emerging hardware architectures and operating modes, as well as the latest insights regarding future directions of RISs and the resulting smart wireless environments.

* 7 pages, 5 figures, submitted to an IEEE Magazine

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

Apr 24, 2024

Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara

Abstract: While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to Sudoku puzzles, abstract visual reasoning (AVR) problems require finding high-level patterns (e.g., repetition constraints) that control the input shapes (e.g., digits) in a specific task configuration (e.g., matrix). However, existing AVR benchmarks only considered a limited set of patterns (addition, conjunction), input shapes (rectangle, square), and task configurations (3 by 3 matrices). To evaluate MLLMs' reasoning abilities comprehensively, we introduce MARVEL, a multidimensional AVR benchmark with 770 puzzles composed of six core knowledge patterns, geometric and abstract shapes, and five different task configurations. To inspect whether the model accuracy is grounded in perception and reasoning, MARVEL complements the general AVR question with perception questions in a hierarchical evaluation framework. We conduct comprehensive experiments on MARVEL with nine representative MLLMs in zero-shot and few-shot settings. Our experiments reveal that all models show near-random performance on the AVR question, with significant performance gaps (40%) compared to humans across all patterns and task configurations. Further analysis of perception questions reveals that MLLMs struggle to comprehend the visual features (near-random performance) and even to count the panels in the puzzle (<45%), hindering their ability for abstract reasoning. We release our entire code and dataset.

SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Apr 22, 2024

Yifan Jiang, Filip Ilievski, Kaixin Ma

Abstract: While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot setting. In this paper, we split the original benchmark to also support a fine-tuning setting and present SemEval Task 9: BRAINTEASER(S), the first task at this competition designed to test the system's reasoning and lateral thinking ability. As a popular task, BRAINTEASER(S)'s two subtasks received 483 team submissions from 182 participants during the competition. This paper provides a fine-grained system analysis of the competition results, together with a reflection on what this means for the ability of the systems to reason laterally. We hope that the BRAINTEASER(S) subtasks and findings in this paper can stimulate future work on lateral thinking and robust reasoning by computational models.

TrafPS: A Shapley-based Visual Analytics Approach to Interpret Traffic

Mar 07, 2024

Zezheng Feng, Yifan Jiang, Hongjun Wang, Zipei Fan, Yuxin Ma, Shuang-Hua Yang, Huamin Qu, Xuan Song

Abstract: Recent achievements in deep learning (DL) have shown its potential for predicting traffic flows. Such predictions are beneficial for understanding the situation and making decisions in traffic control. However, most state-of-the-art DL models are considered "black boxes" with little to no transparency for end users with respect to the underlying mechanisms. Some previous work tried to "open the black boxes" and increase the interpretability of how predictions are generated. However, it still remains challenging to handle complex models on large-scale spatio-temporal data and discover salient spatial and temporal patterns that significantly influence traffic flows. To overcome the challenges, we present TrafPS, a visual analytics approach for interpreting traffic prediction outcomes to support decision-making in traffic management and urban planning. The measurements, region SHAP and trajectory SHAP, are proposed to quantify the impact of flow patterns on urban traffic at different levels. Based on the task requirement from the domain experts, we employ an interactive visual interface for multi-aspect exploration and analysis of significant flow patterns. Two real-world case studies demonstrate the effectiveness of TrafPS in identifying key routes and decision-making support for urban planning.
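
Shapley-based measures such as those named above build on the Shapley value from cooperative game theory. The sketch below computes exact Shapley values for a tiny game by enumerating coalitions. It is a generic illustration only, not the region SHAP or trajectory SHAP definitions from the paper; the three "regions", their flows, and the interaction bonus are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution to every
    coalition of the other players, weighted by |S|! (n-|S|-1)! / n!."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Hypothetical game: three regions with base flows, plus a bonus of 6
# that appears only when regions A and B are both present.
base_flow = {"A": 10.0, "B": 20.0, "C": 5.0}

def predicted_flow(coalition):
    bonus = 6.0 if {"A", "B"} <= coalition else 0.0
    return sum(base_flow[r] for r in coalition) + bonus

phi = shapley_values(list(base_flow), predicted_flow)
```

In this toy game the interaction bonus is split evenly between A and B, and the values sum to the flow of the full coalition.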

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

Jan 22, 2024

Kian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, Jay Pujara

Abstract: While large language models (LLMs) are still being adopted in new domains and utilized in novel applications, we are experiencing an influx of the new generation of foundation models, namely multi-modal large language models (MLLMs). These models integrate verbal and visual information, opening new possibilities to demonstrate more complex reasoning abilities at the intersection of the two modalities. However, despite the revolutionizing prospect of MLLMs, our understanding of their reasoning abilities is limited. In this study, we assess the nonverbal abstract reasoning abilities of open-source and closed-source MLLMs using variations of Raven's Progressive Matrices. Our experiments expose the difficulty of solving such problems while showcasing the immense gap between open-source and closed-source models. We also reveal critical shortcomings with individual visual and textual modules, subjecting the models to low-performance ceilings. Finally, to improve MLLMs' performance, we experiment with various methods, such as Chain-of-Thought prompting, resulting in a significant (up to 100%) boost in performance.

* Code and datasets are available at https://github.com/kahrabian/mllm-nvar

UAV-enabled Integrated Sensing and Communication: Tracking Design and Optimization

Jan 15, 2024

Yifan Jiang, Qingqing Wu, Wen Chen, Kaitao Meng

Abstract: Integrated sensing and communications (ISAC) enabled by unmanned aerial vehicles (UAVs) is a promising technology to facilitate target tracking applications. In contrast to conventional UAV-based ISAC system designs that mainly focus on estimating the target position, the target velocity estimation also needs to be considered due to its crucial impacts on link maintenance and real-time response, which requires new designs for resource allocation and the tracking scheme. In this paper, we propose an extended Kalman filtering-based tracking scheme for a UAV-enabled ISAC system where a UAV tracks a moving object and also communicates with a device attached to the object. Specifically, a weighted sum of predicted posterior Cramér-Rao bound (PCRB) for object relative position and velocity estimation is minimized by optimizing the UAV trajectory, where an efficient solution is obtained based on the successive convex approximation method. Furthermore, under a special case with the measurement mean square error (MSE), the optimal relative motion state is obtained and proved to keep a fixed elevation angle and zero relative velocity. Numerical results validate that the obtained solution to the predicted PCRB minimization can be approximated by the optimal relative motion state when predicted measurement MSE dominates the predicted PCRBs, as well as the effectiveness of the proposed tracking scheme. Moreover, three interesting trade-offs on system performance resulting from the fixed elevation angle are illustrated.

* 3 figures, 5 pages, submitted to IEEE Communications Letters

VASE: Object-Centric Appearance and Shape Manipulation of Real Videos

Jan 04, 2024

Elia Peruzzo, Vidit Goel, Dejia Xu, Xingqian Xu, Yifan Jiang, Zhangyang Wang, Humphrey Shi, Nicu Sebe

Abstract: Recently, several works tackled the video editing task fostered by the success of large-scale text-to-image generative models. However, most of these methods holistically edit the frame using the text, exploiting the prior given by foundation diffusion models and focusing on improving the temporal consistency across frames. In this work, we introduce a framework that is object-centric and is designed to control both the object's appearance and, notably, to execute precise and explicit structural modifications on the object. We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control. We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities. Further details, code and examples are available on our project page: https://helia95.github.io/vase-website/

* Project Page https://helia95.github.io/vase-website/

Efficient-NeRF2NeRF: Streamlining Text-Driven 3D Editing with Multiview Correspondence-Enhanced Diffusion Models

Dec 26, 2023

Liangchen Song, Liangliang Cao, Jiatao Gu, Yifan Jiang, Junsong Yuan, Hao Tang

Abstract: The advancement of text-driven 3D content editing has been blessed by the progress from 2D generative diffusion models. However, a major obstacle hindering the widespread adoption of 3D content editing is its time-intensive processing. This challenge arises from the iterative and refining steps required to achieve consistent 3D outputs from 2D image-based generative models. Recent state-of-the-art methods typically require optimization time ranging from tens of minutes to several hours to edit a 3D scene using a single GPU. In this work, we propose that by incorporating correspondence regularization into diffusion models, the process of 3D editing can be significantly accelerated. This approach is inspired by the notion that the estimated samples during diffusion should be multiview-consistent during the diffusion generation process. By leveraging this multiview consistency, we can edit 3D content at a much faster speed. In most scenarios, our proposed technique brings a 10× speed-up compared to the baseline method and completes the editing of a 3D scene in 2 minutes with comparable quality.

* Project page: https://lsongx.github.io/projects/en2n.html
