"Time": models, code, and papers

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

Feb 23, 2024
Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain. However, when deploying a vision-language system with low tolerance for inaccurate predictions, selective prediction may be over-cautious and abstain too frequently, even on many correct predictions. We introduce ReCoVERR, an inference-time algorithm to reduce the over-abstention of a selective vision-language system without decreasing prediction accuracy. When the VLM makes a low-confidence prediction, instead of abstaining, ReCoVERR tries to find relevant clues in the image that provide additional evidence for the prediction. ReCoVERR uses an LLM to pose related questions to the VLM and collects high-confidence evidence; if enough evidence confirms the prediction, the system makes a prediction instead of abstaining. ReCoVERR enables two VLMs, BLIP2 and InstructBLIP, to answer up to 20% more questions on the A-OKVQA task than vanilla selective prediction without decreasing system accuracy, thus improving overall system reliability.
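The decision rule described in this abstract can be illustrated with a small sketch. This is not the authors' ReCoVERR implementation: the function name `selective_predict`, the single shared confidence threshold, and the "count supporting evidence" rule are all assumptions made for illustration, and the evidence list stands in for answers a VLM would give to LLM-posed questions.

```python
from typing import List, Optional, Tuple

# (question, VLM confidence in [0, 1], whether the answer supports the prediction)
Evidence = Tuple[str, float, bool]


def selective_predict(
    answer: str,
    confidence: float,
    evidence: List[Evidence],
    conf_threshold: float = 0.8,   # hypothetical threshold
    min_supporting: int = 2,       # hypothetical "enough evidence" rule
) -> Optional[str]:
    """Return the answer, or None to abstain."""
    # Vanilla selective prediction: answer only when confident.
    if confidence >= conf_threshold:
        return answer
    # Low confidence: count evidence that is both high-confidence and supportive.
    supporting = sum(
        1 for _, ev_conf, supports in evidence
        if ev_conf >= conf_threshold and supports
    )
    # Answer instead of abstaining only if enough evidence confirms the prediction.
    return answer if supporting >= min_supporting else None


clues = [("Is there a leash?", 0.93, True), ("Is the animal furry?", 0.88, True)]
print(selective_predict("dog", 0.55, clues))   # answers despite low confidence
print(selective_predict("dog", 0.55, []))      # abstains
```

With an empty evidence list the function reduces to plain threshold-based abstention, which is the baseline the abstract compares against.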

TILP: Differentiable Learning of Temporal Logical Rules on Knowledge Graphs

Feb 19, 2024
Siheng Xiong, Yuan Yang, Faramarz Fekri, James Clayton Kerce

Compared with static knowledge graphs, temporal knowledge graphs (tKG), which can capture the evolution and change of information over time, are more realistic and general. However, due to the complexity that the notion of time introduces to the learning of the rules, accurate graph reasoning, e.g., predicting new links between entities, is still a difficult problem. In this paper, we propose TILP, a differentiable framework for learning temporal logical rules. By designing a constrained random walk mechanism and introducing temporal operators, we ensure the efficiency of our model. We present temporal feature modeling in tKG, e.g., recurrence, temporal order, interval between pairs of relations, and duration, and incorporate it into our learning process. We compare TILP with state-of-the-art methods on two benchmark datasets. We show that our proposed framework can improve upon the performance of baseline methods while providing interpretable results. In particular, we consider various scenarios in which training samples are limited, data is biased, and the time ranges of training and inference differ. In all these cases, TILP works much better than the state-of-the-art methods.

* ICLR 2023 poster

The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting

Feb 13, 2024
David Haag, Devender Kumar, Sebastian Gruber, Mahdi Sareban, Gunnar Treff, Josef Niebauer, Christopher Bull, Jan David Smeddinck

We explored the viability of Large Language Models (LLMs) for triggering and personalizing content for Just-in-Time Adaptive Interventions (JITAIs) in digital health. JITAIs are being explored as a key mechanism for sustainable behavior change, adapting interventions to an individual's current context and needs. However, traditional rule-based and machine learning models for JITAI implementation face scalability and reliability limitations, such as lack of personalization, difficulty in managing multi-parametric systems, and issues with data sparsity. To investigate JITAI implementation via LLMs, we tested the contemporary overall performance-leading model 'GPT-4' with examples grounded in the use case of fostering heart-healthy physical activity in outpatient cardiac rehabilitation. Three personas and five sets of context information per persona were used as a basis of triggering and personalizing JITAIs. Subsequently, we generated a total of 450 proposed JITAI decisions and message content, divided equally into JITAIs generated by 10 iterations with GPT-4, a baseline provided by 10 laypersons (LayPs), and a gold standard set by 10 healthcare professionals (HCPs). Ratings from 27 LayPs indicated that JITAIs generated by GPT-4 were superior to those by HCPs and LayPs over all assessed scales: i.e., appropriateness, engagement, effectiveness, and professionality. This study indicates that LLMs have significant potential for implementing JITAIs as a building block of personalized or "precision" health, offering scalability, effective personalization based on opportunistically sampled information, and good acceptability.

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

Feb 20, 2024
Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan

This paper presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A "pose-free architecture" where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time. We use the Objaverse for training and the Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model.

* 3D generation, project page: https://mvdiffusion-plusplus.github.io/

Sample-Efficient Linear Regression with Self-Selection Bias

Feb 22, 2024
Jason Gaitonde, Elchanan Mossel

We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell},z_{\ell})_{\ell=1}^m$ where $z_{\ell}=\max_{i\in [k]}\{\mathbf{x}_{\ell}^T\mathbf{w}_i+\eta_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0,I_n)$ and the noise distribution $\mathbf{\eta}_{\ell}\sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k\in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n)\cdot \mathsf{poly}(k,1/\varepsilon)$ and significantly improved time complexity $\mathsf{poly}(n,k,1/\varepsilon)+O(\log(k)/\varepsilon)^{O(k)}$. When $k=O(1)$, our algorithm runs in $\mathsf{poly}(n,1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment matching algorithm of Cherapanamjeri, et al. for $k=2$ and when it is known that $\mathcal{D}=\mathcal{N}(0,I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\varepsilon$, and leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.

* 40 pages
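The observation model in this abstract, $z_{\ell}=\max_{i\in[k]}\{\mathbf{x}_{\ell}^T\mathbf{w}_i+\eta_{i,\ell}\}$ with the maximizing index hidden, can be simulated in a few lines. The sketch below only generates data; it is not the recovery algorithm from the paper. The function name `sample_self_selection` and the choice of independent Gaussian noise are assumptions (the abstract allows a more general centered noise distribution $\mathcal{D}$).

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float, int]  # (x, observed z, hidden maximizing index)


def sample_self_selection(
    m: int, n: int, k: int, noise_std: float = 1.0, seed: int = 0
) -> Tuple[List[List[float]], List[Sample]]:
    """Draw m samples from the self-selection model with k hidden regressors."""
    rng = random.Random(seed)
    # Ground-truth regressors w_1..w_k (drawn at random purely for the demo).
    W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    samples: List[Sample] = []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]          # x ~ N(0, I_n)
        scores = [
            sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0.0, noise_std)
            for w in W                                        # x^T w_i + eta_i
        ]
        z = max(scores)                                       # only the max is observed
        samples.append((x, z, scores.index(z)))               # index kept for inspection
    return W, samples


W, samples = sample_self_selection(m=5, n=3, k=2)
for x, z, hidden in samples:
    print(f"z = {z:+.3f}  (hidden index {hidden})")
```

A learner in the unknown-index setting would see only the pairs `(x, z)`; the third tuple entry is returned here solely so the simulation can be checked.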

VN Network: Embedding Newly Emerging Entities with Virtual Neighbors

Feb 21, 2024
Yongquan He, Zihan Wang, Peng Zhang, Zhaopeng Tu, Zhaochun Ren

Embedding entities and relations into continuous vector spaces has attracted a surge of interest in recent years. Most embedding methods assume that all test entities are available during training, which makes it time-consuming to retrain embeddings for newly emerging entities. To address this issue, recent works apply the graph neural network on the existing neighbors of the unseen entities. In this paper, we propose a novel framework, namely Virtual Neighbor (VN) network, to address three key challenges. Firstly, to reduce the neighbor sparsity problem, we introduce the concept of the virtual neighbors inferred by rules. And we assign soft labels to these neighbors by solving a rule-constrained problem, rather than simply regarding them as unquestionably true. Secondly, many existing methods only use one-hop or two-hop neighbors for aggregation and ignore the distant information that may be helpful. Instead, we identify both logic and symmetric path rules to capture complex patterns. Finally, instead of one-time injection of rules, we employ an iterative learning scheme between the embedding method and virtual neighbor prediction to capture the interactions within. Experimental results on two knowledge graph completion tasks demonstrate that our VN network significantly outperforms state-of-the-art baselines. Furthermore, results on Subject/Object-R show that our proposed VN network is highly robust to the neighbor sparsity problem.

* CIKM (2020) 505-514
* 10 pages, 5 figures

SLADE: Detecting Dynamic Anomalies in Edge Streams without Labels via Self-Supervised Learning

Feb 19, 2024
Jongha Lee, Sunwoo Kim, Kijung Shin

To detect anomalies in real-world graphs, such as social, email, and financial networks, various approaches have been developed. While they typically assume static input graphs, most real-world graphs grow over time, naturally represented as edge streams. In this context, we aim to achieve three goals: (a) instantly detecting anomalies as they occur, (b) adapting to dynamically changing states, and (c) handling the scarcity of dynamic anomaly labels. In this paper, we propose SLADE (Self-supervised Learning for Anomaly Detection in Edge Streams) for rapid detection of dynamic anomalies in edge streams, without relying on labels. SLADE detects the shifts of nodes into abnormal states by observing deviations in their interaction patterns over time. To this end, it trains a deep neural network to perform two self-supervised tasks: (a) minimizing drift in node representations and (b) generating long-term interaction patterns from short-term ones. Failure in these tasks for a node signals its deviation from the norm. Notably, the neural network and tasks are carefully designed so that all required operations can be performed in constant time (w.r.t. the graph size) in response to each new edge in the input stream. In dynamic anomaly detection across four real-world datasets, SLADE outperforms nine competing methods, even those leveraging label supervision.

* 15 pages, 6 figures

Multi-scale fMRI time series analysis for understanding neurodegeneration in MCI

Feb 05, 2024
Ammu R., Debanjali Bhattacharya, Ameiy Acharya, Ninad Aithal, Neelam Sinha

In this study, we present a technique that spans multi-scale views (global scale -- meaning brain network-level and local scale -- examining each individual ROI that constitutes the network) applied to resting-state fMRI volumes. Deep learning based classification is utilized in understanding neurodegeneration. The novelty of the proposed approach lies in utilizing two extreme scales of analysis. One branch considers the entire network within a graph-analysis framework. Concurrently, the second branch scrutinizes each ROI within a network independently, focusing on the evolution of dynamics. For each subject, the graph-based approach employs partial correlation to profile the subject in a single graph where each ROI is a node, providing insights into differences in levels of participation. In contrast, the non-linear analysis employs recurrence plots to profile a subject as a multichannel 2D image, revealing distinctions in underlying dynamics. The proposed approach is employed for classification of a cohort of 50 healthy control (HC) and 50 Mild Cognitive Impairment (MCI) subjects, sourced from the ADNI dataset. Results point to: (1) reduced activity in ROIs such as PCC in MCI; (2) greater activity in occipital ROIs in MCI, which is not seen in HC; and (3) when analysed for dynamics, all ROIs in MCI show greater predictability in time series.

* 12 pages, 3 figures and 4 tables
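For readers unfamiliar with recurrence plots, the following sketch shows the basic construction on a scalar time series: entry (i, j) is 1 when the values at times i and j lie within a tolerance `eps`. It is a simplified illustration, assuming no time-delay embedding and a hand-picked `eps`; the preprocessing and parameters used in the paper are not given in the abstract. The `recurrence_rate` helper is one common summary statistic and is not claimed to be the paper's measure.

```python
from typing import List, Sequence


def recurrence_plot(series: Sequence[float], eps: float) -> List[List[int]]:
    """Binary matrix R with R[i][j] = 1 iff |series[i] - series[j]| <= eps."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(plot: List[List[int]]) -> float:
    """Fraction of recurrent points in the plot."""
    n = len(plot)
    return sum(sum(row) for row in plot) / (n * n)


signal = [0.0, 1.0, 0.1]
plot = recurrence_plot(signal, eps=0.2)
for row in plot:
    print(row)
# [1, 0, 1]
# [0, 1, 0]
# [1, 0, 1]
print(recurrence_rate(plot))
```

Stacking one such matrix per ROI gives a multichannel 2D image of the kind the abstract describes as input to the classifier.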

A hybrid IndRNNLSTM approach for real-time anomaly detection in software-defined networks

Feb 02, 2024
Sajjad Salem, Salman Asoudeh

Anomaly detection in SDN using data flow prediction is a difficult task that falls into the category of time series and regression problems. Machine learning approaches are challenging to apply in this field because features must be selected manually, whereas deep learning approaches select features automatically. Among deep learning methods, RNN-based approaches have been used the most. The LSTM and GRU approaches learn dependent entities well, while the IndRNN approach learns non-dependent entities in time series. The proposed approach combines IndRNN and LSTM to learn both dependent and non-dependent features. Feature selection approaches also provide a suitable view of features for the models; for this purpose, four feature selection models, Filter, Wrapper, Embedded, and Autoencoder, were used. The proposed IndRNNLSTM algorithm, in combination with Embedded, achieved MAE=1.22 and RMSE=9.92 on NSL-KDD data.
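The two reported metrics can be computed as below. This is a generic sketch of the standard definitions with made-up numbers, not the authors' evaluation code. Because RMSE squares each error before averaging, it is never smaller than MAE, and a large gap between the two (as in MAE=1.22 versus RMSE=9.92) typically indicates that a small number of predictions carry large errors.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt of the average of (y - y_hat)^2."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Illustrative values only: two exact predictions and one error of size 2.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(mae(y_true, y_pred))    # 2/3
print(rmse(y_true, y_pred))   # sqrt(4/3)
```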

UFO: A UI-Focused Agent for Windows OS Interaction

Feb 23, 2024
Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling fully automated execution. Consequently, UFO transforms arduous and time-consuming processes into simple tasks achievable solely through natural language commands. We conducted testing of UFO across 9 popular Windows applications, encompassing a variety of scenarios reflective of users' daily usage. The results, derived from both quantitative metrics and real-case studies, underscore the superior effectiveness of UFO in fulfilling user requests. To the best of our knowledge, UFO stands as the first UI agent specifically tailored for task completion within the Windows OS environment. The open-source code for UFO is available on https://github.com/microsoft/UFO.
