Youngser Park

Semisupervised regression in latent structure networks on unknown manifolds

May 04, 2023
Aranyak Acharyya, Joshua Agterberg, Michael W. Trosset, Youngser Park, Carey E. Priebe

Random graphs are increasingly becoming objects of interest for modeling networks in a wide range of applications. Latent position random graph models posit that each node is associated with a latent position vector, and that these vectors follow some geometric structure in the latent space. In this paper, we consider random dot product graphs, in which an edge is formed between two nodes with probability given by the inner product of their respective latent positions. We assume that the latent position vectors lie on an unknown one-dimensional curve and are coupled with a response covariate via a regression model. Using the geometry of the underlying latent position vectors, we propose a manifold learning and graph embedding technique to predict the response variable on out-of-sample nodes, and we establish convergence guarantees for these responses. Our theoretical results are supported by simulations and an application to Drosophila brain data.
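
The edge rule described above can be illustrated with a small sketch. This is not the authors' code: the Hardy-Weinberg curve used for the latent positions, the function names, and the sample size are assumptions chosen only for illustration, since the abstract says just that the positions lie on an unknown one-dimensional curve. The sketch also assumes an undirected graph without self-loops.

```python
import random

def curve(t):
    """Hypothetical 1-D curve in R^3 (Hardy-Weinberg form).
    Coordinates are nonnegative and sum to 1, so inner products stay in [0, 1]."""
    return (t * t, 2 * t * (1 - t), (1 - t) ** 2)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sample_rdpg(latent, rng):
    """Sample a symmetric 0/1 adjacency matrix: nodes i and j are joined
    with probability equal to the inner product of their latent positions."""
    n = len(latent)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # no self-loops (assumed convention)
            if rng.random() < dot(latent[i], latent[j]):
                adj[i][j] = adj[j][i] = 1
    return adj

rng = random.Random(0)
ts = [rng.random() for _ in range(50)]   # positions along the curve
latent = [curve(t) for t in ts]
adj = sample_rdpg(latent, rng)
```

In the semisupervised setting, a response would be attached to each `t` and predicted for unlabeled nodes from an embedding of `adj`; that step is not sketched here.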

Discovering Communication Pattern Shifts in Large-Scale Networks using Encoder Embedding and Vertex Dynamics

May 03, 2023
Cencheng Shen, Jonathan Larson, Ha Trinh, Xihan Qin, Youngser Park, Carey E. Priebe

The analysis of large-scale time-series network data, such as social media and email communications, remains a significant challenge for graph analysis methodology. In particular, the scalability of graph analysis is a critical issue hindering further progress in large-scale downstream inference. In this paper, we introduce a novel approach called "temporal encoder embedding" that can efficiently embed large amounts of graph data with linear complexity. We apply this method to an anonymized time-series communication network from a large organization spanning 2019-2020, consisting of over 100 thousand vertices and 80 million edges. Our method embeds the data within 10 seconds on a standard computer and enables the detection of communication pattern shifts for individual vertices, vertex communities, and the overall graph structure. We establish the theoretical soundness of our approach under random graph models and demonstrate its numerical effectiveness through simulation studies.

* 25 pages main + 7 pages appendix 

Graph Encoder Ensemble for Simultaneous Vertex Embedding and Community Detection

Jan 18, 2023
Cencheng Shen, Youngser Park, Carey E. Priebe

In this paper, we propose a novel and computationally efficient method to simultaneously achieve vertex embedding, community detection, and community size determination. By utilizing a normalized one-hot graph encoder and a new rank-based cluster size measure, the proposed graph encoder ensemble algorithm achieves excellent numerical performance across a variety of simulations and real data experiments.
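
As a rough illustration of what a normalized one-hot encoder can look like, the sketch below multiplies the adjacency matrix by a label matrix whose columns are scaled by community size. This is one plausible reading of the phrase, not the paper's algorithm: the function name `encoder_embedding` is hypothetical, labels are assumed to be given, and the ensemble step and the rank-based cluster size measure are not reproduced.

```python
def encoder_embedding(adj, labels, k):
    """Embed each vertex into R^k.
    Coordinate c of vertex i is the total edge weight from i to community c,
    divided by the size of community c (normalized one-hot encoding)."""
    n = len(adj)
    counts = [0] * k
    for y in labels:
        counts[y] += 1
    emb = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                c = labels[j]
                emb[i][c] += adj[i][j] / counts[c]
    return emb

# Toy graph: two disjoint edges, one per community.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
labels = [0, 0, 1, 1]
emb = encoder_embedding(adj, labels, k=2)
# Assign each vertex to the community with the largest coordinate.
assigned = [max(range(2), key=lambda c: row[c]) for row in emb]
```

With a dense matrix the double loop costs O(n^2); an edge-list version of the same computation would touch each edge once, which is where a linear-time claim could come from.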

* 6 pages, 4 figures, 3 tables 

Dynamic Network Sampling for Community Detection

Aug 29, 2022
Cong Mu, Youngser Park, Carey E. Priebe

We propose a dynamic network sampling scheme to optimize block recovery for the stochastic blockmodel (SBM) in the case where it is prohibitively expensive to observe the entire graph. Theoretically, we provide justification of our proposed Chernoff-optimal dynamic sampling scheme via the Chernoff information. Practically, we evaluate the performance, in terms of block recovery, of our method on several real datasets from different domains. Both the theoretical and the practical results suggest that our method can identify the vertices that have the most impact on block structure, so that one need only check for edges between those vertices, saving significant resources while still recovering the block structure.

* 17 pages, 7 figures 

An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces

Sep 26, 2021
Kelly Marchisio, Youngser Park, Ali Saad-Eldin, Anton Alyakin, Kevin Duh, Carey Priebe, Philipp Koehn

Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node's graph neighborhood without assuming a linear transform, and exploits new techniques from the graph matching optimization literature. These contrasting approaches have not been compared in BLI so far. In this work, we study the behavior of Euclidean versus graph-based approaches to BLI under differing data conditions and show that they complement each other when combined. We release our code at https://github.com/kellymarchisio/euc-v-graph-bli.
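
The Euclidean framing can be sketched in a toy setting. The example below is an assumption-laden illustration, not the released code: it restricts the linear map to a 2-D rotation, for which the least-squares fit has a closed form, and all names and data are hypothetical. Real BLI systems work in hundreds of dimensions and typically solve the analogous orthogonal Procrustes problem with an SVD.

```python
import math

def rotate(v, theta):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def fit_rotation(src, tgt):
    """Angle of the rotation R minimizing sum ||R x - y||^2 over seed pairs.
    Closed form in 2-D: theta = atan2(sum of cross terms, sum of dot products)."""
    dots = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(src, tgt))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(src, tgt))
    return math.atan2(cross, dots)

def induce(src, tgt, theta):
    """Map each source vector and return the index of its nearest target vector."""
    out = []
    for v in src:
        m = rotate(v, theta)
        out.append(min(range(len(tgt)),
                       key=lambda j: (m[0] - tgt[j][0]) ** 2 + (m[1] - tgt[j][1]) ** 2))
    return out

# Toy "embeddings": the target space is the source space rotated by 0.7 rad.
src = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (0.3, -0.8)]
tgt = [rotate(v, 0.7) for v in src]

theta = fit_rotation(src[:2], tgt[:2])   # first two pairs act as the seed dictionary
matches = induce(src, tgt, theta)        # induced translations for all words
```

The graph-based framing would instead compare each word's neighborhood structure across the two spaces without fitting any such map.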

* EMNLP Findings 2021 Camera-Ready 

Leveraging semantically similar queries for ranking via combining representations

Jun 23, 2021
Hayden S. Helm, Marah Abdin, Benjamin D. Pedigo, Shweti Mahajan, Vince Lyzinski, Youngser Park, Amitabh Basu, Piali Choudhury, Christopher M. White, Weiwei Yang, Carey E. Priebe

In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. Indeed, as we demonstrate in simulation settings and real data examples, when semantically similar queries are available it is possible to gainfully use them when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance tradeoff and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.

Dynamic Silos: Modularity in intra-organizational communication networks during the Covid-19 pandemic

May 01, 2021
Jonathan Larson, Tiona Zuzul, Emily Cox Pahnke, Neha Parikh Shah, Patrick Bourke, Nicholas Caurvina, Fereshteh Amini, Youngser Park, Joshua Vogelstein, Jeffrey Weston, Christopher White, Carey E. Priebe

Workplace communications around the world were drastically altered by Covid-19, work-from-home orders, and the rise of remote work. We analyze aggregated, anonymized metadata from over 360 billion emails within over 4000 organizations worldwide to examine changes in network community structures from 2019 through 2020. We find that, during 2020, organizations around the world became more siloed, evidenced by increased modularity. This shift was concurrent with decreased stability, indicating that organizational silos had less stable membership. We provide initial insights into the implications of these network changes -- which we term dynamic silos -- for organizational performance and innovation.
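
To make "increased modularity" concrete, the sketch below computes modularity for a fixed partition, assuming the standard Newman-Girvan definition on an undirected, unweighted graph. The abstract does not say which variant or community detection pipeline was used, so the function and the toy data are illustrative assumptions only.

```python
def modularity(adj, community):
    """Q = (1 / 2m) * sum over same-community pairs (i, j) of
    [A_ij - k_i * k_j / (2m)], where k is the degree and m the edge count."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)
    if two_m == 0:
        return 0.0  # no edges: define Q as 0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Toy network: two pairs of people who only email within their pair.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
siloed = modularity(adj, [0, 0, 1, 1])   # partition matching the two pairs
merged = modularity(adj, [0, 0, 0, 0])   # everyone in one community
```

A partition that matches the two isolated pairs scores higher than lumping everyone together, which is the sense in which higher modularity indicates more siloed communication.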

* 19 pages, 15 figures 

Dynamic Silos: Modularity in intra-organizational communication networks before and during the Covid-19 pandemic

Apr 07, 2021
Jonathan Larson, Tiona Zuzul, Emily Cox Pahnke, Neha Parikh Shah, Patrick Bourke, Nicholas Caurvina, Fereshteh Amini, Youngser Park, Joshua Vogelstein, Jeffrey Weston, Christopher White, Carey E. Priebe

Workplace communications around the world were drastically altered by Covid-19, work-from-home orders, and the rise of remote work. We analyze aggregated, anonymized metadata from over 360 billion emails within over 4000 organizations worldwide to examine changes in network community structures from 2019 through 2020. We find that, during 2020, organizations around the world became more siloed, evidenced by increased modularity. This shift was concurrent with decreased stability, indicating that organizational silos had less stable membership. We provide initial insights into the implications of these network changes -- which we term dynamic silos -- for organizational performance and innovation.

* 17 pages, 14 figures

Learning to rank via combining representations

May 20, 2020
Hayden S. Helm, Amitabh Basu, Avanti Athreya, Youngser Park, Joshua T. Vogelstein, Michael Winding, Marta Zlatic, Albert Cardona, Patrick Bourke, Jonathan Larson, Chris White, Carey E. Priebe

Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, interesting items set) pair. We demonstrate -- analytically, in simulation, and in real data examples -- that learning to rank via combining representations using an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is of general interest, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model agnostic.

* 10 pages, 4 figures 