
Yongyu Wang


Improving Collaborative Filtering Recommendation via Graph Learning

Nov 06, 2023
Yongyu Wang

Recommendation systems are designed to provide personalized predictions for the items most appealing to individual customers. Among the many recommendation algorithms, k-nearest-neighbor (k-NN) based collaborative filtering has attracted tremendous attention and is widely used in practice. However, the k-nearest-neighbor scheme captures only local relationships among users, and a uniform neighborhood size is ill-suited to represent the underlying data structure. In this paper, we leverage emerging graph signal processing (GSP) theory to construct a sparse yet high-quality graph that enhances both the solution quality and the efficiency of the collaborative filtering algorithm. Experimental results show that our method outperforms the k-NN based collaborative filtering algorithm by a large margin on the benchmark data set.
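The k-NN collaborative filtering baseline the paper improves on can be sketched as follows; the toy rating matrix, the cosine similarity measure, and the function names are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); values are illustrative.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(R):
    """Pairwise cosine similarity between user rating vectors."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Rn = R / norms
    return Rn @ Rn.T

def predict(R, user, item, k=2):
    """Predict a rating as the similarity-weighted mean over the k most
    similar users who rated the item (the uniform-k scheme the paper
    argues against)."""
    sims = cosine_sim(R)[user]
    raters = np.flatnonzero(R[:, item])
    raters = raters[raters != user]
    top = raters[np.argsort(sims[raters])[-k:]]
    weights = sims[top]
    if weights.sum() == 0:
        return float(R[R > 0].mean())
    return float(weights @ R[top, item] / weights.sum())

print(predict(R, user=0, item=2))  # only user 4 rated item 2 -> 5.0
```

The fixed neighborhood size `k` is exactly the limitation the abstract points at: it is applied uniformly to every user regardless of the local data density.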


Towards High-Performance Exploratory Data Analysis (EDA) Via Stable Equilibrium Point

Jun 07, 2023
Yuxuan Song, Yongyu Wang

Figures 1–3

Exploratory data analysis (EDA) is a vital procedure in data science projects. In this work, we introduce a stable equilibrium point (SEP) based framework for improving the efficiency and solution quality of EDA. By using SEPs as representative points, our approach generates high-quality clustering and data visualization for large-scale data sets. A unique property of the proposed method is that the SEPs directly encode the clustering properties of the data set. Compared with prior state-of-the-art clustering and data visualization methods, the proposed method substantially improves both computing efficiency and solution quality for large-scale data analysis tasks.
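The representative-point pipeline described above can be pictured with a small numpy sketch. Here a random subset stands in for the SEPs (the paper's actual SEP construction is not reproduced), the representatives are clustered, and every original point inherits the label of its nearest representative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs as a toy data set.
X = np.vstack([rng.normal(0, 0.3, (200, 2)),
               rng.normal(3, 0.3, (200, 2))])

# Stand-in for SEPs: a small random subset as representative points.
reps = X[rng.choice(len(X), size=10, replace=False)]

# Cluster only the representatives (plain 2-means on 10 points).
centers = reps[:2].copy()
for _ in range(20):
    labels = np.argmin(((reps[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for c in range(2):
        if (labels == c).any():
            centers[c] = reps[labels == c].mean(axis=0)

# Map every original point to the cluster of its nearest representative.
nearest = np.argmin(((X[:, None] - reps[None]) ** 2).sum(-1), axis=1)
full_labels = labels[nearest]
print(full_labels.shape)  # (400,)
```

The speedup comes from clustering 10 points instead of 400; the quality claim in the abstract rests on the representatives (the SEPs) encoding the cluster structure, which this random-subset sketch only approximates.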


Accelerate Support Vector Clustering via Spectrum-Preserving Data Compression

Apr 21, 2023
Yuxuan Song, Yongyu Wang

Figures 1–4

Support vector clustering is an important clustering method. However, it suffers from a scalability issue due to its computationally expensive cluster assignment step. In this paper, we accelerate support vector clustering via spectrum-preserving data compression. Specifically, we first compress the original data set into a small set of spectrally representative aggregated data points. Then, we perform standard support vector clustering on the compressed data set. Finally, we map the clustering results of the compressed data set back to discover the clusters in the original data set. Extensive experimental results on real-world data sets demonstrate dramatic speedups over standard support vector clustering without sacrificing clustering quality.
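The three-step compress / cluster / map-back structure can be sketched as below. Coarse grid binning stands in for the paper's spectrum-preserving aggregation, and a trivial sign split stands in for support vector clustering; both substitutions are assumptions made only to show the data flow:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated blobs; 600 points total.
X = np.vstack([rng.normal(-2, 0.2, (300, 2)),
               rng.normal(2, 0.2, (300, 2))])

# Step 1: compress. Each occupied grid cell becomes one aggregated point.
cells = np.floor(X / 0.5).astype(int)
_, inv = np.unique(cells, axis=0, return_inverse=True)
inv = inv.reshape(-1)
n_agg = inv.max() + 1
agg = np.array([X[inv == i].mean(axis=0) for i in range(n_agg)])

# Step 2: cluster the much smaller aggregated set (sign split stands in
# for running support vector clustering on the compressed data).
agg_labels = (agg[:, 0] > 0).astype(int)

# Step 3: map the compressed-set labels back to the original points.
labels = agg_labels[inv]
print(n_agg < len(X), len(labels))  # True 600
```

The expensive step runs only on the `n_agg` aggregated points, while the final mapping back to all 600 originals is a single indexing operation.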



Accelerate 3D Object Processing via Spectral Layout

Oct 28, 2021
Yongyu Wang

Figures 1–4

3D image processing is an important problem in computer vision and pattern recognition. Compared with 2D image processing, its computational difficulty and cost are much higher due to the extra dimension. To fundamentally address this problem, we propose to embed the essential information of a 3D object into 2D space via spectral layout. Specifically, we construct a 3D adjacency graph to capture the spatial structure of the 3D voxel grid. We then compute the eigenvectors corresponding to the second and third smallest eigenvalues of its graph Laplacian and perform spectral layout to map each voxel to a pixel in the 2D Cartesian plane. The proposed method achieves high-quality 2D representations of 3D objects, enabling 2D-based methods to be used for processing 3D objects. The experimental results demonstrate the effectiveness and efficiency of our method.
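The described pipeline is concrete enough to sketch directly: build the 6-connected adjacency graph of a voxel grid, take the eigenvectors of the second and third smallest Laplacian eigenvalues, and read them as 2D coordinates. The tiny 3x3x3 grid is an illustrative assumption:

```python
import numpy as np

# 6-connected adjacency graph of a small 3x3x3 voxel grid.
n = 3
N = n ** 3

def vid(x, y, z):
    return (x * n + y) * n + z

A = np.zeros((N, N))
for x in range(n):
    for y in range(n):
        for z in range(n):
            for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                xx, yy, zz = x + dx, y + dy, z + dz
                if xx < n and yy < n and zz < n:
                    A[vid(x, y, z), vid(xx, yy, zz)] = 1
                    A[vid(xx, yy, zz), vid(x, y, z)] = 1

L = np.diag(A.sum(axis=1)) - A     # graph Laplacian
w, V = np.linalg.eigh(L)           # eigenvalues in ascending order
coords_2d = V[:, 1:3]              # 2nd and 3rd smallest eigenvectors
print(coords_2d.shape)             # one 2D coordinate per voxel: (27, 2)
```

Each row of `coords_2d` is the 2D position assigned to one voxel; for a real object one would rasterize these positions into pixels, a step omitted here.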


Improving Spectral Clustering Using Spectrum-Preserving Node Reduction

Oct 24, 2021
Yongyu Wang

Figures 1–4

Spectral clustering is one of the most popular clustering methods. However, the high computational cost of the eigen-decomposition procedure can hinder its application in large-scale tasks. In this paper, we use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets. Specifically, we create a small number of pseudo-nodes based on spectral similarity. Then, the standard spectral clustering algorithm is performed on the smaller node set. Finally, each data point in the original data set is assigned to the cluster of its representative pseudo-node. The proposed framework runs in nearly linear time, while clustering accuracy can be significantly improved by mining the concise representations. The experimental results show dramatically improved clustering performance compared with state-of-the-art methods.
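The eigen-decomposition step the paper accelerates is the core of spectral clustering, and a minimal instance fits in a few lines: on a graph of two cliques joined by a bridge, the sign pattern of the Fiedler vector recovers the two clusters. The toy graph is an assumption for illustration:

```python
import numpy as np

# Two 4-node cliques joined by one bridge edge: an obvious 2-cluster graph.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1  # bridge

L = np.diag(A.sum(axis=1)) - A
w, V = np.linalg.eigh(L)
fiedler = V[:, 1]                    # eigenvector of 2nd smallest eigenvalue
labels = (fiedler > 0).astype(int)   # sign pattern gives the 2-way cut
print(labels)
```

On 8 nodes `eigh` is instant, but its cost grows roughly cubically with the node count, which is why reducing the graph to a few pseudo-nodes before this step pays off.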


GRASPEL: Graph Spectral Learning at Scale

Nov 23, 2019
Yongyu Wang, Zhiqiang Zhao, Zhuo Feng

Figures 1–4

Learning meaningful graphs from data plays an important role in many data mining and machine learning tasks, such as data representation and analysis, dimension reduction, data clustering, and visualization. In this work, we present for the first time a highly scalable spectral approach (GRASPEL) for learning large graphs from data. By constraining the precision matrix to be a graph Laplacian, our approach aims to estimate ultra-sparse (tree-like) weighted undirected graphs and shows a clear connection with the prior graphical Lasso method. By interleaving the latest high-performance nearly-linear-time spectral methods for graph sparsification, coarsening, and embedding, ultra-sparse yet spectrally robust graphs can be learned by identifying and including the most spectrally critical edges in the graph. Compared with prior state-of-the-art graph learning approaches, GRASPEL is more scalable and substantially improves the computing efficiency and solution quality of a variety of data mining and machine learning applications, such as spectral clustering (SC) and t-Distributed Stochastic Neighbor Embedding (t-SNE). For example, compared with graphs constructed using existing methods, GRASPEL achieved the best spectral clustering efficiency and accuracy.
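One way to picture the "spectrally critical edge" idea is a greedy toy: start from an ultra-sparse tree-like graph and repeatedly add the candidate edge whose endpoints are close in the data but far apart in the current Laplacian eigen-embedding. This is only a loose illustration of the concept; the scoring rule, the chain initialization, and the dense candidate scan are all assumptions, not GRASPEL's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))  # toy data points

# Start from a chain graph (an ultra-sparse spanning-tree stand-in).
N = len(X)
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1

def embedding(A, k=2):
    """Laplacian eigen-embedding into the first k nontrivial eigenvectors."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)
    return V[:, 1:1 + k]

# Greedily include edges whose endpoints the current graph keeps far
# apart in the embedding relative to their distance in the data.
for _ in range(5):
    emb = embedding(A)
    best, score = None, -1.0
    for i in range(N):
        for j in range(i + 1, N):
            if A[i, j]:
                continue
            s = (np.linalg.norm(emb[i] - emb[j])
                 / (np.linalg.norm(X[i] - X[j]) + 1e-9))
            if s > score:
                best, score = (i, j), s
    A[best[0], best[1]] = A[best[1], best[0]] = 1

print(int(A.sum() // 2))  # 29 chain edges + 5 added = 34
```

The resulting graph stays ultra-sparse (34 edges on 30 nodes) while each added edge is the one the current spectral embedding distorts most, which is the spirit of the approach described above.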


GraphZoom: A multi-level spectral approach for accurate and scalable graph embedding

Oct 06, 2019
Chenhui Deng, Zhiqiang Zhao, Yongyu Wang, Zhiru Zhang, Zhuo Feng

Figures 1–4

Graph embedding techniques have been increasingly deployed in a multitude of applications that involve learning on non-Euclidean data. However, existing graph embedding models either fail to incorporate node attribute information during training or suffer from node attribute noise, which compromises accuracy. Moreover, very few of them scale to large graphs due to their high computational complexity and memory usage. In this paper, we propose GraphZoom, a multi-level framework for improving both the accuracy and the scalability of unsupervised graph embedding algorithms. GraphZoom first performs graph fusion to generate a new graph that effectively encodes the topology of the original graph and the node attribute information. This fused graph is then repeatedly coarsened into much smaller graphs by merging nodes with high spectral similarity. GraphZoom allows any existing embedding method to be applied to the coarsened graph, before progressively refining the embeddings obtained at the coarsest level back through the increasingly finer graphs. We have evaluated our approach on a number of popular graph datasets for both transductive and inductive tasks. Our experiments show that GraphZoom increases classification accuracy and significantly reduces the run time compared to state-of-the-art unsupervised embedding methods.
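The coarsen, embed, refine loop can be illustrated with a toy numpy sketch. The pair-merge coarsening, the Laplacian-eigenvector "embedder", and the single smoothing step are simplifying assumptions standing in for GraphZoom's actual components:

```python
import numpy as np

# Toy graph: a ring of 8 nodes.
N = 8
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1

# Coarsen: merge node pairs (0,1), (2,3), ... into 4 super-nodes.
P = np.zeros((N, N // 2))            # node -> super-node assignment matrix
for i in range(N):
    P[i, i // 2] = 1
Ac = P.T @ A @ P                     # coarse (weighted) adjacency
np.fill_diagonal(Ac, 0)

# Embed the small coarse graph (Laplacian eigenvectors as the embedder).
Lc = np.diag(Ac.sum(axis=1)) - Ac
_, Vc = np.linalg.eigh(Lc)
emb_coarse = Vc[:, 1:3]

# Refine: project back to the fine nodes, then smooth one step over A.
emb = P @ emb_coarse
D_inv = np.diag(1.0 / A.sum(axis=1))
emb = 0.5 * emb + 0.5 * (D_inv @ A @ emb)
print(emb.shape)  # (8, 2)
```

Any embedding method could replace the eigenvector step on the 4-node coarse graph, which is the point of the framework: the expensive embedder only ever sees the smallest graph.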


Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification

Oct 11, 2018
Yongyu Wang, Zhuo Feng

Figures 1–4

The eigendecomposition of nearest-neighbor (NN) graph Laplacian matrices is the main computational bottleneck in spectral clustering. In this work, we introduce a highly scalable, spectrum-preserving graph sparsification algorithm that builds ultra-sparse NN (u-NN) graphs with guaranteed preservation of the original graph's spectral properties, such as the first few eigenvectors of the original graph Laplacian. Our approach immediately leads to scalable spectral clustering of large data networks without sacrificing solution quality. The proposed method starts by constructing low-stretch spanning trees (LSSTs) from the original graphs, and then iteratively recovers small portions of "spectrally critical" off-tree edges to the LSSTs by leveraging a spectral off-tree embedding scheme. To determine the suitable number of off-tree edges to recover, an eigenvalue stability checking scheme is proposed, which robustly preserves the first few Laplacian eigenvectors within the sparsified graph. Additionally, an incremental graph densification scheme is proposed to identify extra edges that are missing from the original NN graphs but can still play important roles in spectral clustering tasks. Our experimental results on a variety of well-known data sets show that the proposed method dramatically reduces the complexity of NN graphs, leading to significant speedups in spectral clustering.
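The tree-plus-off-tree-edges structure can be sketched as follows. A plain traversal tree stands in for a low-stretch spanning tree, and embedding distance in the tree's own eigen-embedding stands in for the paper's off-tree edge scoring; both are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# A dense-ish random graph on 12 nodes.
N = 12
A = (rng.random((N, N)) < 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T

def lap(M):
    return np.diag(M.sum(axis=1)) - M

def fiedler_value(M):
    return np.linalg.eigvalsh(lap(M))[1]

# Spanning tree via graph traversal (stand-in for a low-stretch tree).
T = np.zeros_like(A)
seen, stack = {0}, [0]
while stack:
    u = stack.pop()
    for v in np.flatnonzero(A[u]):
        if v not in seen:
            seen.add(v)
            stack.append(v)
            T[u, v] = T[v, u] = 1

# Recover the off-tree edges whose endpoints lie farthest apart in the
# tree's spectral embedding ("spectrally critical" edges).
_, V = np.linalg.eigh(lap(T))
emb = V[:, 1:3]
off = [(i, j) for i in range(N) for j in range(i + 1, N)
       if A[i, j] and not T[i, j]]
off.sort(key=lambda e: -np.linalg.norm(emb[e[0]] - emb[e[1]]))
S = T.copy()
for i, j in off[:4]:                 # recover a small portion of edges
    S[i, j] = S[j, i] = 1

# Adding edges only raises Laplacian eigenvalues, so the sparsifier's
# algebraic connectivity never exceeds the original graph's.
print(fiedler_value(S) <= fiedler_value(A) + 1e-9)
```

In the paper the number of recovered edges is chosen by an eigenvalue stability check rather than the fixed cutoff of four used here.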
