Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Recommendation": models, code, and papers

Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs

Jun 11, 2021
Jialin Dong, Da Zheng, Lin F. Yang, Geroge Karypis

Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in various applications such as social network recommendation, fraud detection, and graph search. The graphs in these applications are typically large, usually containing hundreds of millions of nodes. Training GNN models on such large graphs efficiently remains a big challenge. Despite a number of sampling-based methods have been proposed to enable mini-batch training on large graphs, these methods have not been proved to work on truly industry-scale graphs, which require GPUs or mixed-CPU-GPU training. The state-of-the-art sampling-based methods are usually not optimized for these real-world hardware setups, in which data movement between CPUs and GPUs is a bottleneck. To address this issue, we propose Global Neighborhood Sampling that aims at training GNNs on giant graphs specifically for mixed-CPU-GPU training. The algorithm samples a global cache of nodes periodically for all mini-batches and stores them in GPUs. This global cache allows in-GPU importance sampling of mini-batches, which drastically reduces the number of nodes in a mini-batch, especially in the input layer, to reduce data copy between CPU and GPU and mini-batch computation without compromising the training convergence rate or model accuracy. We provide a highly efficient implementation of this method and show that our implementation outperforms an efficient node-wise neighbor sampling baseline by a factor of 2X-4X on giant graphs. It outperforms an efficient implementation of LADIES with small layers by a factor of 2X-14X while achieving much higher accuracy than LADIES.We also theoretically analyze the proposed algorithm and show that with cached node data of a proper size, it enjoys a comparable convergence rate as the underlying node-wise sampling method.

* The paper is published in KDD 2021 

  Access Paper or Ask Questions

A Hybrid 2-stage Neural Optimization for Pareto Front Extraction

Feb 13, 2021
Gurpreet Singh, Soumyajit Gupta, Matthew Lease, Clint Dawson

Classification, recommendation, and ranking problems often involve competing goals with additional constraints (e.g., to satisfy fairness or diversity criteria). Such optimization problems are quite challenging, often involving non-convex functions along with considerations of user preferences in balancing trade-offs. Pareto solutions represent optimal frontiers for jointly optimizing multiple competing objectives. A major obstacle for frequently used linear-scalarization strategies is that the resulting optimization problem might not always converge to a global optimum. Furthermore, such methods only return one solution point per run. A Pareto solution set is a subset of all such global optima over multiple runs for different trade-off choices. Therefore, a Pareto front can only be guaranteed with multiple runs of the linear-scalarization problem, where all runs converge to their respective global optima. Consequently, extracting a Pareto front for practical problems is computationally intractable with substantial computational overheads, limited scalability, and reduced accuracy. We propose a robust, low cost, two-stage, hybrid neural Pareto optimization approach that is accurate and scales (compute space and time) with data dimensions, as well as number of functions and constraints. The first stage (neural network) efficiently extracts a weak Pareto front, using Fritz-John conditions as the discriminator, with no assumptions of convexity on the objectives or constraints. The second stage (efficient Pareto filter) extracts the strong Pareto optimal subset given the weak front from stage 1. Fritz-John conditions provide us with theoretical bounds on approximation error between the true and network extracted weak Pareto front. Numerical experiments demonstrates the accuracy and efficiency on a canonical set of benchmark problems and a fairness optimization task from prior works.

  Access Paper or Ask Questions

Regularized Submodular Maximization at Scale

Feb 10, 2020
Ehsan Kazemi, Shervin Minaee, Moran Feldman, Amin Karbasi

In this paper, we propose scalable methods for maximizing a regularized submodular function $f = g - \ell$ expressed as the difference between a monotone submodular function $g$ and a modular function $\ell$. Indeed, submodularity is inherently related to the notions of diversity, coverage, and representativeness. In particular, finding the mode of many popular probabilistic models of diversity, such as determinantal point processes, submodular probabilistic models, and strongly log-concave distributions, involves maximization of (regularized) submodular functions. Since a regularized function $f$ can potentially take on negative values, the classic theory of submodular maximization, which heavily relies on the non-negativity assumption of submodular functions, may not be applicable. To circumvent this challenge, we develop the first one-pass streaming algorithm for maximizing a regularized submodular function subject to a $k$-cardinality constraint. It returns a solution $S$ with the guarantee that $f(S)\geq(\phi^{-2}-\epsilon) \cdot g(OPT)-\ell (OPT)$, where $\phi$ is the golden ratio. Furthermore, we develop the first distributed algorithm that returns a solution $S$ with the guarantee that $\mathbb{E}[f(S)] \geq (1-\epsilon) [(1-e^{-1}) \cdot g(OPT)-\ell(OPT)]$ in $O(1/ \epsilon)$ rounds of MapReduce computation, without keeping multiple copies of the entire dataset in each round (as it is usually done). We should highlight that our result, even for the unregularized case where the modular term $\ell$ is zero, improves the memory and communication complexity of the existing work by a factor of $O(1/ \epsilon)$ while arguably provides a simpler distributed algorithm and a unifying analysis. We also empirically study the performance of our scalable methods on a set of real-life applications, including finding the mode of distributions, data summarization, and product recommendation.

  Access Paper or Ask Questions

Machine learning methods for multimedia information retrieval

May 14, 2017
Bálint Zoltán Daróczy

In this thesis we examined several multimodal feature extraction and learning methods for retrieval and classification purposes. We reread briefly some theoretical results of learning in Section 2 and reviewed several generative and discriminative models in Section 3 while we described the similarity kernel in Section 4. We examined different aspects of the multimodal image retrieval and classification in Section 5 and suggested methods for identifying quality assessments of Web documents in Section 6. In our last problem we proposed similarity kernel for time-series based classification. The experiments were carried over publicly available datasets and source codes for the most essential parts are either open source or released. Since the used similarity graphs (Section 4.2) are greatly constrained for computational purposes, we would like to continue work with more complex, evolving and capable graphs and apply for different problems such as capturing the rapid change in the distribution (e.g. session based recommendation) or complex graphs of the literature work. The similarity kernel with the proper metrics reaches and in many cases improves over the state-of-the-art. Hence we may conclude generative models based on instance similarities with multiple modes is a generally applicable model for classification and regression tasks ranging over various domains, including but not limited to the ones presented in this thesis. More generally, the Fisher kernel is not only unique in many ways but one of the most powerful kernel functions. Therefore we may exploit the Fisher kernel in the future over widely used generative models, such as Boltzmann Machines [Hinton et al., 1984], a particular subset, the Restricted Boltzmann Machines and Deep Belief Networks [Hinton et al., 2006]), Latent Dirichlet Allocation [Blei et al., 2003] or Hidden Markov Models [Baum and Petrie, 1966] to name a few.

* doctoral thesis, 2016 

  Access Paper or Ask Questions

Shuffle Private Linear Contextual Bandits

Feb 11, 2022
Sayak Ray Chowdhury, Xingyu Zhou

Differential privacy (DP) has been recently introduced to linear contextual bandits to formally address the privacy concerns in its associated personalized services to participating users (e.g., recommendations). Prior work largely focus on two trust models of DP: the central model, where a central server is responsible for protecting users sensitive data, and the (stronger) local model, where information needs to be protected directly on user side. However, there remains a fundamental gap in the utility achieved by learning algorithms under these two privacy models, e.g., $\tilde{O}(\sqrt{T})$ regret in the central model as compared to $\tilde{O}(T^{3/4})$ regret in the local model, if all users are unique within a learning horizon $T$. In this work, we aim to achieve a stronger model of trust than the central model, while suffering a smaller regret than the local model by considering recently popular shuffle model of privacy. We propose a general algorithmic framework for linear contextual bandits under the shuffle trust model, where there exists a trusted shuffler in between users and the central server, that randomly permutes a batch of users data before sending those to the server. We then instantiate this framework with two specific shuffle protocols: one relying on privacy amplification of local mechanisms, and another incorporating a protocol for summing vectors and matrices of bounded norms. We prove that both these instantiations lead to regret guarantees that significantly improve on that of the local model, and can potentially be of the order $\tilde{O}(T^{3/5})$ if all users are unique. We also verify this regret behavior with simulations on synthetic data. Finally, under the practical scenario of non-unique users, we show that the regret of our shuffle private algorithm scale as $\tilde{O}(T^{2/3})$, which matches that the central model could achieve in this case.

  Access Paper or Ask Questions

Response by the Montreal AI Ethics Institute to the European Commission's Whitepaper on AI

Jun 16, 2020
Abhishek Gupta, Camylle Lanteigne

In February 2020, the European Commission (EC) published a white paper entitled, On Artificial Intelligence - A European approach to excellence and trust. This paper outlines the EC's policy options for the promotion and adoption of artificial intelligence (AI) in the European Union. The Montreal AI Ethics Institute (MAIEI) reviewed this paper and published a response addressing the EC's plans to build an "ecosystem of excellence" and an "ecosystem of trust," as well as the safety and liability implications of AI, the internet of things (IoT), and robotics. MAIEI provides 15 recommendations in relation to the sections outlined above, including: 1) focus efforts on the research and innovation community, member states, and the private sector; 2) create alignment between trading partners' policies and EU policies; 3) analyze the gaps in the ecosystem between theoretical frameworks and approaches to building trustworthy AI; 4) focus on coordination and policy alignment; 5) focus on mechanisms that promote private and secure sharing of data; 6) create a network of AI research excellence centres to strengthen the research and innovation community; 7) promote knowledge transfer and develop AI expertise through Digital Innovation Hubs; 8) add nuance to the discussion regarding the opacity of AI systems; 9) create a process for individuals to appeal an AI system's decision or output; 10) implement new rules and strengthen existing regulations; 11) ban the use of facial recognition technology; 12) hold all AI systems to similar standards and compulsory requirements; 13) ensure biometric identification systems fulfill the purpose for which they are implemented; 14) implement a voluntary labelling system for systems that are not considered high-risk; 15) appoint individuals to the oversight process who understand AI systems well and are able to communicate potential risks.

* Submitted to the European Commission 

  Access Paper or Ask Questions

FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via Multi-Scale Feature Decoupling Network

Aug 10, 2021
Qiang Hou, Weiqing Min, Jing Wang, Sujuan Hou, Yuanjie Zheng, Shuqiang Jiang

Food logo detection plays an important role in the multimedia for its wide real-world applications, such as food recommendation of the self-service shop and infringement detection on e-commerce platforms. A large-scale food logo dataset is urgently needed for developing advanced food logo detection algorithms. However, there are no available food logo datasets with food brand information. To support efforts towards food logo detection, we introduce the dataset FoodLogoDet-1500, a new large-scale publicly available food logo dataset, which has 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects. We describe the collection and annotation process of FoodLogoDet-1500, analyze its scale and diversity, and compare it with other logo datasets. To the best of our knowledge, FoodLogoDet-1500 is the first largest publicly available high-quality dataset for food logo detection. The challenge of food logo detection lies in the large-scale categories and similarities between food logo categories. For that, we propose a novel food logo detection method Multi-scale Feature Decoupling Network (MFDNet), which decouples classification and regression into two branches and focuses on the classification branch to solve the problem of distinguishing multiple food logo categories. Specifically, we introduce the feature offset module, which utilizes the deformation-learning for optimal classification offset and can effectively obtain the most representative features of classification in detection. In addition, we adopt a balanced feature pyramid in MFDNet, which pays attention to global information, balances the multi-scale feature maps, and enhances feature extraction capability. Comprehensive experiments on FoodLogoDet-1500 and other two benchmark logo datasets demonstrate the effectiveness of the proposed method. The FoodLogoDet-1500 can be found at this https URL.

* This paper has been accepted to ACM MM 2021. The FoodLogoDet-1500, see 

  Access Paper or Ask Questions

An Interpretable Approach to Automated Severity Scoring in Pelvic Trauma

May 21, 2021
Anna Zapaishchykova, David Dreizin, Zhaoshuo Li, Jie Ying Wu, Shahrooz Faghih Roohi, Mathias Unberath

Pelvic ring disruptions result from blunt injury mechanisms and are often found in patients with multi-system trauma. To grade pelvic fracture severity in trauma victims based on whole-body CT, the Tile AO/OTA classification is frequently used. Due to the high volume of whole-body trauma CTs generated in busy trauma centers, an automated approach to Tile classification would provide substantial value, e.,g., to prioritize the reading queue of the attending trauma radiologist. In such scenario, an automated method should perform grading based on a transparent process and based on interpretable features to enable interaction with human readers and lower their workload by offering insights from a first automated read of the scan. This paper introduces an automated yet interpretable pelvic trauma decision support system to assist radiologists in fracture detection and Tile grade classification. The method operates similarly to human interpretation of CT scans and first detects distinct pelvic fractures on CT with high specificity using a Faster-RCNN model that are then interpreted using a structural causal model based on clinical best practices to infer an initial Tile grade. The Bayesian causal model and finally, the object detector are then queried for likely co-occurring fractures that may have been rejected initially due to the highly specific operating point of the detector, resulting in an updated list of detected fractures and corresponding final Tile grade. Our method is transparent in that it provides finding location and type using the object detector, as well as information on important counterfactuals that would invalidate the system's recommendation and achieves an AUC of 83.3%/85.1% for translational/rotational instability. Despite being designed for human-machine teaming, our approach does not compromise on performance compared to previous black-box approaches.

* 10 pages, 3 figures 

  Access Paper or Ask Questions

LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis

Aug 20, 2021
Fan Wu, Yunhui Long, Ce Zhang, Bo Li

Graph structured data have enabled several successful applications such as recommendation systems and traffic prediction, given the rich node features and edges information. However, these high-dimensional features and high-order adjacency information are usually heterogeneous and held by different data holders in practice. Given such vertical data partition (e.g., one data holder will only own either the node features or edge information), different data holders have to develop efficient joint training protocols rather than directly transfer data to each other due to privacy concerns. In this paper, we focus on the edge privacy, and consider a training scenario where Bob with node features will first send training node features to Alice who owns the adjacency information. Alice will then train a graph neural network (GNN) with the joint information and release an inference API. During inference, Bob is able to provide test node features and query the API to obtain the predictions for test nodes. Under this setting, we first propose a privacy attack LinkTeller via influence analysis to infer the private edge information held by Alice via designing adversarial queries for Bob. We then empirically show that LinkTeller is able to recover a significant amount of private edges, outperforming existing baselines. To further evaluate the privacy leakage, we adapt an existing algorithm for differentially private graph convolutional network (DP GCN) training and propose a new DP GCN mechanism LapGraph. We show that these DP GCN mechanisms are not always resilient against LinkTeller empirically under mild privacy guarantees ($\varepsilon>5$). Our studies will shed light on future research towards designing more resilient privacy-preserving GCN models; in the meantime, provide an in-depth understanding of the tradeoff between GCN model utility and robustness against potential privacy attacks.

  Access Paper or Ask Questions