Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Recommendation": models, code, and papers

MISIM: An End-to-End Neural Code Similarity System

Jun 15, 2020
Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

Code similarity systems are integral to a range of applications from code recommendation to automated construction of software tests and defect mitigation. In this paper, we present Machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware similarity structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring system, which can be implemented with various neural network algorithms and topologies with learned parameters. We compare MISIM to three other state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 45,780 programs, MISIM consistently outperformed all three systems, often by a large factor (upwards of 40.6x).

* arXiv admin note: text overlap with arXiv:2003.11118 

  Access Paper or Ask Questions

Privacy in Deep Learning: A Survey

May 09, 2020
Fatemehsadat Mireshghallah, Mohammadkazem Taram, Praneeth Vepakomma, Abhishek Singh, Ramesh Raskar, Hadi Esmaeilzadeh

The ever-growing advances of deep learning in many areas including vision, recommendation systems, natural language processing, etc., have led to the adoption of Deep Neural Networks (DNNs) in production systems. The availability of large datasets and high computational power are the main contributors to these advances. The datasets are usually crowdsourced and may contain sensitive information. This poses serious privacy concerns as this data can be misused or leaked through various vulnerabilities. Even if the cloud provider and the communication link is trusted, there are still threats of inference attacks where an attacker could speculate properties of the data used for training, or find the underlying model architecture and parameters. In this survey, we review the privacy concerns brought by deep learning, and the mitigating techniques introduced to tackle these issues. We also show that there is a gap in the literature regarding test-time inference privacy, and propose possible future research directions.

  Access Paper or Ask Questions

Does Gender Matter? Towards Fairness in Dialogue Systems

Oct 16, 2019
Haochen Liu, Jamell Dacon, Wenqi Fan, Hui Liu, Zitao Liu, Jiliang Tang

Recently there are increasing concerns about the fairness of Artificial Intelligence (AI) in real-world applications such as computer vision and recommendations. For example, recognition algorithms in computer vision are unfair to black people such as poorly detecting their faces and inappropriately identifying them as "gorillas". As one crucial application of AI, dialogue systems have been extensively applied in our society. They are usually built with real human conversational data; thus they could inherit some fairness issues which are held in the real world. However, the fairness of dialogue systems has not been investigated. In this paper, we perform the initial study about the fairness issues in dialogue systems. In particular, we construct the first dataset and propose quantitative measures to understand fairness in dialogue models. Our studies demonstrate that popular dialogue models show significant prejudice towards different genders and races. We will release the dataset and the measurement code later to foster the fairness research in dialogue systems.

* 12 pages 

  Access Paper or Ask Questions

Deep Learning for Automated Classification of Tuberculosis-Related Chest X-Ray: Dataset Specificity Limits Diagnostic Performance Generalizability

Nov 13, 2018
Seelwan Sathitratanacheewin, Krit Pongpirul

Machine learning has been an emerging tool for various aspects of infectious diseases including tuberculosis surveillance and detection. However, WHO provided no recommendations on using computer-aided tuberculosis detection software because of the small number of studies, methodological limitations, and limited generalizability of the findings. To quantify the generalizability of the machine-learning model, we developed a Deep Convolutional Neural Network (DCNN) model using a TB-specific CXR dataset of one population (National Library of Medicine Shenzhen No.3 Hospital) and tested it with non-TB-specific CXR dataset of another population (National Institute of Health Clinical Centers). The findings suggested that a supervised deep learning model developed by using the training dataset from one population may not have the same diagnostic performance in another population. Technical specification of CXR images, disease severity distribution, overfitting, and overdiagnosis should be examined before implementation in other settings.

* 6 pages, 1 figure 

  Access Paper or Ask Questions

Recovery Guarantees for Quadratic Tensors with Limited Observations

Oct 31, 2018
Hongyang Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang

We consider the tensor completion problem of predicting the missing entries of a tensor. The commonly used CP model has a triple product form, but an alternate family of quadratic models which are the sum of pairwise products instead of a triple product have emerged from applications such as recommendation systems. Non-convex methods are the method of choice for learning quadratic models, and this work examines their sample complexity and error guarantee. Our main result is that with the number of samples being only linear in the dimension, all local minima of the mean squared error objective are global minima and recover the original tensor accurately. The techniques lead to simple proofs showing that convex relaxation can recover quadratic tensors provided with linear number of samples. We substantiate our theoretical results with experiments on synthetic and real-world data, showing that quadratic models have better performance than CP models in scenarios where there are limited amount of observations available.

  Access Paper or Ask Questions

Co-adaptive learning over a countable space

Nov 30, 2016
Michael Rabadi

Co-adaptation is a special form of on-line learning where an algorithm $\mathcal{A}$ must assist an unknown algorithm $\mathcal{B}$ to perform some task. This is a general framework and has applications in recommendation systems, search, education, and much more. Today, the most common use of co-adaptive algorithms is in brain-computer interfacing (BCI), where algorithms help patients gain and maintain control over prosthetic devices. While previous studies have shown strong empirical results Kowalski et al. (2013); Orsborn et al. (2014) or have been studied in specific examples Merel et al. (2013, 2015), there is no general analysis of the co-adaptive learning problem. Here we will study the co-adaptive learning problem in the online, closed-loop setting. We will prove that, with high probability, co-adaptive learning is guaranteed to outperform learning with a fixed decoder as long as a particular condition is met.

* In NIPS 2016 Time Series Workshop. Barcelona, Spain 
* 6 pages, 1 figure, NIPS 2016 Time Series Workshop 

  Access Paper or Ask Questions

Contextual Semibandits via Supervised Learning Oracles

Nov 04, 2016
Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback. These problems, known as contextual semibandits, arise in crowdsourcing, recommendation, and many other domains. This paper reduces contextual semibandits to supervised learning, allowing us to leverage powerful supervised learning methods in this partial-feedback setting. Our first reduction applies when the mapping from feedback to reward is known and leads to a computationally efficient algorithm with near-optimal regret. We show that this algorithm outperforms state-of-the-art approaches on real-world learning-to-rank datasets, demonstrating the advantage of oracle-based algorithms. Our second reduction applies to the previously unstudied setting when the linear mapping from feedback to reward is unknown. Our regret guarantees are superior to prior techniques that ignore the feedback.

  Access Paper or Ask Questions