Abstract:The Android operating system has been the most popular for smartphones and tablets since 2012. This popularity has led to a rapid raise of Android malware in recent years. The sophistication of Android malware obfuscation and detection avoidance methods have significantly improved, making many traditional malware detection methods obsolete. In this paper, we propose DL-Droid, a deep learning system to detect malicious Android applications through dynamic analysis using stateful input generation. Experiments performed with over 30,000 applications (benign and malware) on real devices are presented. Furthermore, experiments were also conducted to compare the detection performance and code coverage of the stateful input generation method with the commonly used stateless approach using the deep learning system. Our study reveals that DL-Droid can achieve up to 97.8% detection rate (with dynamic features only) and 99.6% detection rate (with dynamic + static features) respectively which outperforms traditional machine learning techniques. Furthermore, the results highlight the significance of enhanced input generation for dynamic analysis as DL-Droid with the state-based input generation is shown to outperform the existing state-of-the-art approaches.
Abstract:As mobile devices have become indispensable in modern life, mobile security is becoming much more important. Traditional password or PIN-like point-of-entry security measures score low on usability and are vulnerable to brute force and other types of attacks. In order to improve mobile security, an adaptive neuro-fuzzy inference system(ANFIS)-based implicit authentication system is proposed in this paper to provide authentication in a continuous and transparent manner.To illustrate the applicability and capability of ANFIS in our implicit authentication system, experiments were conducted on behavioural data collected for up to 12 weeks from different Android users. The ability of the ANFIS-based system to detect an adversary is also tested with scenarios involving an attacker with varying levels of knowledge. The results demonstrate that ANFIS is a feasible and efficient approach for implicit authentication with an average of 95% user recognition rate. Moreover, the use of ANFIS-based system for implicit authentication significantly reduces manual tuning and configuration tasks due to its selflearning capability.
Abstract:The Android operating system has become the most popular operating system for smartphones and tablets leading to a rapid rise in malware. Sophisticated Android malware employ detection avoidance techniques in order to hide their malicious activities from analysis tools. These include a wide range of anti-emulator techniques, where the malware programs attempt to hide their malicious activities by detecting the emulator. For this reason, countermeasures against antiemulation are becoming increasingly important in Android malware detection. Analysis and detection based on real devices can alleviate the problems of anti-emulation as well as improve the effectiveness of dynamic analysis. Hence, in this paper we present an investigation of machine learning based malware detection using dynamic analysis on real devices. A tool is implemented to automatically extract dynamic features from Android phones and through several experiments, a comparative analysis of emulator based vs. device based detection by means of several machine learning algorithms is undertaken. Our study shows that several features could be extracted more effectively from the on-device dynamic analysis compared to emulators. It was also found that approximately 24% more apps were successfully analysed on the phone. Furthermore, all of the studied machine learning based detection performed better when applied to features extracted from the on-device dynamic analysis.
Abstract:Android malware has been on the rise in recent years due to the increasing popularity of Android and the proliferation of third party application markets. Emerging Android malware families are increasingly adopting sophisticated detection avoidance techniques and this calls for more effective approaches for Android malware detection. Hence, in this paper we present and evaluate an n-gram opcode features based approach that utilizes machine learning to identify and categorize Android malware. This approach enables automated feature discovery without relying on prior expert or domain knowledge for pre-determined features. Furthermore, by using a data segmentation technique for feature selection, our analysis is able to scale up to 10-gram opcodes. Our experiments on a dataset of 2520 samples showed an f-measure of 98% using the n-gram opcode based approach. We also provide empirical findings that illustrate factors that have probable impact on the overall n-gram opcodes performance trends.
Abstract:Mobile malware has been growing in scale and complexity spurred by the unabated uptake of smartphones worldwide. Android is fast becoming the most popular mobile platform resulting in sharp increase in malware targeting the platform. Additionally, Android malware is evolving rapidly to evade detection by traditional signature-based scanning. Despite current detection measures in place, timely discovery of new malware is still a critical issue. This calls for novel approaches to mitigate the growing threat of zero-day Android malware. Hence, in this paper we develop and analyze proactive Machine Learning approaches based on Bayesian classification aimed at uncovering unknown Android malware via static analysis. The study, which is based on a large malware sample set of majority of the existing families, demonstrates detection capabilities with high accuracy. Empirical results and comparative analysis are presented offering useful insight towards development of effective static-analytic Bayesian classification based solutions for detecting unknown Android malware.
Abstract:In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank computes ranks of web pages based on structural information and can also compute ranks of instructions that represent the structural information of the instructions in malware analysis methods. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve the categorization accuracy.
Abstract:Mobile malware has been growing in scale and complexity as smartphone usage continues to rise. Android has surpassed other mobile platforms as the most popular whilst also witnessing a dramatic increase in malware targeting the platform. A worrying trend that is emerging is the increasing sophistication of Android malware to evade detection by traditional signature-based scanners. As such, Android app marketplaces remain at risk of hosting malicious apps that could evade detection before being downloaded by unsuspecting users. Hence, in this paper we present an effective approach to alleviate this problem based on Bayesian classification models obtained from static code analysis. The models are built from a collection of code and app characteristics that provide indicators of potential malicious activities. The models are evaluated with real malware samples in the wild and results of experiments are presented to demonstrate the effectiveness of the proposed approach.
Abstract:With over 50 billion downloads and more than 1.3 million apps in the Google official market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection avoidance techniques. As traditional signature based methods become less potent in detecting unknown malware, alternatives are needed for timely zero-day discovery. Thus this paper proposes an approach that utilizes ensemble learning for Android malware detection. It combines advantages of static analysis with the efficiency and performance of ensemble machine learning to improve Android malware detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. Experimental results and analysis presented shows that the proposed method which uses a large feature space to leverage the power of ensemble learning is capable of 97.3 to 99 percent detection accuracy with very low false positive rates.
Abstract:Mobile malware has continued to grow at an alarming rate despite on-going efforts towards mitigating the problem. This has been particularly noticeable on Android due to its being an open platform that has subsequently overtaken other platforms in the share of the mobile smart devices market. Hence, incentivizing a new wave of emerging Android malware sophisticated enough to evade most common detection methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also quicker white box analysis by means of the more interpretable constituent classifiers.
Abstract:Malware detection is a growing problem particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach.