Abstract:Anomaly-based Network Intrusion Detection Systems (NIDS) require correctly labelled, representative and diverse datasets for an accurate evaluation and development. However, several widely used datasets do not include labels which are fine-grained enough and, together with small sample sizes, can lead to overfitting issues that also remain undetected when using test data. Additionally, the cybersecurity sector is evolving fast, and new attack mechanisms require the continuous creation of up-to-date datasets. To address these limitations, we developed a modular traffic generator that can simulate a wide variety of benign and malicious traffic. It incorporates multiple protocols, variability through randomization techniques and can produce attacks along corresponding benign traffic, as it occurs in real-world scenarios. Using the traffic generator, we create a dataset capturing over 12 million samples with 82 flow-level features and 21 fine-grained labels. Additionally, we include several web attack types which are often underrepresented in other datasets.




Abstract:Due to the COVID 19 pandemic, smartphone-based proximity tracing systems became of utmost interest. Many of these systems use Bluetooth Low Energy (BLE) signals to estimate the distance between two persons. The quality of this method depends on many factors and, therefore, does not always deliver accurate results. In this paper, we present a multi-channel approach to improve proximity estimation, and a novel, publicly available dataset that contains matched IEEE 802.11 (2.4 GHz and 5 GHz) and BLE signal strength data, measured in four different environments. We have developed and evaluated a combined classification model based on BLE and IEEE 802.11 signals. Our approach significantly improves the distance estimation and consequently also the contact tracing accuracy. We are able to achieve good results with our approach in everyday public transport scenarios. However, in our implementation based on IEEE 802.11 probe requests, we also encountered privacy problems and limitations due to the consistency and interval at which such probes are sent. We discuss these limitations and sketch how our approach could be improved to make it suitable for real-world deployment.