Abstract:In response to Distributed Denial of Service (DDoS) attacks, recent research efforts increasingly rely on Machine Learning (ML)-based solutions, whose effectiveness largely depends on the quality of labeled training datasets. To address the scarcity of such datasets, data augmentation with synthetic traces is often employed. However, current synthetic trace generation methods struggle to capture the complex temporal patterns and spatial distributions exhibited in emerging DDoS attacks. This results in insufficient resemblance to real traces and unsatisfied detection accuracy when applied to ML tasks. In this paper, we propose Dual-Stream Temporal-Field Diffusion (DSTF-Diffusion), a multi-view, multi-stream network traffic generative model based on diffusion models, featuring two main streams: The field stream utilizes spatial mapping to bridge network data characteristics with pre-trained realms of stable diffusion models, effectively translating complex network interactions into formats that stable diffusion can process, while the spatial stream adopts a dynamic temporal modeling approach, meticulously capturing the intrinsic temporal patterns of network traffic. Extensive experiments demonstrate that data generated by our model exhibits higher statistical similarity to originals compared to current state-of-the-art solutions, and enhance performances on a wide range of downstream tasks.
Abstract:Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for network stream mining. Recent studies attempt to optimize sketches using machine learning; however, these approaches face the challenges of lacking adaptivity to dynamic networks and incurring high training costs. In this paper, we propose LLM-Sketch, based on the insight that fields beyond the flow IDs in packet headers can also help infer flow sizes. By using a two-tier data structure and separately recording large and small flows, LLM-Sketch improves accuracy while minimizing memory usage. Furthermore, it leverages fine-tuned large language models (LLMs) to reliably estimate flow sizes. We evaluate LLM-Sketch on three representative tasks, and the results demonstrate that LLM-Sketch outperforms state-of-the-art methods by achieving a $7.5\times$ accuracy improvement.
Abstract:It is an interesting question Can and How Large Language Models (LLMs) understand non-language network data, and help us detect unknown malicious flows. This paper takes Carpet Bombing as a case study and shows how to exploit LLMs' powerful capability in the networking area. Carpet Bombing is a new DDoS attack that has dramatically increased in recent years, significantly threatening network infrastructures. It targets multiple victim IPs within subnets, causing congestion on access links and disrupting network services for a vast number of users. Characterized by low-rates, multi-vectors, these attacks challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model utilizes open-source LLMs as backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into LLMs semantic space as token embeddings, DoLLM leverages LLMs' contextual understanding to extract flow representations in overall network context. The representations are used to improve the DDoS detection performance. We evaluate DoLLM with public datasets CIC-DDoS2019 and real NetFlow trace from Top-3 countrywide ISP. The tests have proven that DoLLM possesses strong detection capabilities. Its F1 score increased by up to 33.3% in zero-shot scenarios and by at least 20.6% in real ISP traces.