Machine learning for network traffic classification under labeled data and training time constraints
Advisor
Camp, Tracy
Stone, Kerri A.
Date issued
2023
Keywords
computer network
deep learning
machine learning
network traffic classification
semi-supervised learning
transfer learning
Abstract
This thesis investigates using machine learning (including deep learning) for network traffic classification when constrained by too little labeled data or insufficient time to train models from scratch. Network traffic classification is essential in network security, network management, and application identification. Labeling network traffic data, however, is often time-consuming and expensive, which limits the amount of labeled data available for training machine learning models. To address the lack of labeled data, this thesis investigates a semi-supervised learning approach that leverages positively labeled and unlabeled data to improve classification performance. The method combines bootstrap aggregation (bagging) with tree-based classifiers to successfully identify unlabeled network traffic flows that belong to the same class as the labeled positives. The same semi-supervised approach also successfully detects zero-day (i.e., never-before-seen) encrypted messaging applications for which no training data is available. Additionally, this thesis investigates deep transfer learning from a state-of-the-art computer vision model for network traffic image classification. Representing network traffic flows as grayscale network traffic images allows highly sophisticated image classification models to transfer to the task of network traffic classification. Using these pretrained models as a starting point dramatically speeds up the training of new models, addressing the constraint of having too little time for training. To investigate whether deep transfer learning succeeds in network traffic image classification, this work used our network flow capture system, which generates a large volume of unlabeled data, and commercial appliances, which label that data to produce a real-world labeled dataset.
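The bagging-based positive-unlabeled approach described above can be illustrated with a minimal sketch of the general PU bagging scheme: each round draws a bootstrap sample of unlabeled flows the same size as the positive set, treats that sample as provisional negatives, trains a tree-based classifier, and scores only the out-of-bag unlabeled flows; averaging those scores tends to rank hidden positives above true negatives. This is not the thesis's implementation. The base learner here is assumed to be a depth-1 decision tree (a stump) so the example needs only the Python standard library, and the function names (`fit_stump`, `stump_score`, `pu_bagging_scores`), the number of estimators, and the two-feature toy flows are all hypothetical.

```python
import math
import random


def fit_stump(X, y):
    """Fit a depth-1 decision tree (a "stump") by minimizing training errors.

    Returns (feature, threshold, left_score, right_score), where each score is
    the fraction of label-1 samples that fall on that side of the split.
    """
    best = None  # (errors, feature, threshold, left_score, right_score)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # Errors made if each leaf predicts its majority label.
            errors = (min(sum(left), len(left) - sum(left))
                      + min(sum(right), len(right) - sum(right)))
            if best is None or errors < best[0]:
                best = (errors, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # Every feature is constant: fall back to the positive-class rate.
        rate = sum(y) / len(y)
        return 0, math.inf, rate, rate
    return best[1:]


def stump_score(stump, x):
    """Return the stump's positive-class score for one feature vector."""
    feature, threshold, left_score, right_score = stump
    return left_score if x[feature] <= threshold else right_score


def pu_bagging_scores(positives, unlabeled, n_estimators=50, seed=0):
    """Score unlabeled samples with PU bagging (hypothetical helper).

    Each round bootstraps len(positives) unlabeled samples as provisional
    negatives, fits a stump, and scores only the out-of-bag unlabeled samples.
    Returns the average score per unlabeled sample (NaN if never out of bag).
    """
    rng = random.Random(seed)
    k = len(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_estimators):
        bag = [rng.randrange(len(unlabeled)) for _ in range(k)]
        X = positives + [unlabeled[i] for i in bag]
        y = [1] * k + [0] * k
        stump = fit_stump(X, y)
        in_bag = set(bag)
        for i, x in enumerate(unlabeled):
            if i not in in_bag:  # out-of-bag: this model never saw sample i
                totals[i] += stump_score(stump, x)
                counts[i] += 1
    return [total / count if count else math.nan
            for total, count in zip(totals, counts)]


# Toy flows with two hypothetical features each (not data from the thesis).
positives = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15], [0.95, 0.05]]
unlabeled = [
    [0.88, 0.12], [0.92, 0.10], [0.83, 0.20],  # hidden positives
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85], [0.05, 0.95], [0.30, 0.70],
    [0.25, 0.75], [0.12, 0.88], [0.18, 0.82], [0.22, 0.78],  # negatives
]
scores = pu_bagging_scores(positives, unlabeled)
for flow, score in zip(unlabeled, scores):
    print(flow, round(score, 2))
```

Unlabeled flows with high averaged scores are candidate hidden positives. A practical pipeline would more likely use full decision trees or a random forest as the base learner and pick a score cutoff on held-out data; the stump is used here only to keep the sketch dependency-free.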
Experimental results in this thesis demonstrate that positive-unlabeled (PU) learning, a semi-supervised technique, is highly effective at detecting hidden positives among unlabeled data. Furthermore, this thesis shows that representing network traffic flows as grayscale images allows state-of-the-art image classification models (e.g., ResNet) to transfer effectively to the domain of network traffic classification.
Rights
Copyright of the original work is retained by the author.
Related items
Showing items related by title, author, creator and subject.
-
Adversarial machine learning in computer vision: attacks and defenses on machine learning models
Yue, Chuan; Qin, Yi; Camp, Tracy; Han, Qi; Belviranli, Mehmet E.; Mohagheghi, Salman (Colorado School of Mines. Arthur Lakes Library, 2021)
Machine learning models, including neural networks, have gained great popularity in recent years. Deep neural networks can learn directly from raw data and can outperform traditional machine learning models. As a result, they have been increasingly used in a variety of application domains such as image classification, natural language processing, and malware detection. However, deep neural networks have been shown to be vulnerable to adversarial examples at test time. Adversarial examples are malicious inputs generated from legitimate inputs by adding small perturbations in order to fool machine learning models into misclassifying them. We mainly aim to answer two research questions in this thesis: How are machine learning models vulnerable to adversarial examples? How can we better defend against adversarial examples? We first improve the effectiveness of adversarial training by designing an experimental framework to study Method-Based Ensemble Adversarial Training (MBEAT) and Round Gap Of Adversarial Training (RGOAT). We then demonstrate the strong distinguishability of adversarial examples and design a simple yet effective approach called defensive distinction, formulated as multi-label classification, to protect against adversarial examples. We also propose fuzzing-based hard-label black-box attacks against machine learning models. We design an AdvFuzzer to explore multiple paths between a source image and a guidance image, and a LocalFuzzer to explore the nearby space around a given input to identify potential adversarial examples. Lastly, we propose a key-based input transformation defense against adversarial examples.
-
Self-regulated learning: using educational psychology to enhance the learning experience for undergraduate students
Palmer, Allison; Strong, Scott
Students, like all humans, carry within them more than just their intellectual capacities and technical abilities. In order to reach students' minds, help them learn, and help them develop as people, education systems must strive to view learners in a more holistic manner.
-
Economic order quantity with learning
Woolsey, Robert E. D., 1936-; Sweeney, Michael E. (Colorado School of Mines. Arthur Lakes Library, 1976)