Machine learning for network traffic classification under labeled data and training time constraints
dc.contributor.advisor | Camp, Tracy | |
dc.contributor.advisor | Stone, Kerri A. | |
dc.contributor.author | Hussey, Jason P. | |
dc.date.accessioned | 2023-12-01T23:17:25Z | |
dc.date.available | 2023-12-01T23:17:25Z | |
dc.date.issued | 2023 | |
dc.identifier | Hussey_mines_0052E_12676.pdf | |
dc.identifier | T 9602 | |
dc.identifier.uri | https://hdl.handle.net/11124/178608 | |
dc.description | Includes bibliographical references. | |
dc.description | 2023 Summer. | |
dc.description.abstract | This thesis investigates using machine learning (including deep learning) for network traffic classification when constrained by too little labeled data or insufficient time to train models from scratch. Network traffic classification is essential in network security, network management, and application identification. Labeling network traffic data, however, is often time-consuming and expensive, which limits the amount of labeled data available for training machine learning models. This thesis investigates a semi-supervised learning approach that leverages positively labeled and unlabeled data to improve classification performance when labeled data is scarce. The method combines bootstrap aggregation with tree-based classifiers to successfully classify unlabeled network traffic flows from the same class. The same semi-supervised learning approach also successfully detects zero-day (i.e., never-before-seen) encrypted messaging applications for which no training data is available. Additionally, this thesis investigates deep transfer learning from a state-of-the-art computer vision model for network traffic image classification. By representing network traffic flows as grayscale network traffic images, highly sophisticated image classification models can transfer to the task of network traffic classification. Using these advanced models as a source for training dramatically enhances the speed at which new models can train, addressing the constraint of having too little time for training. To investigate whether deep transfer learning is successful in network traffic image classification, this work used our network flow capture system (which creates a volume of unlabeled data) and commercial appliances (to turn the unlabeled dataset into a real-world labeled dataset).
Experimental results in this thesis demonstrate that the semi-supervised learning technique of positive and unlabeled learning is highly effective at detecting hidden positives among unlabeled data. Furthermore, this thesis shows that representing network traffic flows as grayscale images allows state-of-the-art image classification models (e.g., ResNet) to transfer effectively to the domain of network traffic classification. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado School of Mines. Arthur Lakes Library | |
dc.relation.ispartof | 2023 - Mines Theses & Dissertations | |
dc.rights | Copyright of the original work is retained by the author. | |
dc.subject | computer network | |
dc.subject | deep learning | |
dc.subject | machine learning | |
dc.subject | network traffic classification | |
dc.subject | semi-supervised learning | |
dc.subject | transfer learning | |
dc.title | Machine learning for network traffic classification under labeled data and training time constraints | |
dc.type | Text | |
dc.date.updated | 2023-11-30T05:07:41Z | |
dc.contributor.committeemember | Bandyopadhyay, Soutir | |
dc.contributor.committeemember | Romig, Phillip Richardson | |
dc.contributor.committeemember | Wang, Hua | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) | |
thesis.degree.level | Doctoral | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | Colorado School of Mines |