Show simple item record

dc.contributor.advisor: Camp, Tracy
dc.contributor.advisor: Stone, Kerri A.
dc.contributor.author: Hussey, Jason P.
dc.date.accessioned: 2023-12-01T23:17:25Z
dc.date.available: 2023-12-01T23:17:25Z
dc.date.issued: 2023
dc.identifier: Hussey_mines_0052E_12676.pdf
dc.identifier: T 9602
dc.identifier.uri: https://hdl.handle.net/11124/178608
dc.description: Includes bibliographical references.
dc.description: 2023 Summer.
dc.description.abstract: This thesis investigates using machine learning (including deep learning) for network traffic classification when constrained by too little labeled data or insufficient time to train models from scratch. Network traffic classification is essential in network security, network management, and application identification. Labeling network traffic data, however, is often time-consuming and expensive, which limits the amount of labeled data available for training machine learning models. This thesis investigates using a semi-supervised learning approach that leverages positively labeled and unlabeled data to improve classification performance when faced with a lack of labeled data. The method uses a combination of bootstrap aggregation and tree-based classifiers to classify unlabeled network traffic flows from the same class successfully. This same semi-supervised learning approach also successfully detects zero-day (i.e., never before seen) encrypted messaging applications for which no training data is available. Additionally, this thesis investigates using deep transfer learning from a state-of-the-art computer vision model for network traffic image classification. By representing network traffic flows as grayscale network traffic images, highly sophisticated image classification models can transfer to the task of network traffic classification. Using these advanced models as a source for training dramatically enhances the speed at which new models can train, addressing the constraint of having too little time for training. To investigate whether deep transfer learning is successful in network traffic image classification, this work used our network flow capture system (which creates a volume of unlabeled data) and commercial appliances (to turn the unlabeled dataset into a real-world labeled dataset).
Experimental results in this thesis demonstrate that the semi-supervised learning technique of positive and unlabeled learning is highly effective at detecting hidden positives amongst unlabeled data. Furthermore, this thesis shows that representing network traffic flows as grayscale images allows state-of-the-art image classification models (e.g., ResNet) to transfer to the domain of network traffic classification effectively.
dc.format.medium: born digital
dc.format.medium: doctoral dissertations
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado School of Mines. Arthur Lakes Library
dc.relation.ispartof: 2023 - Mines Theses & Dissertations
dc.rights: Copyright of the original work is retained by the author.
dc.subject: computer network
dc.subject: deep learning
dc.subject: machine learning
dc.subject: network traffic classification
dc.subject: semi-supervised learning
dc.subject: transfer learning
dc.title: Machine learning for network traffic classification under labeled data and training time constraints
dc.type: Text
dc.date.updated: 2023-11-30T05:07:41Z
dc.contributor.committeemember: Bandyopadhyay, Soutir
dc.contributor.committeemember: Romig, Phillip Richardson
dc.contributor.committeemember: Wang, Hua
thesis.degree.name: Doctor of Philosophy (Ph.D.)
thesis.degree.level: Doctoral
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado School of Mines
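
Illustrative note on the abstract: the abstract describes representing network traffic flows as grayscale images so that image classification models can be applied to them. The record does not say how the thesis builds these images, so the following is only a minimal sketch of one common approach, under stated assumptions: each byte of a flow becomes one pixel intensity (0 to 255), long flows are truncated, and short flows are zero-padded. The function name `flow_to_grayscale` and the default 28 x 28 image size are hypothetical choices made for this example, not details taken from the thesis.

```python
from typing import List


def flow_to_grayscale(payload: bytes, side: int = 28) -> List[List[int]]:
    """Map the first side*side bytes of a flow to a side x side pixel grid.

    Assumption for illustration only: one byte equals one grayscale pixel.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    n = side * side
    # Truncate long flows and zero-pad short ones so every image has the same shape.
    padded = payload[:n].ljust(n, b"\x00")
    # Rows are filled left to right, top to bottom; each value is in 0..255.
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    # Hypothetical 5-byte flow rendered as a 4 x 4 image.
    for row in flow_to_grayscale(b"\x16\x03\x01\x02\x00", side=4):
        print(row)
```

A fixed image shape is what lets a pretrained image model (the abstract mentions ResNet as an example) accept these grids as input after any resizing or channel replication that model requires.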


Files in this item

Name: Hussey_mines_0052E_12676.pdf
Size: 2.210Mb
Format: PDF
